Learning Succinct Models: Pipelined Compression with L1-Regularization, Hashing, Elias-Fano Indices, and Quantization
The recent proliferation of smart devices necessitates methods for learning small-sized models. This paper demonstrates that, when there are $m$ features in total but only $n = o(\sqrt{m})$ features are required to distinguish examples, given $\Omega(\log m)$ training examples and reasonable settings, it is possible to obtain a good model in a \textit{succinct} representation using $n \log_2 \frac{m}{n} + o(m)$ bits, by pipelining existing compression methods: L1-regularized logistic regression, feature hashing, Elias{--}Fano indices, and randomized quantization. An experiment shows that a noun phrase chunking model for which an existing library requires 27 megabytes can be compressed to less than 13 \textit{kilo}bytes without notable loss of accuracy.
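To make the $n \log_2 \frac{m}{n} + o(m)$ bound concrete, the sketch below shows an Elias{--}Fano encoding of the sorted indices of the $n$ surviving features out of a universe of $m$, the step of the pipeline that yields the succinct index. This is an illustrative implementation written for this summary, not the authors' code: each value is split into $\ell \approx \lfloor \log_2(m/n) \rfloor$ explicit low bits plus a unary-coded high part, for roughly $n(\ell + 2)$ bits in total.

```python
def ef_encode(values, m):
    """Elias-Fano encode a sorted list of distinct indices in [0, m).

    Returns (n, l, low, high): element count, low-bit width, the packed
    low bits, and the unary-coded high parts as one big-int bit vector.
    """
    n = len(values)
    # Low-bit width: floor(log2(m / n)); 0 when the list is empty.
    l = max(0, (m // n).bit_length() - 1) if n else 0
    low = 0   # low bits of all elements, packed l bits apiece
    high = 0  # bit vector: element i with high part h sets bit h + i
    for i, v in enumerate(values):
        low |= (v & ((1 << l) - 1)) << (i * l)
        high |= 1 << ((v >> l) + i)
    return n, l, low, high


def ef_decode(n, l, low, high):
    """Recover the original sorted index list from an Elias-Fano code."""
    values = []
    i = 0    # index of the next element to emit
    pos = 0  # current position in the high bit vector
    while i < n:
        if (high >> pos) & 1:
            hi = pos - i                      # unary-decoded high part
            lo = (low >> (i * l)) & ((1 << l) - 1)
            values.append((hi << l) | lo)
            i += 1
        pos += 1
    return values
```

In the full pipeline described by the abstract, these indices would be the hashed feature slots whose L1-regularized weights are nonzero, with the weights themselves stored separately after randomized quantization.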
COLING 2016