Lossless Compression and Generalization in Overparameterized Models: The Case of Boosting

Successful learning algorithms such as DNNs, kernel methods, and ensemble methods are known to produce models that generalize well despite being drawn from overparameterized model families. This observation has called into question the classical convex (U-shaped) relationship between model complexity and generalization error. We instead propose rethinking which notion of complexity is relevant for assessing models trained on a given dataset. Borrowing from information theory, we identify the optimal model trainable on a given dataset as the one achieving its lossless maximal compression. In the noiseless-dataset setting, such a model can be shown to coincide with an average margin maximizer of the training data. Experimental results on gradient boosting confirm our observations and show that the minimal generalization error is attained, in expectation, by models achieving lossless maximal compression of the training data.
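
The sketch below is not the paper's code; it is a minimal illustration, assuming scikit-learn's GradientBoostingClassifier, a synthetic noiseless dataset, and arbitrary hyperparameters, of the quantity the abstract points to: the average margin of the training data under a boosted ensemble, tracked alongside test error across boosting rounds.

```python
# Hypothetical sketch (not the paper's experiments): monitor the average
# training margin of a gradient-boosted ensemble together with its test error.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic, label-noise-free binary classification data (illustrative only).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           flip_y=0.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Signed labels in {-1, +1}: the margin of an example is y * f(x),
# where f is the ensemble's raw score.
s_tr = 2 * y_tr - 1

model = GradientBoostingClassifier(n_estimators=500, learning_rate=0.1,
                                   max_depth=3, random_state=0)
model.fit(X_tr, y_tr)

# staged_decision_function yields the raw score f_t(x) after each boosting round.
for t, (f_tr, f_te) in enumerate(zip(model.staged_decision_function(X_tr),
                                     model.staged_decision_function(X_te)), start=1):
    avg_margin = np.mean(s_tr * f_tr.ravel())        # average training margin
    test_err = np.mean((f_te.ravel() > 0) != y_te)   # 0/1 test error
    if t % 50 == 0:
        print(f"round {t:4d}  avg margin {avg_margin:8.3f}  test error {test_err:.3f}")
```

Under the paper's thesis, on a noiseless dataset one would expect the rounds in which the average training margin keeps growing to be the rounds in which test error keeps shrinking; the dataset, model, and hyperparameters above are placeholders chosen only to make the quantities concrete.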
