Lossless Compression and Generalization in Overparameterized Models: The Case of Boosting

Successful learning algorithms such as DNNs, kernel methods, and ensemble methods are known to produce models that generalize well despite being drawn from overparameterized model families. This observation has called into question the classical convex (U-shaped) relationship between model complexity and generalization error. We instead propose rethinking which notion of complexity is relevant for assessing models trained on a given dataset. Borrowing from information theory, we identify the optimal model trainable on a given dataset as the one achieving its lossless maximal compression. In the noiseless-dataset setting, such a model can be shown to coincide with an average margin maximizer of the training data. Experimental results on gradient boosting confirm our observations and show that the minimal generalization error is attained, in expectation, by models achieving lossless maximal compression of the training data.
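
The sketch below is not the paper's code; it is a minimal illustration, assuming scikit-learn's GradientBoostingClassifier, a synthetic noiseless dataset, and arbitrary hyperparameters, of the quantity the abstract points to: the average margin of the training data under a boosted ensemble, tracked alongside test error across boosting rounds.

```python
# Hypothetical sketch (not the paper's experiments): monitor the average
# training margin of a gradient-boosted ensemble together with its test error.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic, label-noise-free binary classification data (illustrative only).
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           flip_y=0.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Signed labels in {-1, +1}: the margin of an example is y * f(x),
# where f is the ensemble's raw score.
s_tr = 2 * y_tr - 1

model = GradientBoostingClassifier(n_estimators=500, learning_rate=0.1,
                                   max_depth=3, random_state=0)
model.fit(X_tr, y_tr)

# staged_decision_function yields the raw score f_t(x) after each boosting round.
for t, (f_tr, f_te) in enumerate(zip(model.staged_decision_function(X_tr),
                                     model.staged_decision_function(X_te)), start=1):
    avg_margin = np.mean(s_tr * f_tr.ravel())        # average training margin
    test_err = np.mean((f_te.ravel() > 0) != y_te)   # 0/1 test error
    if t % 50 == 0:
        print(f"round {t:4d}  avg margin {avg_margin:8.3f}  test error {test_err:.3f}")
```

Under the paper's thesis, on a noiseless dataset one would expect the rounds in which the average training margin keeps growing to be the rounds in which test error keeps shrinking; the dataset, model, and hyperparameters above are placeholders chosen only to make the quantities concrete.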
