1 code implementation • 6 Oct 2023 • Maksym Andriushchenko, Francesco D'Angelo, Aditya Varre, Nicolas Flammarion
In this work, we highlight that the role of weight decay in modern deep learning is different from its regularization effect studied in classical learning theory.
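To make the object of study concrete, here is a minimal sketch (an assumed toy setup, not the paper's experiments) of how weight decay enters a single SGD step on a linear least-squares model: the decay term pulls the weights toward zero on top of the data-fitting gradient.

```python
import numpy as np

# Toy setup: a linear model w trained on a synthetic mini-batch.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))       # mini-batch of inputs
y = X @ rng.normal(size=10)         # synthetic targets
w = rng.normal(size=10)             # current weights

lr, wd = 0.01, 1e-2                 # learning rate and weight-decay strength
grad = X.T @ (X @ w - y) / len(y)   # gradient of the mean squared loss
w_plain = w - lr * grad             # plain SGD step
w_decay = w - lr * (grad + wd * w)  # same step with weight decay added
```

The classical view treats the `wd * w` term as explicit L2 regularization; the paper's point is that its effect in modern deep learning training is different.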
1 code implementation • 11 Oct 2022 • Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion
We present empirical observations that with commonly used large step sizes, (i) the iterates jump from one side of the valley to the other, causing loss stabilization, and (ii) this stabilization induces a hidden stochastic dynamics, orthogonal to the bouncing directions, that implicitly biases the iterates toward sparse predictors.
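The bouncing-with-loss-stabilization phenomenon can be illustrated exactly on a one-dimensional quadratic (a toy example, not from the paper): with the critical step size eta = 2 / a for the loss f(w) = a * w**2 / 2, gradient descent jumps to the mirror point across the valley at every step, so the loss stays constant instead of decreasing.

```python
import numpy as np

# Gradient descent on f(w) = a * w**2 / 2 with step size eta = 2 / a.
# The update w -> w - eta * a * w = (1 - eta * a) * w = -w flips the sign
# of the iterate each step, so |w| and hence the loss never change.
a = 4.0
eta = 2.0 / a
w = 1.5
losses = []
for _ in range(5):
    losses.append(a * w**2 / 2)  # loss before the step
    w = w - eta * a * w          # iterate jumps across the valley
```

In higher dimensions, the paper's observation is that the dynamics orthogonal to these bouncing directions carry the implicit sparsity bias.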
no code implementations • 3 Mar 2022 • Aditya Varre, Nicolas Flammarion
We consider stochastic approximation for the least squares regression problem in the non-strongly convex setting.
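A minimal sketch of the classical stochastic approximation scheme in this setting (an assumed illustration, not the paper's algorithm): constant-step-size SGD on streaming least-squares samples, with Polyak-Ruppert iterate averaging.

```python
import numpy as np

# Streaming least squares: at each step we see one sample (x, y) and take an
# SGD step on the instantaneous squared loss, while maintaining a running
# average of the iterates (Polyak-Ruppert averaging).
rng = np.random.default_rng(1)
d, n_steps = 5, 5000
w_star = rng.normal(size=d)            # ground-truth regressor
w = np.zeros(d)                        # SGD iterate
w_avg = np.zeros(d)                    # averaged iterate
eta = 0.05                             # constant step size

for t in range(1, n_steps + 1):
    x = rng.normal(size=d)             # one streaming input
    y = x @ w_star                     # its (noiseless) target
    w = w - eta * (x @ w - y) * x      # SGD step on 0.5 * (x @ w - y)**2
    w_avg += (w - w_avg) / t           # running average of all iterates
```

Averaging is the standard device for obtaining fast rates for stochastic approximation without strong convexity.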
no code implementations • NeurIPS 2021 • Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion
Motivated by the recent successes of neural networks that have the ability to fit the data perfectly and generalize well, we study the noiseless model in the fundamental least-squares setup.
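The noiseless interpolation regime can be sketched with a toy overparameterized least-squares problem (an assumed example, not the paper's analysis): with more parameters than samples and noiseless labels, the minimum-norm solution fits the training data exactly.

```python
import numpy as np

# Overparameterized noiseless least squares: n samples, d > n parameters.
rng = np.random.default_rng(2)
n, d = 20, 100
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star                   # noiseless labels, so an interpolant exists

# The pseudoinverse returns the minimum-norm solution among all
# interpolating solutions of the underdetermined system X w = y.
w_hat = np.linalg.pinv(X) @ y
```

Here `w_hat` achieves zero training loss, which is the "fit the data perfectly" regime the snippet refers to; the question studied is how such interpolating solutions generalize.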