no code implementations • 10 Feb 2023 • Ravi Srinivasan, Francesca Mignacco, Martino Sorbaro, Maria Refinetti, Avi Cooper, Gabriel Kreiman, Giorgia Dellaferrera
"Forward-only" algorithms, which train neural networks while avoiding a backward pass, have recently gained attention as a way of addressing the biologically unrealistic aspects of backpropagation.
1 code implementation • 12 Oct 2022 • Cedric Gerbelot, Emanuele Troiani, Francesca Mignacco, Florent Krzakala, Lenka Zdeborová
We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization.
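A minimal concrete instance of the setting analyzed above (not the paper's derivation): full-batch gradient descent on a ridge-regularized logistic loss, an M-estimator, with i.i.d. Gaussian data. Dimensions, the teacher model, and hyperparameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# ERM instance: ridge-regularized logistic regression on Gaussian data,
# trained by plain (first-order) gradient descent.
n, d, lam, lr = 400, 100, 0.1, 0.5
w_star = rng.normal(size=d) / np.sqrt(d)     # teacher vector (assumption)
X = rng.normal(size=(n, d)) / np.sqrt(d)     # Gaussian design matrix
y = np.sign(X @ w_star)                      # noiseless teacher labels

def loss(w):
    z = y * (X @ w)
    return np.mean(np.log1p(np.exp(-z))) + 0.5 * lam * w @ w

def grad(w):
    z = y * (X @ w)
    s = -1.0 / (1.0 + np.exp(z))             # derivative of log(1 + e^{-z})
    return (X.T @ (s * y)) / n + lam * w

w = np.zeros(d)
for _ in range(500):
    w -= lr * grad(w)
```

The asymptotic theory tracks summary statistics of exactly such trajectories (e.g. the overlap of `w` with the teacher) in the limit where `n` and `d` grow proportionally.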
1 code implementation • 22 Mar 2022 • Elisabetta Cornacchia, Francesca Mignacco, Rodrigo Veiga, Cédric Gerbelot, Bruno Loureiro, Lenka Zdeborová
For Gaussian teacher weights, we investigate the performance of ERM with both cross-entropy and square losses, and explore the role of ridge regularisation in approaching Bayes-optimality.
no code implementations • 20 Dec 2021 • Francesca Mignacco, Pierfrancesco Urbani
In the under-parametrized regime, where the final training error is positive, the SGD dynamics reaches a stationary state and we define an effective temperature from the fluctuation-dissipation theorem, computed from dynamical mean-field theory.
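The effective-temperature construction follows the standard fluctuation-dissipation route, comparing the correlation and response functions of the dynamics. A schematic version (notation assumed, not quoted from the paper):

```latex
% Equilibrium FDT at temperature T relates response R and correlation C:
%   R(t,t') = \frac{1}{T}\,\frac{\partial C(t,t')}{\partial t'}, \qquad t > t'.
% Out of equilibrium, an effective temperature is defined by requiring the
% same relation to hold for the measured C and R of the stationary SGD dynamics:
R(t,t') = \frac{1}{T_{\mathrm{eff}}}\,\frac{\partial C(t,t')}{\partial t'}
```

Within dynamical mean-field theory, both $C$ and $R$ are computable, which is what makes $T_{\mathrm{eff}}$ accessible in this setting.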
no code implementations • 8 Mar 2021 • Francesca Mignacco, Pierfrancesco Urbani, Lenka Zdeborová
In this paper we investigate how gradient-based algorithms such as gradient descent, (multi-pass) stochastic gradient descent, its persistent variant, and the Langevin algorithm navigate non-convex loss landscapes, and which of them is able to reach the best generalization error at limited sample complexity.
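A one-dimensional caricature of the landscape-navigation question (illustrative only, not the paper's model): on a double-well loss with a shallow and a deep minimum, plain gradient descent started in the shallow basin stays there, while Langevin dynamics (gradient descent plus thermal noise) can cross the barrier:

```python
import numpy as np

rng = np.random.default_rng(3)

# f(x) = (x^2 - 1)^2 + 0.3 x: shallow minimum near x ~ +0.96,
# deeper minimum near x ~ -1, separated by a barrier near x ~ 0.
def grad_f(x):
    return 4.0 * x * (x * x - 1.0) + 0.3

def gd(x0=1.0, lr=0.01, steps=20000):
    # Deterministic gradient descent: converges to the nearest minimum.
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

def langevin(x0=1.0, lr=0.01, temp=0.5, steps=20000):
    # Euler-Maruyama discretization of Langevin dynamics at temperature temp.
    x, traj = x0, []
    for _ in range(steps):
        x += -lr * grad_f(x) + np.sqrt(2.0 * temp * lr) * rng.normal()
        traj.append(x)
    return np.array(traj)
```

Started at `x0 = 1.0`, `gd()` remains trapped near the shallow minimum (`x > 0`), whereas the Langevin trajectory visits the deeper basin (`x < 0`), the qualitative distinction the comparison above is about.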
no code implementations • NeurIPS 2020 • Francesca Mignacco, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová
We define a particular stochastic process for which SGD can be extended to a continuous-time limit that we call stochastic gradient flow.
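The discrete-to-continuous limit underlying this construction can be illustrated in the simplest possible case (shown here for the deterministic part only; the paper's stochastic gradient flow additionally retains the minibatch noise). On a quadratic loss, gradient descent with step size $\eta$ run for a fixed time horizon converges to the gradient-flow solution as $\eta \to 0$:

```python
import numpy as np

# Quadratic loss L(x) = a x^2 / 2; gradient flow solves dx/dt = -a x,
# so x(T) = x0 * exp(-a T). Constants are illustrative.
a, x0, T = 2.0, 1.0, 1.0

def discrete_flow(eta):
    # Gradient descent x_{k+1} = x_k - eta * a * x_k, run for total time T.
    x, steps = x0, round(T / eta)
    for _ in range(steps):
        x -= eta * a * x
    return x

exact = x0 * np.exp(-a * T)   # continuous-time (gradient-flow) solution
errors = [abs(discrete_flow(eta) - exact) for eta in (0.1, 0.01, 0.001)]
```

The discretization error shrinks with the step size, which is the sense in which the continuous-time process is a limit of the discrete algorithm.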
no code implementations • ICML 2020 • Francesca Mignacco, Florent Krzakala, Yue M. Lu, Lenka Zdeborová
We also illustrate the interpolation peak at low regularization, and analyze the role of the respective sizes of the two clusters.