1 code implementation • 22 Dec 2023 • Eszter Székely, Lorenzo Bardone, Federica Gerace, Sebastian Goldt
Our results show that neural networks efficiently extract information from higher-order correlations in the spiked cumulant model, and reveal a large gap between the amount of data required by neural networks and by random features to learn from higher-order cumulants.
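As a rough illustration of the setting (the Rademacher latent, the signal-to-noise ratio `beta`, and the ZCA whitening step are illustrative assumptions, not necessarily the paper's exact spiked cumulant model), here is a sketch of data whose planted direction is invisible at second order but detectable in the fourth cumulant:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 5000                        # input dimension, number of samples
u = rng.standard_normal(d)
u /= np.linalg.norm(u)                  # unit-norm spike direction (assumed)

lam = rng.choice([-1.0, 1.0], size=n)   # non-Gaussian (Rademacher) latent variable
beta = 2.0                              # assumed signal-to-noise ratio
x = np.sqrt(beta) * lam[:, None] * u + rng.standard_normal((n, d))

# symmetric (ZCA) whitening removes all second-order traces of the spike
eigvals, eigvecs = np.linalg.eigh(np.cov(x, rowvar=False))
W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
x_white = x @ W

u_white = W @ u
u_white /= np.linalg.norm(u_white)
proj = x_white @ u_white

# variance ~ 1 (no second-order signal left), but the excess kurtosis is
# non-zero: the planted direction survives only in the fourth cumulant
print(f"variance along spike:        {proj.var():.3f}")
print(f"excess kurtosis along spike: {np.mean(proj**4) / proj.var()**2 - 3:.3f}")
```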
no code implementations • 14 Apr 2023 • Riccardo Rende, Federica Gerace, Alessandro Laio, Sebastian Goldt
In masked language modelling (MLM), a word in an input sequence is masked at random, and the network is trained to predict the missing word.
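A minimal PyTorch sketch of this masked-word objective (the toy transformer, the 15% masking rate, and the `MASK_ID` convention are standard-but-assumed details, not the paper's exact setup):

```python
import torch
import torch.nn as nn

# toy sizes; real MLM uses a large transformer encoder and vocabulary
vocab_size, d_model, seq_len = 1000, 64, 16
MASK_ID = 0                                   # assumed id of the [MASK] token

embed = nn.Embedding(vocab_size, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
to_vocab = nn.Linear(d_model, vocab_size)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

tokens = torch.randint(1, vocab_size, (8, seq_len))   # a batch of sequences

# randomly mask 15% of positions; only masked positions contribute to the loss
mask = torch.rand(tokens.shape) < 0.15
inputs = tokens.masked_fill(mask, MASK_ID)
targets = tokens.masked_fill(~mask, -100)             # ignore unmasked positions

logits = to_vocab(encoder(embed(inputs)))             # (batch, seq, vocab)
loss = loss_fn(logits.view(-1, vocab_size), targets.view(-1))
loss.backward()
```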
no code implementations • 2 Mar 2023 • Federica Gerace, Diego Doimo, Stefano Sarao Mannelli, Luca Saglietti, Alessandro Laio
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task, and then adapting only the last layers to a data-poor target task.
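A hedged sketch of this freezing protocol in PyTorch (the ResNet-18 backbone, the target-task size, and the optimiser are illustrative choices, not the paper's setting):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# hypothetical setup: an ImageNet-pre-trained backbone as the feature extractor
model = resnet18(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False               # "freeze" every pre-trained layer

n_target_classes = 10                     # assumed size of the data-poor target task
model.fc = nn.Linear(model.fc.in_features, n_target_classes)  # fresh trainable head

# only the new head is optimised; the frozen feature extractor stays fixed
optimiser = torch.optim.SGD(model.fc.parameters(), lr=1e-2)
```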
no code implementations • 31 May 2022 • Stefano Sarao Mannelli, Federica Gerace, Negar Rostamzadeh, Luca Saglietti
We then consider a novel mitigation strategy based on a matched inference approach, which consists of introducing coupled learning models.
2 code implementations • 26 May 2022 • Federica Gerace, Florent Krzakala, Bruno Loureiro, Ludovic Stephan, Lenka Zdeborová
We argue that there is a large universality class of high-dimensional input data for which we obtain the same minimum training loss as for Gaussian data with corresponding data covariance.
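A small numerical illustration of this universality claim, under assumed choices (uniform-entry inputs mixed through a random matrix, a sign teacher, default `LogisticRegression` regularisation): the training loss on the structured data and on a Gaussian surrogate with matching covariance should come out close.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
n, d = 500, 400                              # high-dimensional regime, n ~ d

# non-Gaussian inputs: unit-variance uniform entries mixed through a matrix A
A = rng.standard_normal((d, d)) / np.sqrt(d)
x_real = (rng.uniform(-1, 1, (n, d)) * np.sqrt(3)) @ A

w_star = rng.standard_normal(d)              # hypothetical teacher direction
y_real = np.sign(x_real @ w_star)

# Gaussian surrogate with the same covariance A^T A
x_gauss = rng.standard_normal((n, d)) @ A
y_gauss = np.sign(x_gauss @ w_star)

for name, X, t in [("non-Gaussian", x_real, y_real), ("Gaussian", x_gauss, y_gauss)]:
    clf = LogisticRegression(C=1.0, max_iter=5000).fit(X, t)
    print(name, "training loss:", log_loss(t, clf.predict_proba(X)))
```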
no code implementations • 9 Jun 2021 • Federica Gerace, Luca Saglietti, Stefano Sarao Mannelli, Andrew Saxe, Lenka Zdeborová
Transfer learning can significantly improve the sample efficiency of neural networks, by exploiting the relatedness between a data-scarce target task and a data-abundant source task.
no code implementations • ICML 2020 • Federica Gerace, Bruno Loureiro, Florent Krzakala, Marc Mézard, Lenka Zdeborová
In particular, we show how to analytically obtain the so-called double descent behaviour for logistic regression, with a peak at the interpolation threshold; we illustrate the superiority of orthogonal over random Gaussian projections in learning with random features; and we discuss the role played by correlations in the data generated by the hidden manifold model.
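A hedged simulation sketch contrasting the two kinds of projections (the ReLU nonlinearity, the problem sizes, and the default regularisation are illustrative assumptions; the paper derives exact asymptotics rather than simulating):

```python
import numpy as np
from scipy.stats import ortho_group
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, p = 300, 200, 150                  # samples, input dimension, random features

x = rng.standard_normal((n, d))
y = np.sign(x @ rng.standard_normal(d))  # labels from a linear teacher

F_gauss = rng.standard_normal((d, p)) / np.sqrt(d)   # i.i.d. Gaussian projection
F_orth = ortho_group.rvs(d, random_state=0)[:, :p]   # columns of an orthogonal matrix

for name, F in [("Gaussian  ", F_gauss), ("orthogonal", F_orth)]:
    feats = np.maximum(x @ F, 0.0)       # ReLU random features (assumed nonlinearity)
    acc = LogisticRegression(max_iter=5000).fit(feats, y).score(feats, y)
    print(f"{name} projections: train accuracy {acc:.3f}")
```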
no code implementations • ICLR 2020 • George Stamatescu, Federica Gerace, Carlo Lucibello, Ian Fuss, Langford B. White
Moreover, we predict theoretically, and confirm numerically, that common weight initialisation schemes used in standard continuous networks yield poor training performance when applied to the mean values of the stochastic binary weights.
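One plausible reading of why this fails, as a numpy sketch (the Glorot-style scheme and the variance argument in the comments are my illustration, not the paper's derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 784, 256

# means m in [-1, 1] parametrise stochastic binary weights: P(w = +1) = (1 + m) / 2
limit = np.sqrt(6.0 / (fan_in + fan_out))            # Glorot-uniform limit
m = rng.uniform(-limit, limit, size=(fan_in, fan_out))

# sample the actual binary weights from the means
w = np.where(rng.random(m.shape) < (1 + m) / 2, 1.0, -1.0)

# with near-zero means, Var(w) = 1 - m^2 stays close to 1 per entry --
# far above the ~2/(fan_in + fan_out) variance Glorot initialisation targets
print("sampled weight variance:", round(w.var(), 4))
print("Glorot target variance: ", round(2.0 / (fan_in + fan_out), 4))
```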
no code implementations • 26 Oct 2017 • Carlo Baldassi, Federica Gerace, Hilbert J. Kappen, Carlo Lucibello, Luca Saglietti, Enzo Tartaglione, Riccardo Zecchina
Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes.
no code implementations • 12 Feb 2016 • Carlo Baldassi, Federica Gerace, Carlo Lucibello, Luca Saglietti, Riccardo Zecchina
Learning in neural networks poses peculiar challenges when using discretized rather than continuous synaptic states.
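These papers rely on message-passing-style algorithms; purely as a generic illustration of learning with discrete synapses, here is a straight-through-estimator sketch (a different, swapped-in technique) in which binary weights are the sign of continuous proxy weights:

```python
import torch

torch.manual_seed(0)
x = torch.randn(128, 20)
y = torch.sign(x @ torch.randn(20))          # labels from a continuous teacher

w = torch.randn(20, requires_grad=True)      # continuous proxy weights
opt = torch.optim.SGD([w], lr=0.1)

for step in range(200):
    w_bin = torch.sign(w)
    # straight-through estimator: the forward pass uses the binary weights,
    # the backward pass treats the binarisation as the identity
    w_ste = w + (w_bin - w).detach()
    loss = torch.relu(1 - y * (x @ w_ste)).mean()   # perceptron hinge loss
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    acc = (torch.sign(x @ torch.sign(w)) == y).float().mean()
print(f"train accuracy with binary weights: {acc.item():.3f}")
```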