no code implementations • 13 Apr 2023 • Chris Mingard, Henry Rees, Guillermo Valle-Pérez, Ard A. Louis
The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data.
no code implementations • 25 Jun 2021 • Guillermo Valle-Pérez, Gustav Eje Henter, Jonas Beskow, André Holzapfel, Pierre-Yves Oudeyer, Simon Alexanderson
First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder.
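The core mechanism here — an invertible transform whose parameters are produced by a conditioning network — can be illustrated in miniature. The sketch below is a hypothetical single conditional affine step (not the paper's architecture, which stacks flow layers under a multimodal transformer encoder); the linear conditioner, the dimensions, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy conditioner: maps a context vector (standing in for
# previous poses + music features) to a per-dimension shift and log-scale.
D, C = 4, 6                                   # pose dim, context dim (illustrative)
W_mu = rng.normal(size=(C, D))
W_s = 0.1 * rng.normal(size=(C, D))           # small log-scales for stability

def conditioner(context):
    """Context -> (shift, log-scale) for one affine flow step."""
    return context @ W_mu, context @ W_s

def forward(x, context):
    """Data x -> latent z, plus log|det dz/dx| needed for the flow likelihood."""
    mu, log_s = conditioner(context)
    z = (x - mu) * np.exp(-log_s)
    return z, -log_s.sum(axis=-1)

def inverse(z, context):
    """Latent z -> data x; sampling future poses runs the flow this way."""
    mu, log_s = conditioner(context)
    return z * np.exp(log_s) + mu

x = rng.normal(size=(3, D))                   # a batch of 3 "poses"
ctx = rng.normal(size=(3, C))                 # their conditioning contexts
z, logdet = forward(x, ctx)
x_rec = inverse(z, ctx)
```

Because the transform is invertible with a tractable Jacobian, the model gets exact log-likelihoods for training and cheap sampling at generation time — the property the autoregressive pose model relies on.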
no code implementations • 14 Feb 2021 • Ouns El Harzli, Bernardo Cuenca Grau, Guillermo Valle-Pérez, Ard A. Louis
Double-descent curves in neural networks describe the phenomenon that the generalisation error first decreases with increasing parameter count, rises after passing an optimum that lies below the number of data points, and then decreases again in the overparameterized regime.
no code implementations • 7 Dec 2020 • Guillermo Valle-Pérez, Ard A. Louis
Here we introduce desiderata for techniques that predict generalization errors for deep learning models in supervised learning.
no code implementations • 26 Jun 2020 • Chris Mingard, Guillermo Valle-Pérez, Joar Skalse, Ard A. Louis
Our main findings are that $P_{SGD}(f\mid S)$ correlates remarkably well with $P_B(f\mid S)$ and that $P_B(f\mid S)$ is strongly biased towards low-error and low complexity functions.
no code implementations • 25 Sep 2019 • Chris Mingard, Joar Skalse, Guillermo Valle-Pérez, David Martínez-Rubio, Vladimir Mikulik, Ard A. Louis
Understanding the inductive bias of neural networks is critical to explaining their ability to generalise.
no code implementations • 28 May 2019 • Guillermo Valle-Pérez, Chico Q. Camargo, Ard A. Louis
Deep neural networks can be viewed as a mapping from the space of parameters (the weights) to the space of functions (how inputs get transformed to outputs by the network).
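This parameter-function map can be made concrete at toy scale: sample parameters at random and record which Boolean function the network computes. The sketch below uses a single threshold unit on 3-bit inputs purely as a hypothetical stand-in for a DNN; the sampling distribution and sizes are assumptions, not the paper's setup.

```python
import numpy as np
from collections import Counter
from itertools import product

rng = np.random.default_rng(0)

# All 8 inputs of a 3-bit Boolean function, in a fixed order.
X = np.array(list(product([0, 1], repeat=3)), dtype=float)

def sample_function():
    """One random point in parameter space -> the function it computes,
    encoded as the tuple of outputs over all 8 inputs."""
    w, b = rng.normal(size=3), rng.normal()
    return tuple(int(v) for v in (X @ w + b > 0))

# Push 10,000 random parameter draws through the map and tally the functions.
counts = Counter(sample_function() for _ in range(10_000))
```

Two features of the map show up immediately: its image is a small subset of the 256 possible functions (a threshold unit only realises linearly separable ones, so 3-bit parity never appears), and the induced distribution is far from uniform — constant functions occur orders of magnitude more often than chance, a toy version of the bias the paper studies.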
no code implementations • ICLR 2019 • Guillermo Valle-Pérez, Chico Q. Camargo, Ard A. Louis
We then provide clear evidence for this strong simplicity bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks applied to CIFAR10 and MNIST.