no code implementations • 27 Feb 2024 • Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto
In this work, we find empirical evidence that learning rate transfer can be attributed to the fact that under $\mu$P and its depth extension, the largest eigenvalue of the training loss Hessian (i.e. the sharpness) is largely independent of the width and depth of the network for a sustained period of training.
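The quantity tracked here, the sharpness, is the top eigenvalue of the loss Hessian with respect to the parameters. A minimal sketch of how one might estimate it via power iteration on Hessian-vector products is shown below; the model, data, and iteration count are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch: estimate the sharpness (largest Hessian eigenvalue) of a training
# loss via power iteration on Hessian-vector products (double backprop).
import torch
import torch.nn as nn

def sharpness(model, loss_fn, x, y, iters=20):
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Start from a random unit vector in parameter space.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((u * u).sum() for u in v))
    v = [u / norm for u in v]
    eig = torch.tensor(0.0)
    for _ in range(iters):
        # Hessian-vector product: differentiate the gradient against v.
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        eig = sum((h * u).sum() for h, u in zip(hv, v))  # Rayleigh quotient
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / norm for h in hv]
    return eig.item()

# Toy usage on a small MLP with random data (purely illustrative).
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
x, y = torch.randn(32, 10), torch.randn(32, 1)
print(sharpness(model, nn.MSELoss(), x, y))
```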
1 code implementation • 3 Oct 2023 • Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch, Hadi Daneshmand
We answer this question in the affirmative by giving a particular construction of a Multi-Layer Perceptron (MLP) with linear activations and batch normalization that provably has bounded gradients at any depth.
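For intuition, the kind of architecture described, a deep stack of linear layers each followed by batch normalization, can be written down directly. The sketch below is an illustrative assumption about such a network (widths, depth, and the toy loss are made up for the example); it is not the paper's exact construction or its gradient bound.

```python
# Sketch: a deep MLP with linear (identity) activations, each linear layer
# followed by batch normalization, to illustrate the architecture family.
import torch
import torch.nn as nn

def linear_bn_mlp(width=128, depth=50):
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width, bias=False), nn.BatchNorm1d(width)]
    return nn.Sequential(*layers)

model = linear_bn_mlp()
x = torch.randn(64, 128, requires_grad=True)
loss = model(x).pow(2).mean()  # toy quadratic loss on the outputs
loss.backward()
# Inspect the input-gradient norm after backprop through 50 layers.
print(x.grad.norm().item())
```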