no code implementations • 13 Feb 2024 • Benoit Dherin, Mihaela Rosca
We characterize regions of a loss surface as corridors when the continuous curves of steepest descent -- the solutions of the gradient flow -- become straight lines.
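As an illustration of this definition (not code from the paper; the toy losses, step size, and initial point below are assumptions), the sketch Euler-integrates the gradient flow and measures how much the steepest-descent direction rotates along the path: for a loss with a locally constant gradient the curve is a straight line, i.e. a corridor, while for a generic anisotropic quadratic it bends.

```python
import numpy as np

def direction_drift(grad, theta0, dt=1e-3, steps=2000):
    """Euler-integrate the gradient flow d(theta)/dt = -grad(theta) and
    return the largest angle (radians) between successive descent directions."""
    theta = np.array(theta0, dtype=float)
    prev_dir, max_angle = None, 0.0
    for _ in range(steps):
        g = grad(theta)
        d = -g / (np.linalg.norm(g) + 1e-12)
        if prev_dir is not None:
            cos = np.clip(np.dot(d, prev_dir), -1.0, 1.0)
            max_angle = max(max_angle, np.arccos(cos))
        prev_dir = d
        theta += dt * (-g)
    return max_angle

# Toy "corridor": a locally linear loss L(theta) = a . theta has a constant
# gradient, so the curve of steepest descent is a straight line.
a = np.array([1.0, 2.0])
corridor_grad = lambda th: a

# Generic anisotropic quadratic: the curve of steepest descent bends.
H = np.diag([1.0, 10.0])
quad_grad = lambda th: H @ th

print("corridor drift :", direction_drift(corridor_grad, [3.0, -1.0]))  # ~0
print("quadratic drift:", direction_drift(quad_grad, [3.0, -1.0]))      # > 0
```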
no code implementations • 1 Nov 2023 • Benoit Dherin
Using backward error analysis, we compute implicit training biases in multitask and continual learning settings for neural networks trained with stochastic gradient descent.
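A hedged numerical probe of the same question (this is not the paper's derivation; the two-task quadratic losses, learning rates, and time horizon are made up): alternate gradient updates on two task losses and compare the discrete iterates to a finely integrated gradient flow on the averaged loss. The gap, which grows with the learning rate, is the kind of implicit bias that backward error analysis characterizes analytically.

```python
import numpy as np

# Hypothetical two-task quadratic losses L_i(theta) = 0.5*theta^T A_i theta - b_i . theta
A1, b1 = np.array([[3.0, 0.0], [0.0, 1.0]]), np.array([1.0, 0.0])
A2, b2 = np.array([[1.0, 0.5], [0.5, 2.0]]), np.array([0.0, 1.0])
grad1 = lambda th: A1 @ th - b1
grad2 = lambda th: A2 @ th - b2
grad_avg = lambda th: 0.5 * (grad1(th) + grad2(th))

def alternating_updates(theta0, h, epochs):
    """Sequential per-task updates, as in continual / alternating multitask training."""
    th = np.array(theta0, dtype=float)
    for _ in range(epochs):
        th = th - h * grad1(th)
        th = th - h * grad2(th)
    return th

def gradient_flow(theta0, total_time, dt=1e-4):
    """Fine Euler integration of the flow on the averaged loss."""
    th = np.array(theta0, dtype=float)
    for _ in range(int(total_time / dt)):
        th = th - dt * grad_avg(th)
    return th

theta0 = np.array([1.0, 1.0])
for h in (0.01, 0.05, 0.1):
    epochs = int(0.5 / (2 * h))          # keep the total training time comparable
    drift = np.linalg.norm(alternating_updates(theta0, h, epochs)
                           - gradient_flow(theta0, 2 * h * epochs))
    print(f"h={h:.2f}  drift from averaged-loss flow: {drift:.4e}")
```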
no code implementations • 2 Jul 2023 • Benoit Dherin, Huiyi Hu, Jie Ren, Michael W. Dusenberry, Balaji Lakshminarayanan
We introduce a new deep generative model useful for uncertainty quantification: the Morse neural network, which generalizes unnormalized Gaussian densities to have modes lying on high-dimensional submanifolds instead of just at discrete points.
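As a rough illustration only (the Gaussian-style kernel and the toy map T below are assumptions, not the paper's exact parameterization): applying such a kernel to a network output T: R^2 -> R gives an unnormalized density exp(-T(x)^2 / 2*sigma^2) that attains its maximum value of 1 on the whole zero set {x : T(x) = 0}, which is generically a one-dimensional submanifold rather than a discrete set of points.

```python
import numpy as np

def T(x):
    """Toy 'network' output R^2 -> R; its zero set is the unit circle."""
    return x[..., 0] ** 2 + x[..., 1] ** 2 - 1.0

def morse_density(x, sigma=0.5):
    """Unnormalized Gaussian-style density with modes on {T(x) = 0}."""
    return np.exp(-T(x) ** 2 / (2.0 * sigma ** 2))

# Every point of the zero-level submanifold (here: the unit circle) is a mode.
angles = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
circle = np.stack([np.cos(angles), np.sin(angles)], axis=-1)
print(morse_density(circle))                 # all equal to 1.0
print(morse_density(np.array([0.0, 0.0])))   # < 1 away from the submanifold
```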
no code implementations • 20 Jun 2023 • Hanna Mazzawi, Xavi Gonzalvo, Michael Wunder, Sammy Jerome, Benoit Dherin
We validate our theoretical framework for guiding the optimal use of Deep Fusion, showing that with carefully optimized training dynamics it significantly reduces both training time and resource consumption.
2 code implementations • 3 Feb 2023 • Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin
The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization.
no code implementations • 27 Sep 2022 • Benoit Dherin, Michael Munn, Mihaela Rosca, David G. T. Barrett
Using a combination of theoretical arguments and empirical results, we show that many common training heuristics (parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization, and the choice of parameter initialization) all act to control geometric complexity. This provides a unifying framework in which to characterize the behavior of deep learning models.
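For concreteness, a minimal sketch of estimating a quantity of this kind: geometric complexity is described in this line of work as a discrete Dirichlet energy of the model over the training data, i.e. the mean squared Frobenius norm of the input-output Jacobian. The tiny network, random data, and finite-difference Jacobian below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer network f: R^d -> R^k.
d, hidden, k = 3, 16, 2
W1, b1 = rng.normal(size=(hidden, d)), np.zeros(hidden)
W2, b2 = rng.normal(size=(k, hidden)), np.zeros(k)

def f(x):
    return W2 @ np.tanh(W1 @ x + b1) + b2

def jacobian_fd(fn, x, eps=1e-5):
    """Finite-difference input-output Jacobian (k x d)."""
    fx = fn(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        J[:, j] = (fn(x + e) - fn(x - e)) / (2 * eps)
    return J

def geometric_complexity(fn, xs):
    """Discrete Dirichlet energy: mean squared Frobenius norm of the Jacobian."""
    return np.mean([np.sum(jacobian_fd(fn, x) ** 2) for x in xs])

data = rng.normal(size=(32, d))
print("geometric complexity estimate:", geometric_complexity(f, data))
```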
no code implementations • 30 Nov 2021 • Benoit Dherin, Michael Munn, David G. T. Barrett
We argue that over-parameterized neural networks trained with stochastic gradient descent are subject to a Geometric Occam's Razor; that is, these networks are implicitly regularized by the geometric model complexity.
3 code implementations • 28 May 2021 • Mihaela Rosca, Yan Wu, Benoit Dherin, David G. T. Barrett
Gradient-based methods for two-player games produce rich dynamics that can solve challenging problems, yet can be difficult to stabilize and understand.
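A standard toy example (not taken from the paper) of how easily these dynamics destabilize: simultaneous gradient descent-ascent on the bilinear game min_x max_y x*y spirals away from the equilibrium for any positive step size, even though the continuous-time dynamics merely orbit it.

```python
import numpy as np

def simultaneous_gda(x0, y0, lr, steps):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    x, y = x0, y0
    radii = []
    for _ in range(steps):
        gx, gy = y, x                      # df/dx = y, df/dy = x
        x, y = x - lr * gx, y + lr * gy    # descend in x, ascend in y
        radii.append(np.hypot(x, y))
    return radii

radii = simultaneous_gda(1.0, 0.0, lr=0.1, steps=50)
print("distance from equilibrium after 1, 25, 50 steps:",
      radii[0], radii[24], radii[-1])      # grows: the discretization is unstable
```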
no code implementations • ICLR 2021 • Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De
To interpret this phenomenon, we prove that for SGD with random shuffling, the mean SGD iterate also stays close to the path of gradient flow if the learning rate is small but finite, but on a modified loss.
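A small sanity check in the same spirit (a hedged illustration; the per-minibatch quadratic losses and learning rates are invented): average the end-of-epoch SGD iterate over all minibatch orderings and compare it with a single full-batch gradient step covering the same total time. The residual gap is second order in the learning rate, which is where the modified-loss correction derived in the paper lives.

```python
import numpy as np
from itertools import permutations

# Hypothetical per-minibatch quadratic losses C_k(w) = 0.5 * (a_k*w - y_k)^2.
a = np.array([1.0, 2.0, 0.5, 1.5])
y = np.array([1.0, -1.0, 0.5, 2.0])
grad_k = lambda w, k: a[k] * (a[k] * w - y[k])
grad_full = lambda w: np.mean([grad_k(w, k) for k in range(len(a))])

def epoch_sgd(w0, lr, order):
    """One epoch of SGD visiting the minibatches in the given order."""
    w = w0
    for k in order:
        w = w - lr * grad_k(w, k)
    return w

w0 = 0.0
for lr in (0.1, 0.05, 0.025):
    mean_iterate = np.mean([epoch_sgd(w0, lr, p)
                            for p in permutations(range(len(a)))])
    # Full-batch gradient step covering the same total "time" len(a) * lr.
    full_batch = w0 - len(a) * lr * grad_full(w0)
    print(f"lr={lr:.3f}  |mean SGD iterate - full-batch step| = "
          f"{abs(mean_iterate - full_batch):.2e}")
```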
no code implementations • ICLR 2021 • David G. T. Barrett, Benoit Dherin
We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size of this regularization.
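As a concrete, hedged sketch: the correction used below, L + (h/4)*||grad L||^2 with h the learning rate, is the form of the modified loss reported for full-batch gradient descent in this line of work, while the diagonal quadratic loss and hyperparameters are arbitrary toy choices. The comparison shows that discrete gradient descent tracks the gradient flow of the modified loss more closely than the flow of the original loss.

```python
import numpy as np

# Toy quadratic loss L(theta) = 0.5 * sum(lam * theta**2) with diagonal curvature.
lam = np.array([1.0, 5.0])
theta0 = np.array([1.0, 1.0])
h, n_steps = 0.05, 40

# Discrete gradient descent for n_steps.
theta_gd = theta0 * (1.0 - h * lam) ** n_steps

# Gradient flow on the original loss for the same total time.
t = h * n_steps
theta_flow = theta0 * np.exp(-lam * t)

# Gradient flow on the modified loss L + (h/4)*||grad L||^2, whose gradient
# for this quadratic is (lam + (h/2)*lam**2) * theta.
theta_modified_flow = theta0 * np.exp(-(lam + 0.5 * h * lam ** 2) * t)

print("GD vs original flow :", np.linalg.norm(theta_gd - theta_flow))
print("GD vs modified flow :", np.linalg.norm(theta_gd - theta_modified_flow))
```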