no code implementations • 13 Feb 2024 • Benoit Dherin, Mihaela Rosca
We characterize regions of a loss surface as corridors when the continuous curves of steepest descent -- the solutions of the gradient flow -- become straight lines.
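As an illustration of this definition (not code from the paper; the toy losses, step size, and initial point below are assumptions), the sketch Euler-integrates the gradient flow and measures how much the steepest-descent direction rotates along the path: for a loss with a locally constant gradient the curve is a straight line, i.e. a corridor, while for a generic anisotropic quadratic it bends.

```python
import numpy as np

def direction_drift(grad, theta0, dt=1e-3, steps=2000):
    """Euler-integrate the gradient flow d(theta)/dt = -grad(theta) and
    return the largest angle (radians) between successive descent directions."""
    theta = np.array(theta0, dtype=float)
    prev_dir, max_angle = None, 0.0
    for _ in range(steps):
        g = grad(theta)
        d = -g / (np.linalg.norm(g) + 1e-12)
        if prev_dir is not None:
            cos = np.clip(np.dot(d, prev_dir), -1.0, 1.0)
            max_angle = max(max_angle, np.arccos(cos))
        prev_dir = d
        theta += dt * (-g)
    return max_angle

# Toy "corridor": a locally linear loss L(theta) = a . theta has a constant
# gradient, so the curve of steepest descent is a straight line.
a = np.array([1.0, 2.0])
corridor_grad = lambda th: a

# Generic anisotropic quadratic: the curve of steepest descent bends.
H = np.diag([1.0, 10.0])
quad_grad = lambda th: H @ th

print("corridor drift :", direction_drift(corridor_grad, [3.0, -1.0]))  # ~0
print("quadratic drift:", direction_drift(quad_grad, [3.0, -1.0]))      # > 0
```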
no code implementations • 1 Nov 2023 • Benoit Dherin
Using backward error analysis, we compute implicit training biases in multitask and continual learning settings for neural networks trained with stochastic gradient descent.
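A hedged numerical probe of the same question (this is not the paper's derivation; the two-task quadratic losses, learning rates, and time horizon are made up): alternate gradient updates on two task losses and compare the discrete iterates to a finely integrated gradient flow on the averaged loss. The gap, which grows with the learning rate, is the kind of implicit bias that backward error analysis characterizes analytically.

```python
import numpy as np

# Hypothetical two-task quadratic losses L_i(theta) = 0.5*theta^T A_i theta - b_i . theta
A1, b1 = np.array([[3.0, 0.0], [0.0, 1.0]]), np.array([1.0, 0.0])
A2, b2 = np.array([[1.0, 0.5], [0.5, 2.0]]), np.array([0.0, 1.0])
grad1 = lambda th: A1 @ th - b1
grad2 = lambda th: A2 @ th - b2
grad_avg = lambda th: 0.5 * (grad1(th) + grad2(th))

def alternating_updates(theta0, h, epochs):
    """Sequential per-task updates, as in continual / alternating multitask training."""
    th = np.array(theta0, dtype=float)
    for _ in range(epochs):
        th = th - h * grad1(th)
        th = th - h * grad2(th)
    return th

def gradient_flow(theta0, total_time, dt=1e-4):
    """Fine Euler integration of the flow on the averaged loss."""
    th = np.array(theta0, dtype=float)
    for _ in range(int(total_time / dt)):
        th = th - dt * grad_avg(th)
    return th

theta0 = np.array([1.0, 1.0])
for h in (0.01, 0.05, 0.1):
    epochs = int(0.5 / (2 * h))          # keep the total training time comparable
    drift = np.linalg.norm(alternating_updates(theta0, h, epochs)
                           - gradient_flow(theta0, 2 * h * epochs))
    print(f"h={h:.2f}  drift from averaged-loss flow: {drift:.4e}")
```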
no code implementations • 2 Jul 2023 • Benoit Dherin, Huiyi Hu, Jie Ren, Michael W. Dusenberry, Balaji Lakshminarayanan
We introduce a new deep generative model useful for uncertainty quantification: the Morse neural network, which generalizes unnormalized Gaussian densities to have modes lying on high-dimensional submanifolds instead of just at discrete points.
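As a rough illustration only (the Gaussian-style kernel and the toy map T below are assumptions, not the paper's exact parameterization): applying such a kernel to a network output T: R^2 -> R gives an unnormalized density exp(-T(x)^2 / 2*sigma^2) that attains its maximum value of 1 on the whole zero set {x : T(x) = 0}, which is generically a one-dimensional submanifold rather than a discrete set of points.

```python
import numpy as np

def T(x):
    """Toy 'network' output R^2 -> R; its zero set is the unit circle."""
    return x[..., 0] ** 2 + x[..., 1] ** 2 - 1.0

def morse_density(x, sigma=0.5):
    """Unnormalized Gaussian-style density with modes on {T(x) = 0}."""
    return np.exp(-T(x) ** 2 / (2.0 * sigma ** 2))

# Every point of the zero-level submanifold (here: the unit circle) is a mode.
angles = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
circle = np.stack([np.cos(angles), np.sin(angles)], axis=-1)
print(morse_density(circle))                 # all equal to 1.0
print(morse_density(np.array([0.0, 0.0])))   # < 1 away from the submanifold
```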
no code implementations • 20 Jun 2023 • Hanna Mazzawi, Xavi Gonzalvo, Michael Wunder, Sammy Jerome, Benoit Dherin
We validate our theoretical framework for guiding the optimal use of Deep Fusion, showing that with carefully optimized training dynamics it significantly reduces both training time and resource consumption.
2 code implementations • 3 Feb 2023 • Mihaela Rosca, Yan Wu, Chongli Qin, Benoit Dherin
The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization.
no code implementations • 27 Sep 2022 • Benoit Dherin, Michael Munn, Mihaela Rosca, David G. T. Barrett
Using a combination of theoretical arguments and empirical results, we show that many common training heuristics (parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization, and the choice of parameter initialization) all act to control geometric complexity. This provides a unifying framework in which to characterize the behavior of deep learning models.
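For concreteness, a minimal sketch of estimating a quantity of this kind: geometric complexity is described in this line of work as a discrete Dirichlet energy of the model over the training data, i.e. the mean squared Frobenius norm of the input-output Jacobian. The tiny network, random data, and finite-difference Jacobian below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer network f: R^d -> R^k.
d, hidden, k = 3, 16, 2
W1, b1 = rng.normal(size=(hidden, d)), np.zeros(hidden)
W2, b2 = rng.normal(size=(k, hidden)), np.zeros(k)

def f(x):
    return W2 @ np.tanh(W1 @ x + b1) + b2

def jacobian_fd(fn, x, eps=1e-5):
    """Finite-difference input-output Jacobian (k x d)."""
    fx = fn(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        J[:, j] = (fn(x + e) - fn(x - e)) / (2 * eps)
    return J

def geometric_complexity(fn, xs):
    """Discrete Dirichlet energy: mean squared Frobenius norm of the Jacobian."""
    return np.mean([np.sum(jacobian_fd(fn, x) ** 2) for x in xs])

data = rng.normal(size=(32, d))
print("geometric complexity estimate:", geometric_complexity(f, data))
```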
no code implementations • 30 Nov 2021 • Benoit Dherin, Michael Munn, David G. T. Barrett
We argue that over-parameterized neural networks trained with stochastic gradient descent are subject to a Geometric Occam's Razor; that is, these networks are implicitly regularized by the geometric model complexity.
3 code implementations • 28 May 2021 • Mihaela Rosca, Yan Wu, Benoit Dherin, David G. T. Barrett
Gradient-based methods for two-player games produce rich dynamics that can solve challenging problems, yet can be difficult to stabilize and understand.
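A standard toy example (not taken from the paper) of how easily these dynamics destabilize: simultaneous gradient descent-ascent on the bilinear game min_x max_y x*y spirals away from the equilibrium for any positive step size, even though the continuous-time dynamics merely orbit it.

```python
import numpy as np

def simultaneous_gda(x0, y0, lr, steps):
    """Simultaneous gradient descent-ascent on f(x, y) = x * y."""
    x, y = x0, y0
    radii = []
    for _ in range(steps):
        gx, gy = y, x                      # df/dx = y, df/dy = x
        x, y = x - lr * gx, y + lr * gy    # descend in x, ascend in y
        radii.append(np.hypot(x, y))
    return radii

radii = simultaneous_gda(1.0, 0.0, lr=0.1, steps=50)
print("distance from equilibrium after 1, 25, 50 steps:",
      radii[0], radii[24], radii[-1])      # grows: the discretization is unstable
```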
no code implementations • ICLR 2021 • Samuel L. Smith, Benoit Dherin, David G. T. Barrett, Soham De
To interpret this phenomenon, we prove that for SGD with random shuffling, the mean SGD iterate also stays close to the path of gradient flow if the learning rate is small but finite, but on a modified loss.
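A small sanity check in the same spirit (a hedged illustration; the per-minibatch quadratic losses and learning rates are invented): average the end-of-epoch SGD iterate over all minibatch orderings and compare it with a single full-batch gradient step covering the same total time. The residual gap is second order in the learning rate, which is where the modified-loss correction derived in the paper lives.

```python
import numpy as np
from itertools import permutations

# Hypothetical per-minibatch quadratic losses C_k(w) = 0.5 * (a_k*w - y_k)^2.
a = np.array([1.0, 2.0, 0.5, 1.5])
y = np.array([1.0, -1.0, 0.5, 2.0])
grad_k = lambda w, k: a[k] * (a[k] * w - y[k])
grad_full = lambda w: np.mean([grad_k(w, k) for k in range(len(a))])

def epoch_sgd(w0, lr, order):
    """One epoch of SGD visiting the minibatches in the given order."""
    w = w0
    for k in order:
        w = w - lr * grad_k(w, k)
    return w

w0 = 0.0
for lr in (0.1, 0.05, 0.025):
    mean_iterate = np.mean([epoch_sgd(w0, lr, p)
                            for p in permutations(range(len(a)))])
    # Full-batch gradient step covering the same total "time" len(a) * lr.
    full_batch = w0 - len(a) * lr * grad_full(w0)
    print(f"lr={lr:.3f}  |mean SGD iterate - full-batch step| = "
          f"{abs(mean_iterate - full_batch):.2e}")
```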
no code implementations • ICLR 2021 • David G. T. Barrett, Benoit Dherin
We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size of this regularization.
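As a concrete, hedged sketch: the correction used below, L + (h/4)*||grad L||^2 with h the learning rate, is the form of the modified loss reported for full-batch gradient descent in this line of work, while the diagonal quadratic loss and hyperparameters are arbitrary toy choices. The comparison shows that discrete gradient descent tracks the gradient flow of the modified loss more closely than the flow of the original loss.

```python
import numpy as np

# Toy quadratic loss L(theta) = 0.5 * sum(lam * theta**2) with diagonal curvature.
lam = np.array([1.0, 5.0])
theta0 = np.array([1.0, 1.0])
h, n_steps = 0.05, 40

# Discrete gradient descent for n_steps.
theta_gd = theta0 * (1.0 - h * lam) ** n_steps

# Gradient flow on the original loss for the same total time.
t = h * n_steps
theta_flow = theta0 * np.exp(-lam * t)

# Gradient flow on the modified loss L + (h/4)*||grad L||^2, whose gradient
# for this quadratic is (lam + (h/2)*lam**2) * theta.
theta_modified_flow = theta0 * np.exp(-(lam + 0.5 * h * lam ** 2) * t)

print("GD vs original flow :", np.linalg.norm(theta_gd - theta_flow))
print("GD vs modified flow :", np.linalg.norm(theta_gd - theta_modified_flow))
```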