1 code implementation • 3 Oct 2023 • Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch, Hadi Daneshmand
We answer in the affirmative the question of whether batch-normalized networks can avoid gradient explosion at large depth, by giving a particular construction of a Multi-Layer Perceptron (MLP) with linear activations and batch normalization that provably has bounded gradients at any depth.
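A toy numpy sketch of the phenomenon (illustrative only; this is not the paper's particular construction, and the shapes and scalings are assumptions): a linear MLP with batch normalization, with backpropagation written out by hand so the gradient norm at the input can be tracked as depth grows.

```python
import numpy as np

rng = np.random.default_rng(0)
EPS = 1e-5

def bn_forward(x):
    """Normalize each feature over the batch (no affine parameters)."""
    mu, var = x.mean(axis=0), x.var(axis=0)
    xhat = (x - mu) / np.sqrt(var + EPS)
    return xhat, (x, mu, var)

def bn_backward(dy, cache):
    """Standard batch-norm backward pass."""
    x, mu, var = cache
    n = x.shape[0]
    std = np.sqrt(var + EPS)
    xc = x - mu
    dvar = np.sum(dy * xc, axis=0) * (-0.5) * std ** -3
    dmu = -np.sum(dy, axis=0) / std - 2.0 * dvar * xc.mean(axis=0)
    return dy / std + 2.0 * dvar * xc / n + dmu / n

def input_grad_norm(depth, n=64, d=32):
    """Push a batch through `depth` (linear, BN) blocks, backprop the loss
    0.5*||h||^2/n, and return the gradient norm at the input."""
    h, caches = rng.normal(size=(n, d)), []
    for _ in range(depth):
        W = rng.normal(size=(d, d)) / np.sqrt(d)
        h, cache = bn_forward(h @ W.T)
        caches.append((W, cache))
    dh = h / n
    for W, cache in reversed(caches):
        dh = bn_backward(dh, cache) @ W
    return np.linalg.norm(dh)

for depth in (5, 20, 80):
    print(depth, input_grad_norm(depth))
```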
1 code implementation • NeurIPS 2023 • Amir Joudaki, Hadi Daneshmand, Francis Bach
In this paper, we explore the structure of the penultimate Gram matrix in deep neural networks, which contains the pairwise inner products of outputs corresponding to a batch of inputs.
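For concreteness, a tiny numpy illustration of the object being studied (hypothetical batch size and width; the paper's setting adds depth and nonlinearities):

```python
# The Gram matrix of a batch: pairwise inner products of hidden outputs.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(8, 128))   # 8 penultimate-layer outputs of width 128
G = H @ H.T                     # G[i, j] = <h_i, h_j>
print(G.shape)                                  # (8, 8)
print(np.allclose(G, G.T))                      # symmetric
print(np.all(np.linalg.eigvalsh(G) >= -1e-8))   # positive semi-definite
```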
no code implementations • 9 Feb 2023 • Hadi Daneshmand, Jason D. Lee, Chi Jin
Particle gradient descent, which uses particles to represent a probability measure and performs gradient descent on particles in parallel, is widely used to optimize functions of probability measures.
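A self-contained numpy toy of the scheme (the energy here is made up, not one of the paper's objectives): m particles encode the empirical measure mu = (1/m) sum_i delta_{x_i}, and all particles take gradient steps in parallel on F(mu) = mean_i V(x_i) + (1/2) mean_{i,j} K(x_i, x_j), with a confining potential V and a repulsive Gaussian kernel K.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 200, 2
X = rng.normal(size=(m, d))          # particle positions

def energy(X):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return 0.5 * (X ** 2).sum(1).mean() + 0.5 * np.exp(-sq / 2.0).mean()

def particle_grad(X):
    """m * dF/dx_i for each particle (the natural per-particle scaling)."""
    diff = X[:, None, :] - X[None, :, :]          # x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / 2.0)        # Gaussian interaction
    grad_V = X                                    # V(x) = ||x||^2 / 2
    grad_K = -(K[:, :, None] * diff).mean(axis=1)
    return grad_V + grad_K

print("energy before:", energy(X))
for _ in range(300):                  # all particles updated in parallel
    X -= 0.1 * particle_grad(X)
print("energy after: ", energy(X))
```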
no code implementations • 25 May 2022 • Amir Joudaki, Hadi Daneshmand, Francis Bach
Mean field theory is widely used in the theoretical studies of neural networks.
1 code implementation • 16 Apr 2022 • Hadi Daneshmand, Francis Bach
Mean field theory has provided theoretical insights into various algorithms by letting the problem size tend to infinity.
no code implementations • NeurIPS 2021 • Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand
The continuous-time model of Nesterov's momentum provides a thought-provoking perspective for understanding the nature of the acceleration phenomenon in convex optimization.
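For reference, the standard continuous-time model in question (due to Su, Boyd, and Candès) is the ODE

```latex
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0,
\qquad f\bigl(X(t)\bigr) - f^{\star} = O\!\left(1/t^{2}\right),
```

which reproduces the accelerated $O(1/t^2)$ rate of Nesterov's method for convex $f$.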
1 code implementation • NeurIPS 2021 • Hadi Daneshmand, Amir Joudaki, Francis Bach
This paper highlights a subtle property of batch normalization (BN): successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network.
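A minimal numpy sketch of the claim (widths, depth, and scalings are assumptions; BN here is the plain normalization without an affine part): alternate random linear maps with batch normalization and watch the pairwise cosines between hidden representations shrink.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, depth = 16, 256, 50

def batch_norm(h, eps=1e-5):
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)

h = rng.normal(size=(n, d))
for layer in range(1, depth + 1):
    W = rng.normal(size=(d, d)) / np.sqrt(d)       # random linear map
    h = batch_norm(h @ W)
    hn = h / np.linalg.norm(h, axis=1, keepdims=True)
    gap = np.abs(hn @ hn.T - np.eye(n)).max()      # 0 iff rows orthogonal
    if layer % 10 == 0:
        print(f"layer {layer:3d}: max |cosine| between samples = {gap:.3f}")
```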
no code implementations • 23 Feb 2021 • Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand, Thomas Hofmann, Roy Smith
Viewing optimization methods as numerical integrators for ordinary differential equations (ODEs) provides a thought-provoking modern framework for studying accelerated first-order optimizers.
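The canonical example of this viewpoint: applying the explicit Euler integrator with step size $h$ to the gradient flow recovers gradient descent,

```latex
\dot{x}(t) = -\nabla f\bigl(x(t)\bigr)
\quad\xrightarrow{\text{explicit Euler}}\quad
x_{k+1} = x_k - h\,\nabla f(x_k).
```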
no code implementations • NeurIPS 2020 • Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi
Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.
no code implementations • 31 Oct 2019 • Peiyuan Zhang, Hadi Daneshmand, Thomas Hofmann
We study the mixing properties of stochastic accelerated gradient descent (SAGD) on least-squares regression.
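A generic numpy sketch of such an iteration (the paper's exact SAGD variant, step sizes, and assumptions may differ): Nesterov-style momentum driven by single-sample least-squares gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star + 0.1 * rng.normal(size=n)

x = y = np.zeros(d)
lr, beta = 5e-3, 0.9
for t in range(5000):
    i = rng.integers(n)                     # one random sample per step
    g = (A[i] @ y - b[i]) * A[i]            # stochastic gradient at y
    x_new = y - lr * g
    y = x_new + beta * (x_new - x)          # Nesterov extrapolation
    x = x_new

print("excess risk:", np.mean((A @ x - b) ** 2) - np.mean((A @ x_star - b) ** 2))
```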
no code implementations • 27 May 2018 • Jonas Kohler, Hadi Daneshmand, Aurelien Lucchi, Ming Zhou, Klaus Neymeyr, Thomas Hofmann
Normalization techniques such as Batch Normalization have been applied successfully for training deep neural networks.
1 code implementation • 15 May 2018 • Leonard Adolphs, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann
Gradient-based optimization methods are the most popular choice for finding local optima in classical minimization and saddle point problems.
no code implementations • ICML 2018 • Hadi Daneshmand, Jonas Kohler, Aurelien Lucchi, Thomas Hofmann
We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions.
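A contrived numpy toy (not one of the paper's models) of the mechanism: at a strict saddle the mean gradient vanishes, but per-sample gradients keep a nonzero second moment along the negative-curvature direction, so SGD drifts off the saddle while exact full-batch gradient descent stays put.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 2))
Z -= Z.mean(axis=0)                    # per-sample noise with exact mean zero

def grad_i(w, i):
    """Gradient of f_i(w) = 0.5*(w1^2 - w2^2) + z_i . w.  The averaged loss
    has a strict saddle at w = 0 with negative curvature along e2."""
    return np.array([w[0], -w[1]]) + Z[i]

v = np.array([0.0, 1.0])               # negative-curvature direction at 0
print("E[<grad_i(0), v>^2] =", np.mean((Z @ v) ** 2))   # bounded away from 0

w_sgd, w_gd, lr = np.zeros(2), np.zeros(2), 0.05
for _ in range(200):
    w_sgd = w_sgd - lr * grad_i(w_sgd, rng.integers(len(Z)))
    w_gd = w_gd - lr * np.array([w_gd[0], -w_gd[1]])    # full-batch gradient
print("|w2| after 200 steps:  sgd:", abs(w_sgd[1]), " gd:", abs(w_gd[1]))
```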
no code implementations • 13 Jun 2017 • Hadi Daneshmand, Hamed Hassani, Thomas Hofmann
Gradient descent and coordinate descent are well understood in terms of their asymptotic behavior, but less so in a transient regime often used for approximations in machine learning.
no code implementations • 20 May 2016 • Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann
We track solutions along a path of intermediate objectives, such that the minimizer of each objective is guaranteed to lie within the quadratic convergence region of the next objective to be optimized.
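A generic numpy sketch of this continuation idea (not the paper's procedure; the objective and schedule here are made up): solve a path of ridge-regularized logistic regressions with decreasing regularization, warm-starting Newton's method at the previous minimizer so that a single step per stage stays accurate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10
A = rng.normal(size=(n, d))
y = (A @ rng.normal(size=d) > 0).astype(float)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad(w, lam):
    return A.T @ (sigmoid(A @ w) - y) / n + lam * w

def hess(w, lam):
    s = sigmoid(A @ w)
    return (A * (s * (1 - s))[:, None]).T @ A / n + lam * np.eye(d)

w = np.zeros(d)
for lam in (10.0, 1.0, 0.1, 0.01):      # successively less regularized objectives
    w = w - np.linalg.solve(hess(w, lam), grad(w, lam))  # one warm-started Newton step
    print(f"lam = {lam:5.2f}   ||grad|| = {np.linalg.norm(grad(w, lam)):.2e}")
```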
no code implementations • 9 Mar 2016 • Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann
For many machine learning problems, data is abundant and it may be prohibitive to make multiple passes through the full training set.
no code implementations • 12 May 2014 • Hadi Daneshmand, Manuel Gomez-Rodriguez, Le Song, Bernhard Schoelkopf
Information diffusion over a network is typically observed only as cascades of events, while the underlying network itself remains hidden. Can we recover the hidden network structure from these observed cascades?