1 code implementation • 3 Oct 2023 • Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch, Hadi Daneshmand
We answer in the affirmative the question of whether batch-normalized networks can avoid gradient explosion at large depth, by giving a particular construction of a Multi-Layer Perceptron (MLP) with linear activations and batch normalization that provably has bounded gradients at any depth.
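A toy numpy sketch of the phenomenon (illustrative only; this is not the paper's particular construction, and the shapes and scalings are assumptions): a linear MLP with batch normalization, with backpropagation written out by hand so the gradient norm at the input can be tracked as depth grows.

```python
import numpy as np

rng = np.random.default_rng(0)
EPS = 1e-5

def bn_forward(x):
    """Normalize each feature over the batch (no affine parameters)."""
    mu, var = x.mean(axis=0), x.var(axis=0)
    xhat = (x - mu) / np.sqrt(var + EPS)
    return xhat, (x, mu, var)

def bn_backward(dy, cache):
    """Standard batch-norm backward pass."""
    x, mu, var = cache
    n = x.shape[0]
    std = np.sqrt(var + EPS)
    xc = x - mu
    dvar = np.sum(dy * xc, axis=0) * (-0.5) * std ** -3
    dmu = -np.sum(dy, axis=0) / std - 2.0 * dvar * xc.mean(axis=0)
    return dy / std + 2.0 * dvar * xc / n + dmu / n

def input_grad_norm(depth, n=64, d=32):
    """Push a batch through `depth` (linear, BN) blocks, backprop the loss
    0.5*||h||^2/n, and return the gradient norm at the input."""
    h, caches = rng.normal(size=(n, d)), []
    for _ in range(depth):
        W = rng.normal(size=(d, d)) / np.sqrt(d)
        h, cache = bn_forward(h @ W.T)
        caches.append((W, cache))
    dh = h / n
    for W, cache in reversed(caches):
        dh = bn_backward(dh, cache) @ W
    return np.linalg.norm(dh)

for depth in (5, 20, 80):
    print(depth, input_grad_norm(depth))
```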
1 code implementation • NeurIPS 2023 • Amir Joudaki, Hadi Daneshmand, Francis Bach
In this paper, we explore the structure of the penultimate Gram matrix in deep neural networks, which contains the pairwise inner products of outputs corresponding to a batch of inputs.
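For concreteness, a tiny numpy illustration of the object being studied (hypothetical batch size and width; the paper's setting adds depth and nonlinearities):

```python
# The Gram matrix of a batch: pairwise inner products of hidden outputs.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(8, 128))   # 8 penultimate-layer outputs of width 128
G = H @ H.T                     # G[i, j] = <h_i, h_j>
print(G.shape)                                  # (8, 8)
print(np.allclose(G, G.T))                      # symmetric
print(np.all(np.linalg.eigvalsh(G) >= -1e-8))   # positive semi-definite
```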
no code implementations • 9 Feb 2023 • Hadi Daneshmand, Jason D. Lee, Chi Jin
Particle gradient descent, which uses particles to represent a probability measure and performs gradient descent on particles in parallel, is widely used to optimize functions of probability measures.
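A self-contained numpy toy of the scheme (the energy here is made up, not one of the paper's objectives): m particles encode the empirical measure mu = (1/m) sum_i delta_{x_i}, and all particles take gradient steps in parallel on F(mu) = mean_i V(x_i) + (1/2) mean_{i,j} K(x_i, x_j), with a confining potential V and a repulsive Gaussian kernel K.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 200, 2
X = rng.normal(size=(m, d))          # particle positions

def energy(X):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return 0.5 * (X ** 2).sum(1).mean() + 0.5 * np.exp(-sq / 2.0).mean()

def particle_grad(X):
    """m * dF/dx_i for each particle (the natural per-particle scaling)."""
    diff = X[:, None, :] - X[None, :, :]          # x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / 2.0)        # Gaussian interaction
    grad_V = X                                    # V(x) = ||x||^2 / 2
    grad_K = -(K[:, :, None] * diff).mean(axis=1)
    return grad_V + grad_K

print("energy before:", energy(X))
for _ in range(300):                  # all particles updated in parallel
    X -= 0.1 * particle_grad(X)
print("energy after: ", energy(X))
```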
no code implementations • 25 May 2022 • Amir Joudaki, Hadi Daneshmand, Francis Bach
Mean field theory is widely used in the theoretical studies of neural networks.
1 code implementation • 16 Apr 2022 • Hadi Daneshmand, Francis Bach
Mean field theory has provided theoretical insights into various algorithms by letting the problem size tend to infinity.
no code implementations • NeurIPS 2021 • Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand
The continuous-time model of Nesterov's momentum provides a thought-provoking perspective for understanding the nature of the acceleration phenomenon in convex optimization.
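For reference, the standard continuous-time model in question (due to Su, Boyd, and Candès) is the ODE

```latex
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0,
\qquad f\bigl(X(t)\bigr) - f^{\star} = O\!\left(1/t^{2}\right),
```

which reproduces the accelerated $O(1/t^2)$ rate of Nesterov's method for convex $f$.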
1 code implementation • NeurIPS 2021 • Hadi Daneshmand, Amir Joudaki, Francis Bach
This paper highlights a subtle property of batch normalization (BN): successive batch normalizations with random linear transformations make hidden representations increasingly orthogonal across layers of a deep neural network.
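A minimal numpy sketch of the claim (widths, depth, and scalings are assumptions; BN here is the plain normalization without an affine part): alternate random linear maps with batch normalization and watch the pairwise cosines between hidden representations shrink.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, depth = 16, 256, 50

def batch_norm(h, eps=1e-5):
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)

h = rng.normal(size=(n, d))
for layer in range(1, depth + 1):
    W = rng.normal(size=(d, d)) / np.sqrt(d)       # random linear map
    h = batch_norm(h @ W)
    hn = h / np.linalg.norm(h, axis=1, keepdims=True)
    gap = np.abs(hn @ hn.T - np.eye(n)).max()      # 0 iff rows orthogonal
    if layer % 10 == 0:
        print(f"layer {layer:3d}: max |cosine| between samples = {gap:.3f}")
```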
no code implementations • 23 Feb 2021 • Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand, Thomas Hofmann, Roy Smith
Viewing optimization methods as numerical integrators for ordinary differential equations (ODEs) provides a thought-provoking modern framework for studying accelerated first-order optimizers.
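The canonical example of this viewpoint: applying the explicit Euler integrator with step size $h$ to the gradient flow recovers gradient descent,

```latex
\dot{x}(t) = -\nabla f\bigl(x(t)\bigr)
\quad\xrightarrow{\text{explicit Euler}}\quad
x_{k+1} = x_k - h\,\nabla f(x_k).
```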
no code implementations • NeurIPS 2020 • Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi
Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.
no code implementations • 31 Oct 2019 • Peiyuan Zhang, Hadi Daneshmand, Thomas Hofmann
We study the mixing properties of stochastic accelerated gradient descent (SAGD) on least-squares regression.
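A generic numpy sketch of such an iteration (the paper's exact SAGD variant, step sizes, and assumptions may differ): Nesterov-style momentum driven by single-sample least-squares gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star + 0.1 * rng.normal(size=n)

x = y = np.zeros(d)
lr, beta = 5e-3, 0.9
for t in range(5000):
    i = rng.integers(n)                     # one random sample per step
    g = (A[i] @ y - b[i]) * A[i]            # stochastic gradient at y
    x_new = y - lr * g
    y = x_new + beta * (x_new - x)          # Nesterov extrapolation
    x = x_new

print("excess risk:", np.mean((A @ x - b) ** 2) - np.mean((A @ x_star - b) ** 2))
```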
no code implementations • 27 May 2018 • Jonas Kohler, Hadi Daneshmand, Aurelien Lucchi, Ming Zhou, Klaus Neymeyr, Thomas Hofmann
Normalization techniques such as Batch Normalization have been applied successfully for training deep neural networks.
1 code implementation • 15 May 2018 • Leonard Adolphs, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann
Gradient-based optimization methods are the most popular choice for finding local optima in classical minimization and saddle point problems.
no code implementations • ICML 2018 • Hadi Daneshmand, Jonas Kohler, Aurelien Lucchi, Thomas Hofmann
We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions.
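A contrived numpy toy (not one of the paper's models) of the mechanism: at a strict saddle the mean gradient vanishes, but per-sample gradients keep a nonzero second moment along the negative-curvature direction, so SGD drifts off the saddle while exact full-batch gradient descent stays put.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 2))
Z -= Z.mean(axis=0)                    # per-sample noise with exact mean zero

def grad_i(w, i):
    """Gradient of f_i(w) = 0.5*(w1^2 - w2^2) + z_i . w.  The averaged loss
    has a strict saddle at w = 0 with negative curvature along e2."""
    return np.array([w[0], -w[1]]) + Z[i]

v = np.array([0.0, 1.0])               # negative-curvature direction at 0
print("E[<grad_i(0), v>^2] =", np.mean((Z @ v) ** 2))   # bounded away from 0

w_sgd, w_gd, lr = np.zeros(2), np.zeros(2), 0.05
for _ in range(200):
    w_sgd = w_sgd - lr * grad_i(w_sgd, rng.integers(len(Z)))
    w_gd = w_gd - lr * np.array([w_gd[0], -w_gd[1]])    # full-batch gradient
print("|w2| after 200 steps:  sgd:", abs(w_sgd[1]), " gd:", abs(w_gd[1]))
```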
no code implementations • 13 Jun 2017 • Hadi Daneshmand, Hamed Hassani, Thomas Hofmann
Gradient descent and coordinate descent are well understood in terms of their asymptotic behavior, but less so in a transient regime often used for approximations in machine learning.
no code implementations • 20 May 2016 • Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann
We track solutions along a path of intermediate objectives, such that the minimizer of each objective is guaranteed to lie within the quadratic convergence region of the next objective to be optimized.
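A generic numpy sketch of this continuation idea (not the paper's procedure; the objective and schedule here are made up): solve a path of ridge-regularized logistic regressions with decreasing regularization, warm-starting Newton's method at the previous minimizer so that a single step per stage stays accurate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10
A = rng.normal(size=(n, d))
y = (A @ rng.normal(size=d) > 0).astype(float)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad(w, lam):
    return A.T @ (sigmoid(A @ w) - y) / n + lam * w

def hess(w, lam):
    s = sigmoid(A @ w)
    return (A * (s * (1 - s))[:, None]).T @ A / n + lam * np.eye(d)

w = np.zeros(d)
for lam in (10.0, 1.0, 0.1, 0.01):      # successively less regularized objectives
    w = w - np.linalg.solve(hess(w, lam), grad(w, lam))  # one warm-started Newton step
    print(f"lam = {lam:5.2f}   ||grad|| = {np.linalg.norm(grad(w, lam)):.2e}")
```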
no code implementations • 9 Mar 2016 • Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann
For many machine learning problems, data is abundant and it may be prohibitive to make multiple passes through the full training set.
no code implementations • 12 May 2014 • Hadi Daneshmand, Manuel Gomez-Rodriguez, Le Song, Bernhard Schoelkopf
Information diffusion over a network is typically observed only as cascades of events, while the underlying network itself remains hidden. Can we recover the hidden network structure from these observed cascades?