no code implementations • 9 Feb 2024 • Gon Buzaglo, Itamar Harel, Mor Shpigel Nacson, Alon Brutzkus, Nathan Srebro, Daniel Soudry
We prove that such a random neural network (NN) interpolator typically generalizes well if there exists an underlying narrow "teacher NN" that agrees with the labels.
no code implementations • 30 Jun 2023 • Mor Shpigel Nacson, Rotem Mulayoff, Greg Ongie, Tomer Michaeli, Daniel Soudry
Finally, we prove that if a function is sufficiently smooth (in a Sobolev sense) then it can be approximated arbitrarily well using shallow ReLU networks that correspond to stable solutions of gradient descent.
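The notion of a "stable solution of gradient descent" used here can be checked numerically: a minimum reached by GD with step size η is linearly stable only if the largest eigenvalue of the loss Hessian (the sharpness) is at most 2/η. The sketch below is only an illustration under assumed choices (the toy 1-D target, the width, the step size, and the finite-difference Hessian are mine, not the paper's); it trains a shallow ReLU network with full-batch GD and then checks that criterion.

```python
import numpy as np

# A minimal sketch (all choices are illustrative assumptions): fit a 1-D shallow
# ReLU network  f(x) = sum_i c[i] * relu(a[i] * x + b[i])  to a smooth target with
# full-batch GD, then check the linear-stability criterion for the solution:
# top Hessian eigenvalue (sharpness) <= 2 / step size.

rng = np.random.default_rng(0)
xs = np.linspace(-1.0, 1.0, 32)
ys = np.sin(np.pi * xs)                      # smooth target

k = 8                                        # hidden width
theta = rng.normal(scale=0.5, size=3 * k)    # packed parameters [a, b, c]

def unpack(theta):
    return np.split(theta, 3)

def loss(theta):
    a, b, c = unpack(theta)
    pred = np.maximum(np.outer(xs, a) + b, 0.0) @ c
    return 0.5 * np.mean((pred - ys) ** 2)

def grad(theta):
    a, b, c = unpack(theta)
    z = np.outer(xs, a) + b                  # pre-activations, shape (n, k)
    h = np.maximum(z, 0.0)                   # ReLU activations
    r = (h @ c - ys) / xs.size               # scaled residual
    da = c * ((xs * r) @ (z > 0))
    db = c * (r @ (z > 0))
    dc = h.T @ r
    return np.concatenate([da, db, dc])

lr = 0.02
for _ in range(30000):                       # full-batch gradient descent
    theta -= lr * grad(theta)

def sharpness(theta, eps=1e-4):
    # Finite-difference estimate of the loss Hessian and its top eigenvalue.
    n = theta.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        H[:, i] = (grad(theta + e) - grad(theta - e)) / (2 * eps)
    return np.linalg.eigvalsh(0.5 * (H + H.T)).max()

print(f"final loss      : {loss(theta):.5f}")
print(f"sharpness       : {sharpness(theta):.3f}")
print(f"stability bound : 2/lr = {2 / lr:.3f}")
# A solution that GD can stably sit at must satisfy sharpness <= 2/lr.
```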
no code implementations • 22 May 2023 • Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Yair Carmon
Using this result, we characterize settings where GD provably converges to the Edge of Stability (EoS) in scalar networks.
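As a toy illustration of the quantities involved (my own simplification, not the paper's general setting): take a depth-2 scalar network f(x) = w1*w2*x fitted to the single example x = y = 1. Gradient flow preserves w1^2 - w2^2, and a global minimum (w1*w2 = 1) has sharpness w1^2 + w2^2, so the sharpness of the gradient-flow solution reachable from the current iterate has the closed form sqrt((w1^2 - w2^2)^2 + 4). The sketch tracks this quantity and the current sharpness against the EoS threshold 2/η along a GD trajectory.

```python
import numpy as np

# Toy depth-2 scalar network f(x) = w1 * w2 * x on the single example (x, y) = (1, 1),
# with loss L(w) = 0.5 * (w1*w2 - 1)^2.  Step size and initialization are arbitrary.
eta = 0.3
w = np.array([3.0, 0.2])

def sharpness(w):
    # Top eigenvalue of the 2x2 Hessian of L at the current point.
    w1, w2 = w
    H = np.array([[w2 ** 2, 2 * w1 * w2 - 1],
                  [2 * w1 * w2 - 1, w1 ** 2]])
    return np.linalg.eigvalsh(H)[-1]

def gf_solution_sharpness(w):
    # Gradient flow preserves w1^2 - w2^2, and a global minimum (w1*w2 = 1)
    # has sharpness w1^2 + w2^2, so the minimum that gradient flow would reach
    # from w has sharpness sqrt((w1^2 - w2^2)^2 + 4).
    d = w[0] ** 2 - w[1] ** 2
    return np.sqrt(d ** 2 + 4.0)

for t in range(41):
    if t % 5 == 0:
        print(f"t={t:2d}  sharpness={sharpness(w):6.3f}  "
              f"GF-solution sharpness={gf_solution_sharpness(w):6.3f}  2/eta={2 / eta:.3f}")
    r = w[0] * w[1] - 1.0
    w = w - eta * r * np.array([w[1], w[0]])   # one GD step
# The GF-solution sharpness decreases along the trajectory; once it falls
# below 2/eta, GD can settle at a stable minimum, and otherwise the iterates
# hover at the Edge of Stability.
```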
no code implementations • 19 Feb 2021 • Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry
Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to.
1 code implementation • ICLR 2020 • Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry
However, asynchronous training has its pitfalls, mainly a degradation in generalization, even after convergence of the algorithm.
no code implementations • 17 May 2019 • Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry
With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization leads to margin-maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models.
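One half of this statement, the regularization path, is easy to visualize numerically. The sketch below is a hypothetical toy (the dataset, the plain-GD solver, and the schedule of regularization strengths are my choices, not the paper's): it minimizes L2-regularized logistic loss for decreasing λ on data built so that the L2 max-margin direction is (0, 1), and prints the normalized minimizers.

```python
import numpy as np

# Toy separable dataset built so its L2 max-margin direction is (0, 1)
# (support vectors (0, 1) and (0, -1)); all choices are illustrative.
X = np.array([[0.0, 1.0], [3.0, 2.0], [0.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def reg_grad(w, lam):
    # Gradient of  mean_i log(1 + exp(-y_i w.x_i)) + lam * ||w||^2.
    m = y * (X @ w)
    p = 1.0 / (1.0 + np.exp(m))               # sigmoid(-margin)
    return -(X * (y * p)[:, None]).mean(axis=0) + 2.0 * lam * w

w = np.zeros(2)
for lam in [1.0, 1e-1, 1e-2, 1e-3, 1e-4]:
    for _ in range(100000):                    # plain GD, warm-started across lambdas
        w = w - 0.25 * reg_grad(w, lam)
    print(f"lambda={lam:7.0e}  direction={w / np.linalg.norm(w)}")
# As lambda -> 0, the normalized minimizer should drift (slowly, at a
# logarithmic rate) toward the max-margin direction (0, 1).
```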
no code implementations • 5 Jun 2018 • Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry
We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate, in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data.
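A quick way to see this behavior in practice (a hedged sketch on an arbitrary toy dataset; the paper's contribution is the theorem, not this script): run single-example SGD with a constant step size on the logistic loss over linearly separable data and watch the full training loss keep shrinking toward zero instead of plateauing at a level set by the step size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: the label is the sign of the first coordinate,
# with a margin enforced.  (Illustrative choice, not the paper's setup.)
n, d = 200, 5
X = rng.normal(size=(n, d))
X[:, 0] = np.sign(X[:, 0]) * (np.abs(X[:, 0]) + 0.5)
y = np.sign(X[:, 0])

def full_loss(w):
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

w = np.zeros(d)
lr = 0.1                                 # fixed, non-vanishing learning rate
for epoch in range(200):
    for i in rng.permutation(n):         # single-example SGD on the logistic loss
        margin = y[i] * (X[i] @ w)
        w += lr * y[i] * X[i] / (1.0 + np.exp(margin))
    if epoch % 40 == 0 or epoch == 199:
        print(f"epoch {epoch:3d}   training loss {full_loss(w):.6f}")
# With separable data and the smooth, monotone logistic loss, the training
# loss keeps shrinking toward zero even though the step size never decays.
```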
no code implementations • 5 Mar 2018 • Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry
We show that for a large family of super-polynomial tailed losses, gradient descent iterates on linear networks of any depth converge in the direction of the $L_2$ maximum-margin solution, while this does not hold for losses with heavier tails.
2 code implementations • ICLR 2018 • Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets.
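The implicit bias described here can be reproduced in a few lines. The following sketch uses a hand-picked dataset whose L2 max-margin separator is (1, 0) by construction (all choices below are illustrative assumptions, not taken from the paper); plain gradient descent on the logistic loss drives the loss to zero while the normalized iterate slowly aligns with that max-margin direction.

```python
import numpy as np

# Separable data constructed so the L2 max-margin direction is (1, 0)
# (support vectors (1, 0) and (-1, 0)); all choices are illustrative.
X = np.array([[1.0, 0.0], [2.0, 3.0], [-1.0, 0.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def loss_and_grad(w):
    m = y * (X @ w)
    loss = np.mean(np.log1p(np.exp(-m)))
    grad = -(X * (y / (1.0 + np.exp(m)))[:, None]).mean(axis=0)
    return loss, grad

w = np.zeros(2)
lr = 0.5
for t in range(1, 500001):
    loss, g = loss_and_grad(w)
    w -= lr * g
    if t in (10, 1000, 50000, 500000):
        print(f"t={t:7d}  loss={loss:.2e}  direction={w / np.linalg.norm(w)}")
# The loss decays to zero and ||w|| grows without bound, but the direction
# w/||w|| converges (at a slow 1/log t rate) to the max-margin separator (1, 0).
```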