Search Results for author: Mor Shpigel Nacson

Found 9 papers, 2 papers with code

How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers

no code implementations • 9 Feb 2024 • Gon Buzaglo, Itamar Harel, Mor Shpigel Nacson, Alon Brutzkus, Nathan Srebro, Daniel Soudry

We prove that such a random NN interpolator typically generalizes well if there exists an underlying narrow "teacher NN" that agrees with the labels.

The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks

no code implementations • 30 Jun 2023 • Mor Shpigel Nacson, Rotem Mulayoff, Greg Ongie, Tomer Michaeli, Daniel Soudry

Finally, we prove that if a function is sufficiently smooth (in a Sobolev sense) then it can be approximated arbitrarily well using shallow ReLU networks that correspond to stable solutions of gradient descent.

On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

no code implementations • 19 Feb 2021 • Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry

Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to.

Inductive Bias

Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models

no code implementations • 17 May 2019 • Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry

With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization leads to margin-maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models.

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate

no code implementations • 5 Jun 2018 • Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry

We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate, in the special case of homogeneous linear classifiers with smooth monotone loss functions optimized on linearly separable data.
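
For intuition, here is a minimal NumPy sketch of the setting studied in this paper (not the authors' code; the data, learning rate, and iteration count are illustrative choices): logistic regression, a homogeneous linear classifier with a smooth monotone loss, trained by single-sample SGD with a fixed learning rate on linearly separable data. The training loss keeps shrinking toward zero while the weight norm grows, with no learning-rate decay.

    import numpy as np

    rng = np.random.default_rng(0)

    # Linearly separable 2-D data: the label is the sign of the first coordinate,
    # and points are pushed away from the decision boundary to guarantee a margin.
    X = rng.normal(size=(200, 2))
    X[:, 0] += np.sign(X[:, 0])
    y = np.sign(X[:, 0])

    w = np.zeros(2)
    lr = 0.1  # fixed (non-vanishing) learning rate

    def logistic_loss(w):
        return np.mean(np.log1p(np.exp(-y * (X @ w))))

    for step in range(1, 20001):
        i = rng.integers(len(X))                      # single-sample SGD
        margin = y[i] * (X[i] @ w)
        grad = -y[i] * X[i] / (1.0 + np.exp(margin))  # gradient of log(1 + e^{-margin})
        w -= lr * grad
        if step % 5000 == 0:
            print(f"step {step:6d}  loss {logistic_loss(w):.5f}  ||w|| {np.linalg.norm(w):.2f}")

The point of the toy run is that no learning-rate decay is needed: on separable data with a smooth monotone loss, the loss itself (and with it the stochastic gradient) vanishes as the weight norm grows.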

Convergence of Gradient Descent on Separable Data

no code implementations • 5 Mar 2018 • Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry

We show that for a large family of super-polynomial tailed losses, gradient descent iterates on linear networks of any depth converge in the direction of the $L_2$ maximum-margin solution, while this does not hold for losses with heavier tails.

The Implicit Bias of Gradient Descent on Separable Data

2 code implementations • ICLR 2018 • Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro

We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets.
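
For intuition, here is a minimal NumPy sketch of the phenomenon (not the paper's published code; the four-point dataset and hyperparameters are illustrative): full-batch gradient descent on unregularized logistic regression over linearly separable data drives the loss to zero, the weight norm diverges, and the direction w/||w|| approaches the hard-margin SVM direction, which for this toy dataset is (1, 1)/sqrt(2).

    import numpy as np

    # Separable toy data for a homogeneous (bias-free) linear predictor.
    X = np.array([[1.0, 1.0], [3.0, 0.0], [-1.0, -1.0], [-3.0, 0.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    svm_dir = np.array([1.0, 1.0]) / np.sqrt(2.0)  # max-margin direction for this data

    w = np.zeros(2)
    lr = 0.5

    for t in range(1, 200001):
        margins = y * (X @ w)
        # gradient of the mean logistic loss (1/n) * sum_i log(1 + exp(-margin_i))
        grad = -(y / (1.0 + np.exp(margins))) @ X / len(X)
        w -= lr * grad
        if t % 50000 == 0:
            direction = w / np.linalg.norm(w)
            loss = np.mean(np.log1p(np.exp(-y * (X @ w))))
            angle = np.degrees(np.arccos(np.clip(direction @ svm_dir, -1.0, 1.0)))
            print(f"t={t:6d}  loss={loss:.2e}  angle to SVM direction={angle:.3f} deg")

The directional convergence is logarithmically slow, which is why the angle to the max-margin direction shrinks only gradually even after many iterations.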
