no code implementations • 9 Feb 2024 • Gon Buzaglo, Itamar Harel, Mor Shpigel Nacson, Alon Brutzkus, Nathan Srebro, Daniel Soudry
We prove that such a random neural network (NN) interpolator typically generalizes well if there exists an underlying narrow "teacher NN" that agrees with the labels.
no code implementations • 30 Jun 2023 • Mor Shpigel Nacson, Rotem Mulayoff, Greg Ongie, Tomer Michaeli, Daniel Soudry
Finally, we prove that if a function is sufficiently smooth (in a Sobolev sense) then it can be approximated arbitrarily well using shallow ReLU networks that correspond to stable solutions of gradient descent.
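The notion of a "stable solution of gradient descent" used here can be checked numerically: a minimum reached by GD with step size η is linearly stable only if the largest eigenvalue of the loss Hessian (the sharpness) is at most 2/η. The sketch below is only an illustration under assumed choices (the toy 1-D target, the width, the step size, and the finite-difference Hessian are mine, not the paper's); it trains a shallow ReLU network with full-batch GD and then checks that criterion.

```python
import numpy as np

# A minimal sketch (all choices are illustrative assumptions): fit a 1-D shallow
# ReLU network  f(x) = sum_i c[i] * relu(a[i] * x + b[i])  to a smooth target with
# full-batch GD, then check the linear-stability criterion for the solution:
# top Hessian eigenvalue (sharpness) <= 2 / step size.

rng = np.random.default_rng(0)
xs = np.linspace(-1.0, 1.0, 32)
ys = np.sin(np.pi * xs)                      # smooth target

k = 8                                        # hidden width
theta = rng.normal(scale=0.5, size=3 * k)    # packed parameters [a, b, c]

def unpack(theta):
    return np.split(theta, 3)

def loss(theta):
    a, b, c = unpack(theta)
    pred = np.maximum(np.outer(xs, a) + b, 0.0) @ c
    return 0.5 * np.mean((pred - ys) ** 2)

def grad(theta):
    a, b, c = unpack(theta)
    z = np.outer(xs, a) + b                  # pre-activations, shape (n, k)
    h = np.maximum(z, 0.0)                   # ReLU activations
    r = (h @ c - ys) / xs.size               # scaled residual
    da = c * ((xs * r) @ (z > 0))
    db = c * (r @ (z > 0))
    dc = h.T @ r
    return np.concatenate([da, db, dc])

lr = 0.02
for _ in range(30000):                       # full-batch gradient descent
    theta -= lr * grad(theta)

def sharpness(theta, eps=1e-4):
    # Finite-difference estimate of the loss Hessian and its top eigenvalue.
    n = theta.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        H[:, i] = (grad(theta + e) - grad(theta - e)) / (2 * eps)
    return np.linalg.eigvalsh(0.5 * (H + H.T)).max()

print(f"final loss      : {loss(theta):.5f}")
print(f"sharpness       : {sharpness(theta):.3f}")
print(f"stability bound : 2/lr = {2 / lr:.3f}")
# A solution that GD can stably sit at must satisfy sharpness <= 2/lr.
```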
no code implementations • 22 May 2023 • Itai Kreisler, Mor Shpigel Nacson, Daniel Soudry, Yair Carmon
Using this result, we characterize settings where GD provably converges to the Edge of Stability (EoS) in scalar networks.
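As a toy illustration of the quantities involved (my own simplification, not the paper's general setting): take a depth-2 scalar network f(x) = w1*w2*x fitted to the single example x = y = 1. Gradient flow preserves w1^2 - w2^2, and a global minimum (w1*w2 = 1) has sharpness w1^2 + w2^2, so the sharpness of the gradient-flow solution reachable from the current iterate has the closed form sqrt((w1^2 - w2^2)^2 + 4). The sketch tracks this quantity and the current sharpness against the EoS threshold 2/η along a GD trajectory.

```python
import numpy as np

# Toy depth-2 scalar network f(x) = w1 * w2 * x on the single example (x, y) = (1, 1),
# with loss L(w) = 0.5 * (w1*w2 - 1)^2.  Step size and initialization are arbitrary.
eta = 0.3
w = np.array([3.0, 0.2])

def sharpness(w):
    # Top eigenvalue of the 2x2 Hessian of L at the current point.
    w1, w2 = w
    H = np.array([[w2 ** 2, 2 * w1 * w2 - 1],
                  [2 * w1 * w2 - 1, w1 ** 2]])
    return np.linalg.eigvalsh(H)[-1]

def gf_solution_sharpness(w):
    # Gradient flow preserves w1^2 - w2^2, and a global minimum (w1*w2 = 1)
    # has sharpness w1^2 + w2^2, so the minimum that gradient flow would reach
    # from w has sharpness sqrt((w1^2 - w2^2)^2 + 4).
    d = w[0] ** 2 - w[1] ** 2
    return np.sqrt(d ** 2 + 4.0)

for t in range(41):
    if t % 5 == 0:
        print(f"t={t:2d}  sharpness={sharpness(w):6.3f}  "
              f"GF-solution sharpness={gf_solution_sharpness(w):6.3f}  2/eta={2 / eta:.3f}")
    r = w[0] * w[1] - 1.0
    w = w - eta * r * np.array([w[1], w[0]])   # one GD step
# The GF-solution sharpness decreases along the trajectory; once it falls
# below 2/eta, GD can settle at a stable minimum, and otherwise the iterates
# hover at the Edge of Stability.
```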
no code implementations • 19 Feb 2021 • Shahar Azulay, Edward Moroshko, Mor Shpigel Nacson, Blake Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry
Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to.
1 code implementation • ICLR 2020 • Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry
However, asynchronous training has its pitfalls, mainly a degradation in generalization, even after convergence of the algorithm.
no code implementations • 17 May 2019 • Mor Shpigel Nacson, Suriya Gunasekar, Jason D. Lee, Nathan Srebro, Daniel Soudry
With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization leads to margin-maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models.
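One half of this statement, the regularization path, is easy to visualize numerically. The sketch below is a hypothetical toy (the dataset, the plain-GD solver, and the schedule of regularization strengths are my choices, not the paper's): it minimizes L2-regularized logistic loss for decreasing λ on data built so that the L2 max-margin direction is (0, 1), and prints the normalized minimizers.

```python
import numpy as np

# Toy separable dataset built so its L2 max-margin direction is (0, 1)
# (support vectors (0, 1) and (0, -1)); all choices are illustrative.
X = np.array([[0.0, 1.0], [3.0, 2.0], [0.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def reg_grad(w, lam):
    # Gradient of  mean_i log(1 + exp(-y_i w.x_i)) + lam * ||w||^2.
    m = y * (X @ w)
    p = 1.0 / (1.0 + np.exp(m))               # sigmoid(-margin)
    return -(X * (y * p)[:, None]).mean(axis=0) + 2.0 * lam * w

w = np.zeros(2)
for lam in [1.0, 1e-1, 1e-2, 1e-3, 1e-4]:
    for _ in range(100000):                    # plain GD, warm-started across lambdas
        w = w - 0.25 * reg_grad(w, lam)
    print(f"lambda={lam:7.0e}  direction={w / np.linalg.norm(w)}")
# As lambda -> 0, the normalized minimizer should drift (slowly, at a
# logarithmic rate) toward the max-margin direction (0, 1).
```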
no code implementations • 5 Jun 2018 • Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry
We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate, in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data.
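A quick way to see this behavior in practice (a hedged sketch on an arbitrary toy dataset; the paper's contribution is the theorem, not this script): run single-example SGD with a constant step size on the logistic loss over linearly separable data and watch the full training loss keep shrinking toward zero instead of plateauing at a level set by the step size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: the label is the sign of the first coordinate,
# with a margin enforced.  (Illustrative choice, not the paper's setup.)
n, d = 200, 5
X = rng.normal(size=(n, d))
X[:, 0] = np.sign(X[:, 0]) * (np.abs(X[:, 0]) + 0.5)
y = np.sign(X[:, 0])

def full_loss(w):
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

w = np.zeros(d)
lr = 0.1                                 # fixed, non-vanishing learning rate
for epoch in range(200):
    for i in rng.permutation(n):         # single-example SGD on the logistic loss
        margin = y[i] * (X[i] @ w)
        w += lr * y[i] * X[i] / (1.0 + np.exp(margin))
    if epoch % 40 == 0 or epoch == 199:
        print(f"epoch {epoch:3d}   training loss {full_loss(w):.6f}")
# With separable data and the smooth, monotone logistic loss, the training
# loss keeps shrinking toward zero even though the step size never decays.
```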
no code implementations • 5 Mar 2018 • Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry
We show that for a large family of super-polynomial tailed losses, gradient descent iterates on linear networks of any depth converge in the direction of the $L_2$ maximum-margin solution, while this does not hold for losses with heavier tails.
2 code implementations • ICLR 2018 • Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, Nathan Srebro
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets.
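The implicit bias described here can be reproduced in a few lines. The following sketch uses a hand-picked dataset whose L2 max-margin separator is (1, 0) by construction (all choices below are illustrative assumptions, not taken from the paper); plain gradient descent on the logistic loss drives the loss to zero while the normalized iterate slowly aligns with that max-margin direction.

```python
import numpy as np

# Separable data constructed so the L2 max-margin direction is (1, 0)
# (support vectors (1, 0) and (-1, 0)); all choices are illustrative.
X = np.array([[1.0, 0.0], [2.0, 3.0], [-1.0, 0.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def loss_and_grad(w):
    m = y * (X @ w)
    loss = np.mean(np.log1p(np.exp(-m)))
    grad = -(X * (y / (1.0 + np.exp(m)))[:, None]).mean(axis=0)
    return loss, grad

w = np.zeros(2)
lr = 0.5
for t in range(1, 500001):
    loss, g = loss_and_grad(w)
    w -= lr * g
    if t in (10, 1000, 50000, 500000):
        print(f"t={t:7d}  loss={loss:.2e}  direction={w / np.linalg.norm(w)}")
# The loss decays to zero and ||w|| grows without bound, but the direction
# w/||w|| converges (at a slow 1/log t rate) to the max-margin separator (1, 0).
```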