1 code implementation • 20 Feb 2020 • Rohan Anil, Vineet Gupta, Tomer Koren, Kevin Regan, Yoram Singer
Optimization in machine learning, both theoretical and applied, is presently dominated by first-order gradient methods such as stochastic gradient descent.
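Since this entry contrasts first-order methods with the second-order approach developed in the paper, here is a minimal sketch of the SGD baseline being referred to (the toy objective and names are illustrative, not from the paper's code):

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """One step of plain stochastic gradient descent."""
    return w - lr * grad

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.ones(4)
for _ in range(100):
    w = sgd_step(w, grad=w)
```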
no code implementations • 5 Feb 2020 • Inbal Lav, Shai Avidan, Yoram Singer, Yacov Hel-Or
We show that the proposed approximation is superior to the commonly used spectral methods with respect to both accuracy and complexity.
no code implementations • 8 Jun 2019 • Michael Iuzzolino, Yoram Singer, Michael C. Mozer
In human perception and cognition, a fundamental operation that brains perform is interpretation: constructing coherent neural states from noisy, incomplete, and intrinsically ambiguous evidence.
no code implementations • ICLR 2020 • Chiyuan Zhang, Samy Bengio, Moritz Hardt, Michael C. Mozer, Yoram Singer
We study the interplay between memorization and generalization of overparameterized networks in the extreme case of a single training example and an identity-mapping task.
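A rough sketch of the kind of setup the abstract describes, under my own assumptions about the details: fit an overparameterized MLP to map a single training input to itself, then probe whether it behaves like the identity on unseen inputs (generalization) or collapses toward the memorized example. Architecture and hyperparameters below are illustrative only:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 64
# Overparameterized MLP trained on one example of the identity task.
net = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, d))
x = torch.randn(1, d)                      # the single training example
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = ((net(x) - x) ** 2).mean()      # identity target: output == input
    loss.backward()
    opt.step()

# Probe on fresh inputs: small err_identity suggests generalization to the
# identity map; small err_constant suggests memorization of the one example.
x_new = torch.randn(8, d)
with torch.no_grad():
    err_identity = ((net(x_new) - x_new) ** 2).mean()
    err_constant = ((net(x_new) - x) ** 2).mean()
print(float(err_identity), float(err_constant))
```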
2 code implementations • ICML 2019 Workshop on Identifying and Understanding Deep Learning Phenomena • Chiyuan Zhang, Samy Bengio, Yoram Singer
In essence, the layers of large deep neural networks can be categorized as either "robust" or "critical".
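The robust/critical distinction can be probed by resetting one layer at a time to its initialization and measuring the accuracy drop, which is, to my understanding, the style of layer-wise ablation the paper studies. A hedged sketch with a hypothetical, user-supplied `evaluate` function:

```python
import copy

def layer_reinit_probe(model, init_state, evaluate):
    """For each parameter tensor, reset it to its value at initialization,
    re-evaluate, then restore. Layers whose reset barely hurts accuracy
    are 'robust'; those causing a large drop are 'critical'."""
    baseline = evaluate(model)
    trained_state = copy.deepcopy(model.state_dict())
    results = {}
    for name in trained_state:
        model.load_state_dict({**trained_state, name: init_state[name]})
        results[name] = baseline - evaluate(model)  # accuracy drop per layer
    model.load_state_dict(trained_state)             # restore trained weights
    return results
```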
no code implementations • 5 Feb 2019 • Udaya Ghai, Elad Hazan, Yoram Singer
The hypentropy (hyperbolic entropy) regularizer has a natural spectral counterpart, which we use to derive a family of matrix updates that bridge gradient methods and the multiplicative method for matrices.
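For reference, the hypentropy function is $\phi_\beta(w)=\sum_i \big(w_i\,\mathrm{arcsinh}(w_i/\beta)-\sqrt{w_i^2+\beta^2}\big)$, whose mirror map is $\nabla\phi_\beta(w)=\mathrm{arcsinh}(w/\beta)$ applied coordinate-wise. Mirror descent with this map gives the vector-case update sketched below (diagonal case only; the spectral version applies the same map to singular values):

```python
import numpy as np

def hypentropy_step(w, grad, lr, beta):
    """Mirror-descent step with mirror map arcsinh(w / beta).
    As beta -> 0 this behaves like multiplicative
    (exponentiated-gradient) updates; as beta -> infinity it approaches
    gradient descent, up to a rescaling of the step size by beta."""
    return beta * np.sinh(np.arcsinh(w / beta) - lr * grad)
```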
3 code implementations • 30 Jan 2019 • Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer
Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling.
Ranked #31 on Machine Translation on WMT2014 English-French
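As a baseline for what this paper makes memory-efficient, here is a minimal diagonal Adagrad step; the paper's method (SM3) replaces the per-parameter accumulator with statistics shared across rows and columns of each parameter matrix, which is where the memory saving comes from. A sketch under my own naming conventions:

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """Diagonal Adagrad: per-parameter accumulation of squared gradients.
    Memory-efficient variants shrink `accum` from one scalar per parameter
    to shared row/column accumulators."""
    accum += grad ** 2
    return w - lr * grad / (np.sqrt(accum) + eps), accum
```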
no code implementations • ICML 2018 • Yuanzhi Li, Yoram Singer
Every regression parameter in the Lasso changes piecewise-linearly as a function of the regularization value.
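The statement refers to the well-known piecewise-linear structure of the Lasso regularization path: for the objective $\min_w \tfrac12\|Xw-y\|_2^2 + \lambda\|w\|_1$, each coordinate of the solution $w(\lambda)$ is piecewise linear in $\lambda$, with breakpoints where variables enter or leave the active set. This can be inspected with scikit-learn's LARS path (illustrative, not the paper's code):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(100)

# alphas are the breakpoints of the path; between consecutive breakpoints
# every coefficient in `coefs` varies linearly with alpha.
alphas, _, coefs = lars_path(X, y, method="lasso")
print(alphas.shape, coefs.shape)
```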
3 code implementations • ICML 2018 • Vineet Gupta, Tomer Koren, Yoram Singer
Preconditioned gradient methods are among the most general and powerful tools in optimization.
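This paper's Shampoo algorithm maintains, for a matrix-shaped parameter, separate left and right preconditioners built from running sums of $GG^\top$ and $G^\top G$, and applies their inverse fourth roots on each side of the gradient. A condensed numpy sketch (regularization and step-size details elided):

```python
import numpy as np

def inv_fourth_root(M, eps=1e-4):
    """M^{-1/4} for a symmetric PSD matrix, via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * (vals + eps) ** -0.25) @ vecs.T

def shampoo_step(W, G, L, R, lr=0.1):
    """One Shampoo update for a matrix parameter W with gradient G:
    L accumulates G G^T, R accumulates G^T G, and the gradient is
    preconditioned on both sides by inverse fourth roots."""
    L += G @ G.T
    R += G.T @ G
    W -= lr * inv_fourth_root(L) @ G @ inv_fourth_root(R)
    return W, L, R
```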
no code implementations • ICLR 2018 • Nishal P Shah, Sasidhar Madugula, EJ Chichilnisky, Yoram Singer, Jonathon Shlens
Retinal prostheses for treating incurable blindness are designed to electrically stimulate surviving retinal neurons, causing them to send artificial visual signals to the brain.
no code implementations • 20 Jun 2017 • Vineet Gupta, Tomer Koren, Yoram Singer
We describe a framework for deriving and analyzing online optimization algorithms that incorporate adaptive, data-dependent regularization, also termed preconditioning.
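One convenient way to write the framework's generic update (my paraphrase; the notation is illustrative rather than the paper's) is as a mirror-descent step with a data-dependent regularizer $\Phi_t$:
$$w_{t+1} = \arg\min_{w} \; \eta\,\langle g_t, w\rangle + B_{\Phi_t}(w, w_t),$$
where $B_{\Phi_t}$ is the Bregman divergence induced by $\Phi_t$. Choosing $\Phi_t(w)=\tfrac12 w^\top H_t w$ with $H_t=\big(\sum_{s\le t} g_s g_s^\top\big)^{1/2}$ recovers full-matrix Adagrad, and a diagonal $H_t$ recovers the familiar diagonal variant.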
no code implementations • 22 Mar 2017 • Amit Daniely, Roy Frostig, Vineet Gupta, Yoram Singer
We describe and analyze a simple random feature scheme (RFS) from prescribed compositional kernels.
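A classic instance of a random feature scheme is random Fourier features for the Gaussian kernel; the paper generalizes such constructions to prescribed compositional kernels, so this sketch is only the textbook base case:

```python
import numpy as np

def rff(X, D=256, gamma=1.0, seed=0):
    """Random Fourier features: z(x) = sqrt(2/D) * cos(Wx + b), whose inner
    products approximate the Gaussian kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = np.random.randn(5, 3)
Z = rff(X)
print(Z @ Z.T)  # approximates the 5x5 Gaussian kernel matrix
```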
no code implementations • 19 Apr 2016 • Amit Daniely, Nevena Lazic, Yoram Singer, Kunal Talwar
In stark contrast, our approach of improper learning with a larger hypothesis class allows the sketch size to depend only logarithmically on the degree.
no code implementations • NeurIPS 2016 • Amit Daniely, Roy Frostig, Yoram Singer
We develop a general duality between neural networks and compositional kernels, striving towards a better understanding of deep learning.
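One concrete piece of this duality: each activation $\sigma$ has a "dual" function $\hat\sigma$ acting on correlations, and composing duals layer by layer yields the kernel of the corresponding network. For suitably normalized ReLU the dual is $\hat\sigma(\rho)=\big(\sqrt{1-\rho^2}+(\pi-\arccos\rho)\,\rho\big)/\pi$. A sketch for unit-norm inputs, with normalization conventions that are my assumption rather than taken from the paper:

```python
import numpy as np

def relu_dual(rho):
    """Dual activation of the normalized ReLU on correlations in [-1, 1]."""
    rho = np.clip(rho, -1.0, 1.0)
    return (np.sqrt(1 - rho ** 2) + (np.pi - np.arccos(rho)) * rho) / np.pi

def compositional_kernel(x, y, depth=3):
    """Kernel of a depth-`depth` fully connected ReLU network on unit
    vectors: start from the correlation <x, y> and apply the dual
    activation once per layer."""
    rho = float(x @ y)  # assumes ||x|| = ||y|| = 1
    for _ in range(depth):
        rho = relu_dual(rho)
    return rho
```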
no code implementations • 3 Sep 2015 • Moritz Hardt, Benjamin Recht, Yoram Singer
In the non-convex case, we give a new interpretation of common practices in neural networks, and formally show that popular techniques for training large deep models are indeed stability-promoting.
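The formal device behind this claim is uniform stability: a randomized algorithm $A$ is $\epsilon$-uniformly stable if, for every pair of datasets $S, S'$ differing in a single example,
$$\sup_z \; \mathbb{E}_A\big[f(A(S); z) - f(A(S'); z)\big] \le \epsilon,$$
and $\epsilon$-uniform stability upper-bounds the expected generalization gap by $\epsilon$. The paper bounds $\epsilon$ for stochastic gradient descent and argues that common training heuristics shrink it.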
2 code implementations • 19 Dec 2013 • Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean
In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage.
Ranked #8 on Multi-label zero-shot learning on Open Images V4
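This entry describes the convex-combination construction (ConSE): take a pretrained classifier's top label probabilities, form the probability-weighted average of those labels' word embeddings, and use the result as the image's semantic embedding for nearest-neighbor zero-shot labeling. A sketch with hypothetical inputs:

```python
import numpy as np

def conse_embedding(probs, label_embeddings, top_t=10):
    """Convex combination of semantic embeddings: average the embeddings
    of the top-T predicted labels, weighted by their renormalized
    classifier probabilities."""
    top = np.argsort(probs)[::-1][:top_t]
    w = probs[top] / probs[top].sum()
    return w @ label_embeddings[top]

# Hypothetical usage: `probs` from a pretrained softmax over n seen labels,
# `label_embeddings` an (n, d) matrix of word vectors for the label names.
# Zero-shot prediction = the unseen label whose embedding is nearest to
# conse_embedding(probs, label_embeddings).
```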
no code implementations • 19 Dec 2013 • Samy Bengio, Jeff Dean, Dumitru Erhan, Eugene Ie, Quoc Le, Andrew Rabinovich, Jonathon Shlens, Yoram Singer
Despite the simplicity of the resulting optimization problem, it is effective in improving both recognition and localization accuracy.
no code implementations • 7 Nov 2013 • Moshe Dubiner, Matan Gavish, Yoram Singer
We establish the existence of the relaxation path and give a geometric description of it.
no code implementations • NeurIPS 2009 • Yoram Singer, John C. Duchi
We derive concrete and very simple algorithms for minimization of loss functions with $\ell_1$, $\ell_2$, $\ell_2^2$, and $\ell_\infty$ regularization.
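For the $\ell_1$ case, the "concrete and very simple" algorithm is forward-backward splitting: a gradient step followed by soft thresholding, which solves the proximal subproblem for $\lambda\|w\|_1$ in closed form. A minimal sketch:

```python
import numpy as np

def fobos_l1_step(w, grad, lr, lam):
    """Forward step: gradient descent. Backward step: the closed-form
    prox of lr*lam*||.||_1, i.e. coordinate-wise soft thresholding."""
    v = w - lr * grad                                  # forward step
    return np.sign(v) * np.maximum(np.abs(v) - lr * lam, 0.0)
```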
no code implementations • NeurIPS 2009 • Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Bag-of-words document representations are often used in text, image and video processing.