no code implementations • 4 Sep 2023 • Etay Livne, Gal Kaplun, Eran Malach, Shai Shalev-Shwartz
However, for large datasets stored in the cloud, random access to individual examples is often costly and inefficient.
no code implementations • 14 Jun 2023 • Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak
The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by high learning rate or small batch size ("SGD noise").
1 code implementation • 13 Feb 2023 • Gal Kaplun, Andrey Gurevich, Tal Swisa, Mazor David, Shai Shalev-Shwartz, Eran Malach
Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance.
no code implementations • 28 Mar 2022 • Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz
We relate the notion of such samplers to knowledge distillation, where a student network imitates the outputs of a teacher on unlabeled data.
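The distillation setup described above — a student trained to imitate a teacher's outputs on unlabeled data — can be sketched minimally as follows. All names here are illustrative, and the temperature-softened targets are the standard distillation recipe, not necessarily the exact variant used in the paper:

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_targets(teacher_logits, temperature=2.0):
    # the student imitates the teacher's softened output distribution
    return softmax(teacher_logits / temperature)

def soft_cross_entropy(student_probs, target_probs, eps=1e-12):
    # distillation loss: cross-entropy against the teacher's soft labels
    return -np.mean(np.sum(target_probs * np.log(student_probs + eps), axis=-1))

# toy example: 4 unlabeled inputs, 3 classes
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 3))
student_logits = rng.normal(size=(4, 3))

targets = distillation_targets(teacher_logits)
loss = soft_cross_entropy(softmax(student_logits), targets)
```

In practice the logits would come from forward passes of the two networks, and the loss would be minimized over the student's parameters.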
1 code implementation • 20 Feb 2022 • Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran
In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a $\textit{single input point}$.
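The per-point evaluation idea can be sketched in a few lines: instead of one aggregate accuracy per model, compute, for each single test input, the fraction of models in the collection that get it right. The simulated prediction matrix below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n_models, n_points = 50, 10

# predictions[i, j] = True if model i classifies test point j correctly;
# here simulated, but in practice gathered from a collection of
# independently trained models evaluated on the same test set
predictions = rng.random((n_models, n_points)) < 0.8

# pointwise accuracy: for each single input point, the fraction of
# models in the collection that classify it correctly
pointwise_acc = predictions.mean(axis=0)
```

The resulting profile distinguishes points that nearly all models agree on from points whose classification varies across the collection, which a single scalar accuracy averages away.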
1 code implementation • 11 Mar 2021 • Nishanth Dikkala, Gal Kaplun, Rina Panigrahy
We provide theoretical and empirical evidence that neural representations can be viewed as LSH-like functions that map each input to an embedding that is a function solely of the informative $\gamma$ and invariant to $\theta$, effectively recovering the manifold identifier $\gamma$.
2 code implementations • ICLR 2021 • Yamini Bansal, Gal Kaplun, Boaz Barak
We prove a new upper bound on the generalization gap of classifiers that are obtained by first using self-supervision to learn a representation $r$ of the training data, and then fitting a simple (e.g., linear) classifier $g$ to the labels.
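The two-stage pipeline the bound applies to — learn a representation $r$, then fit a simple classifier $g$ on top — can be sketched as below. The encoder here is a fixed random projection standing in for a self-supervised representation, and the least-squares linear fit stands in for the simple classifier; both are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: two Gaussian classes in 20 dimensions
n, d = 200, 20
X = np.vstack([rng.normal(0, 1, (n, d)), rng.normal(1, 1, (n, d))])
y = np.concatenate([np.zeros(n), np.ones(n)])

# stage 1: a representation r(x) -- here a fixed nonlinear random
# projection as a stand-in for a learned self-supervised encoder
W = rng.normal(size=(d, 8))
def r(X):
    return np.tanh(X @ W)

# stage 2: fit a simple (linear) classifier g on the representation
R = r(X)
R1 = np.hstack([R, np.ones((R.shape[0], 1))])      # append bias column
w, *_ = np.linalg.lstsq(R1, 2 * y - 1, rcond=None)  # least-squares fit

preds = (R1 @ w > 0).astype(float)
train_acc = (preds == y).mean()
```

The point of the result is that the generalization gap of the composite classifier $g \circ r$ is controlled by the complexity of the simple stage $g$, not by the (possibly very complex) representation stage.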
no code implementations • 21 Feb 2020 • Sharon Qian, Dimitris Kalimeris, Gal Kaplun, Yaron Singer
Despite the vast success of Deep Neural Networks in numerous application domains, it has been shown that such models are not robust, i.e., they are vulnerable to small adversarial perturbations of the input.
3 code implementations • ICLR 2020 • Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever
We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better.
1 code implementation • NeurIPS 2019 • Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Benjamin L. Edelman, Fred Zhang, Boaz Barak
We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks.
no code implementations • 9 Mar 2019 • Dimitris Kalimeris, Gal Kaplun, Yaron Singer
A recently growing research direction in influence maximization focuses on the case where the edge probabilities on the graph are not arbitrary but are generated as a function of the features of the users and a global hyperparameter.