no code implementations • 4 Sep 2023 • Etay Livne, Gal Kaplun, Eran Malach, Shai Shalev-Shwartz
However, for large datasets stored in the cloud, random access to individual examples is often costly and inefficient.
no code implementations • 14 Jun 2023 • Nikhil Vyas, Depen Morwani, Rosie Zhao, Gal Kaplun, Sham Kakade, Boaz Barak
The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by high learning rate or small batch size ("SGD noise").
1 code implementation • 13 Feb 2023 • Gal Kaplun, Andrey Gurevich, Tal Swisa, Mazor David, Shai Shalev-Shwartz, Eran Malach
Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance.
no code implementations • 28 Mar 2022 • Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz
We relate the notion of such samplers to knowledge distillation, where a student network imitates the outputs of a teacher on unlabeled data.
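The distillation setup described above — a student trained to imitate a teacher's outputs on unlabeled data — can be sketched minimally as follows. All names here are illustrative, and the temperature-softened targets are the standard distillation recipe, not necessarily the exact variant used in the paper:

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distillation_targets(teacher_logits, temperature=2.0):
    # the student imitates the teacher's softened output distribution
    return softmax(teacher_logits / temperature)

def soft_cross_entropy(student_probs, target_probs, eps=1e-12):
    # distillation loss: cross-entropy against the teacher's soft labels
    return -np.mean(np.sum(target_probs * np.log(student_probs + eps), axis=-1))

# toy example: 4 unlabeled inputs, 3 classes
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 3))
student_logits = rng.normal(size=(4, 3))

targets = distillation_targets(teacher_logits)
loss = soft_cross_entropy(softmax(student_logits), targets)
```

In practice the logits would come from forward passes of the two networks, and the loss would be minimized over the student's parameters.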
1 code implementation • 20 Feb 2022 • Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran
In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a $\textit{single input point}$.
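The per-point evaluation idea can be sketched in a few lines: instead of one aggregate accuracy per model, compute, for each single test input, the fraction of models in the collection that get it right. The simulated prediction matrix below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n_models, n_points = 50, 10

# predictions[i, j] = True if model i classifies test point j correctly;
# here simulated, but in practice gathered from a collection of
# independently trained models evaluated on the same test set
predictions = rng.random((n_models, n_points)) < 0.8

# pointwise accuracy: for each single input point, the fraction of
# models in the collection that classify it correctly
pointwise_acc = predictions.mean(axis=0)
```

The resulting profile distinguishes points that nearly all models agree on from points whose classification varies across the collection, which a single scalar accuracy averages away.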
1 code implementation • 11 Mar 2021 • Nishanth Dikkala, Gal Kaplun, Rina Panigrahy
We provide theoretical and empirical evidence that neural representations can be viewed as LSH-like functions that map each input to an embedding that is a function solely of the informative $\gamma$ and invariant to $\theta$, effectively recovering the manifold identifier $\gamma$.
2 code implementations • ICLR 2021 • Yamini Bansal, Gal Kaplun, Boaz Barak
We prove a new upper bound on the generalization gap of classifiers that are obtained by first using self-supervision to learn a representation $r$ of the training data, and then fitting a simple (e.g., linear) classifier $g$ to the labels.
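The two-stage pipeline the bound applies to — learn a representation $r$, then fit a simple classifier $g$ on top — can be sketched as below. The encoder here is a fixed random projection standing in for a self-supervised representation, and the least-squares linear fit stands in for the simple classifier; both are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: two Gaussian classes in 20 dimensions
n, d = 200, 20
X = np.vstack([rng.normal(0, 1, (n, d)), rng.normal(1, 1, (n, d))])
y = np.concatenate([np.zeros(n), np.ones(n)])

# stage 1: a representation r(x) -- here a fixed nonlinear random
# projection as a stand-in for a learned self-supervised encoder
W = rng.normal(size=(d, 8))
def r(X):
    return np.tanh(X @ W)

# stage 2: fit a simple (linear) classifier g on the representation
R = r(X)
R1 = np.hstack([R, np.ones((R.shape[0], 1))])      # append bias column
w, *_ = np.linalg.lstsq(R1, 2 * y - 1, rcond=None)  # least-squares fit

preds = (R1 @ w > 0).astype(float)
train_acc = (preds == y).mean()
```

The point of the result is that the generalization gap of the composite classifier $g \circ r$ is controlled by the complexity of the simple stage $g$, not by the (possibly very complex) representation stage.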
no code implementations • 21 Feb 2020 • Sharon Qian, Dimitris Kalimeris, Gal Kaplun, Yaron Singer
Despite the vast success of Deep Neural Networks in numerous application domains, it has been shown that such models are not robust, i.e., they are vulnerable to small adversarial perturbations of the input.
3 code implementations • ICLR 2020 • Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever
We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better.
1 code implementation • NeurIPS 2019 • Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Benjamin L. Edelman, Fred Zhang, Boaz Barak
We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks.
no code implementations • 9 Mar 2019 • Dimitris Kalimeris, Gal Kaplun, Yaron Singer
A recently growing research direction in influence maximization focuses on the case where the edge probabilities on the graph are not arbitrary but are generated as a function of the features of the users and a global hyperparameter.