Search Results for author: Preetum Nakkiran

Found 32 papers, 11 papers with code

LiDAR: Sensing Linear Probing Performance in Joint Embedding SSL Architectures

no code implementations • 7 Dec 2023 • Vimal Thilak, Chen Huang, Omid Saremi, Laurent Dinh, Hanlin Goh, Preetum Nakkiran, Joshua M. Susskind, Etai Littwin

In this paper, we introduce LiDAR (Linear Discriminant Analysis Rank), a metric designed to measure the quality of representations within joint embedding (JE) architectures.
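
The excerpt names the metric but not its construction. As a rough illustration only, here is a minimal LDA-rank-style score in Python, assuming class- or surrogate-labelled embeddings `Z` and labels `y`; the regularization and normalization choices are assumptions and may differ from the paper's definition of LiDAR.

```python
import numpy as np

def lda_rank(Z, y, eps=1e-4):
    """Effective rank of the LDA matrix W^{-1/2} B W^{-1/2} (a rough LDA-rank-style score).

    Z: (n, d) embeddings, y: (n,) integer labels; assumes >= 2 samples per class.
    """
    classes = np.unique(y)
    n, d = Z.shape
    mu = Z.mean(axis=0)
    Sigma_w = np.zeros((d, d))  # within-class covariance
    Sigma_b = np.zeros((d, d))  # between-class covariance
    for c in classes:
        Zc = Z[y == c]
        mu_c = Zc.mean(axis=0)
        Sigma_w += np.cov(Zc, rowvar=False) * len(Zc)
        Sigma_b += np.outer(mu_c - mu, mu_c - mu) * len(Zc)
    Sigma_w = Sigma_w / n + eps * np.eye(d)  # regularize to keep it invertible
    Sigma_b /= n
    # Whiten the between-class covariance by the within-class covariance.
    w_vals, w_vecs = np.linalg.eigh(Sigma_w)
    W = w_vecs @ np.diag(w_vals ** -0.5) @ w_vecs.T
    lam = np.clip(np.linalg.eigvalsh(W @ Sigma_b @ W), 0.0, None)
    p = lam / (lam.sum() + 1e-12)
    entropy = -(p * np.log(p + 1e-12)).sum()
    return float(np.exp(entropy))  # higher effective rank = more discriminative directions
```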

Perspectives on the State and Future of Deep Learning -- 2023

no code implementations • 7 Dec 2023 • Micah Goldblum, Anima Anandkumar, Richard Baraniuk, Tom Goldstein, Kyunghyun Cho, Zachary C. Lipton, Melanie Mitchell, Preetum Nakkiran, Max Welling, Andrew Gordon Wilson

The goal of this series is to chronicle opinions and issues in the field of machine learning as they stand today and as they change over time.

Tasks: Benchmarking

Vanishing Gradients in Reinforcement Finetuning of Language Models

1 code implementation • 31 Oct 2023 • Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, Etai Littwin

Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms.

What Algorithms can Transformers Learn? A Study in Length Generalization

no code implementations • 24 Oct 2023 • Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran

Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity.

Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing

1 code implementation • 21 Sep 2023 • Jarosław Błasiok, Preetum Nakkiran

We show that a simple modification fixes both constructions: first smooth the observations using an RBF kernel, then compute the Expected Calibration Error (ECE) of this smoothed function.
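
As a loose illustration of the recipe described above (smooth the observations with an RBF kernel, then measure miscalibration of the smoothed function), here is a sketch assuming binary outcomes and predictions in [0, 1]; the bandwidth and the density-weighted estimator are assumptions, not the paper's exact definition.

```python
import numpy as np

def smooth_ece(f, y, sigma=0.05, grid_size=101):
    """Kernel-smoothed calibration error: f are predicted probabilities, y are {0,1} labels."""
    f, y = np.asarray(f, float), np.asarray(y, float)
    t = np.linspace(0.0, 1.0, grid_size)
    # RBF kernel weights between grid points t and predictions f: shape (grid_size, n).
    K = np.exp(-0.5 * ((t[:, None] - f[None, :]) / sigma) ** 2)
    density = K.mean(axis=1)                                  # prediction density along the grid
    smoothed_y = (K * y[None, :]).sum(axis=1) / (K.sum(axis=1) + 1e-12)
    gap = np.abs(smoothed_y - t)                              # miscalibration at confidence t
    return float((gap * density).sum() / (density.sum() + 1e-12))
```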

When Does Optimizing a Proper Loss Yield Calibration?

no code implementations • NeurIPS 2023 • Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran

Optimizing proper loss functions is popularly believed to yield predictors with good calibration properties; the intuition being that for such losses, the global optimum is to predict the ground-truth probabilities, which is indeed calibrated.
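
The stated intuition can be made concrete for the squared loss, where the pointwise population optimum is the conditional mean and hence perfectly calibrated. A one-line version of that standard argument (an illustration of the intuition, not the paper's result):

```latex
% Bias-variance identity: for any predictor f and any x,
\mathbb{E}\big[(f(x)-y)^2 \mid x\big]
  = \big(f(x)-\mathbb{E}[y \mid x]\big)^2 + \operatorname{Var}(y \mid x),
% so the global minimizer is f(x) = \mathbb{E}[y \mid x], which is perfectly calibrated.
```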

Loss Minimization Yields Multicalibration for Large Neural Networks

no code implementations • 19 Apr 2023 • Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Adam Tauman Kalai, Preetum Nakkiran

We show that minimizing the squared loss over all neural nets of size $n$ implies multicalibration for all but a bounded number of unlucky values of $n$.

Tasks: Fairness
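
For context, a standard approximate multicalibration requirement for a class $\mathcal{C}$ of group-membership (or real-valued) functions looks like the following; the exact formulation and parameters used in the paper may differ.

```latex
% f is alpha-multicalibrated with respect to a class C if, for every c in C
% and every value v taken by f,
\Big|\, \mathbb{E}\big[\, c(x)\,\big(y - f(x)\big) \;\big|\; f(x) = v \,\big] \Big| \;\le\; \alpha .
```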

A Unifying Theory of Distance from Calibration

no code implementations • 30 Nov 2022 • Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran

We study the fundamental question of how to define and measure the distance from calibration for probabilistic predictors.

APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations

no code implementations • 8 Oct 2022 • Elan Rosenfeld, Preetum Nakkiran, Hadi Pouransari, Oncel Tuzel, Fartash Faghri

Recent advances in learning aligned multimodal representations have been primarily driven by training large neural networks on massive, noisy paired-modality datasets.

Tasks: Zero-Shot Learning

The Calibration Generalization Gap

1 code implementation • 5 Oct 2022 • A. Michael Carrell, Neil Mallinar, James Lucas, Preetum Nakkiran

We propose a systematic way to study the calibration error: by decomposing it into (1) calibration error on the train set, and (2) the calibration generalization gap.

Tasks: Data Augmentation
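
A minimal sketch of this decomposition, assuming a standard binned ECE estimator; the estimator, bin count, and interface are assumptions rather than the paper's exact setup.

```python
import numpy as np

def binned_ece(conf, correct, n_bins=15):
    """Binned expected calibration error from confidences and 0/1 correctness indicators."""
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf >= lo) & ((conf < hi) if hi < 1.0 else (conf <= hi))
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def calibration_decomposition(train_conf, train_correct, test_conf, test_correct):
    """Test calibration error = train calibration error + calibration generalization gap."""
    train_ce = binned_ece(train_conf, train_correct)
    test_ce = binned_ece(test_conf, test_correct)
    return {"train_ce": train_ce, "test_ce": test_ce, "gap": test_ce - train_ce}
```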

Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting

no code implementations • 14 Jul 2022 • Neil Mallinar, James B. Simon, Amirhesam Abedsoltan, Parthe Pandit, Mikhail Belkin, Preetum Nakkiran

In this work we argue that while benign overfitting has been instructive and fruitful to study, many real interpolating methods like neural networks do not fit benignly: modest noise in the training set causes nonzero (but non-infinite) excess risk at test time, implying these models are neither benign nor catastrophic but rather fall in an intermediate regime.

Tasks: Learning Theory
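
One way to formalize the taxonomy sketched above is through the limiting excess risk of an interpolating method as the sample size grows (under label noise); the precise definitions in the paper may differ in detail.

```latex
% Taxonomy of overfitting via the asymptotic excess risk of the interpolating estimator \hat f_n:
\lim_{n\to\infty}\Big(\mathcal{R}(\hat f_n) - \mathcal{R}(f^\ast)\Big) \;=\;
\begin{cases}
0 & \text{benign overfitting},\\[2pt]
c \in (0,\infty) & \text{tempered overfitting},\\[2pt]
+\infty & \text{catastrophic overfitting}.
\end{cases}
```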

Limitations of the NTK for Understanding Generalization in Deep Learning

no code implementations • 20 Jun 2022 • Nikhil Vyas, Yamini Bansal, Preetum Nakkiran

The "Neural Tangent Kernel" (NTK) (Jacot et al., 2018) and its empirical variants have been proposed as a proxy to capture certain behaviors of real neural networks.

What You See is What You Get: Principled Deep Learning via Distributional Generalization

1 code implementation • 7 Apr 2022 • Bogdan Kulynych, Yao-Yuan Yang, Yaodong Yu, Jarosław Błasiok, Preetum Nakkiran

In contrast, we show that Differentially-Private (DP) training provably ensures the high-level WYSIWYG property, which we quantify using a notion of distributional generalization.

Knowledge Distillation: Bad Models Can Be Good Role Models

no code implementations • 28 Mar 2022 • Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz

We relate the notion of such samplers to knowledge distillation, where a student network imitates the outputs of a teacher on unlabeled data.

Tasks: Knowledge Distillation, Learning Theory
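
A minimal sketch of the student-imitates-teacher setup mentioned above, using a generic soft-label distillation loss in PyTorch; `teacher`, `student`, the temperature, and the optimizer are placeholders, and the paper's sampler construction is not reproduced here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student output distributions."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

def distill_step(student, teacher, x_unlabeled, optimizer):
    """One optimization step: the student imitates the teacher's outputs on unlabeled data."""
    with torch.no_grad():
        teacher_logits = teacher(x_unlabeled)
    loss = distillation_loss(student(x_unlabeled), teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```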

Deconstructing Distributions: A Pointwise Framework of Learning

1 code implementation • 20 Feb 2022 • Gal Kaplun, Nikhil Ghosh, Saurabh Garg, Boaz Barak, Preetum Nakkiran

In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a *single input point*.
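
A minimal sketch of the pointwise evaluation described above; `models` is a placeholder for any collection of fitted classifiers exposing a scikit-learn-style `predict` method.

```python
import numpy as np

def pointwise_profile(models, x, y_true):
    """Evaluate a collection of models on a single input point (x, y_true).

    Returns each model's prediction and the fraction of the collection that is correct.
    """
    preds = np.array([m.predict(np.asarray(x).reshape(1, -1))[0] for m in models])
    return {"predictions": preds, "pointwise_accuracy": float((preds == y_true).mean())}
```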

Limitations of Neural Collapse for Understanding Generalization in Deep Learning

no code implementations • 17 Feb 2022 • Like Hui, Mikhail Belkin, Preetum Nakkiran

We refine the Neural Collapse conjecture into two separate conjectures: collapse on the train set (an optimization property) and collapse on the test distribution (a generalization property).

Tasks: Representation Learning

Turing-Universal Learners with Optimal Scaling Laws

no code implementations • 9 Nov 2021 • Preetum Nakkiran

For a given distribution, learning algorithm, and performance metric, the rate of convergence (or data-scaling law) is the asymptotic behavior of the algorithm's test performance as a function of the number of training samples.
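
For concreteness, a data-scaling law in this sense is often summarized by a power-law rate such as the one below; the form and exponent are illustrative rather than taken from the paper.

```latex
% Example data-scaling law: expected excess test error decays polynomially in the
% number of training samples n,
\mathrm{err}(n) \;=\; \mathbb{E}\big[\mathcal{R}(\hat f_n)\big] - \mathcal{R}^\ast
  \;\sim\; C\, n^{-\alpha} \quad \text{as } n \to \infty,
\qquad \text{for constants } C, \alpha > 0 .
```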

Distributional Generalization: Structure Beyond Test Error

no code implementations • 29 Sep 2021 • Preetum Nakkiran, Yamini Bansal

Classifiers in machine learning are often reduced to single-dimensional quantities, such as test error or loss.

Revisiting Model Stitching to Compare Neural Representations

no code implementations • NeurIPS 2021 • Yamini Bansal, Preetum Nakkiran, Boaz Barak

We revisit and extend model stitching (Lenc & Vedaldi 2015) as a methodology to study the internal representations of neural networks.

Tasks: Self-Supervised Learning
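
A minimal sketch of model stitching in the Lenc & Vedaldi sense, assuming two convolutional networks already split into bottom/top halves with compatible spatial shapes; the split points, the 1x1-convolution stitching layer, and the channel counts are assumptions, not prescriptions from the paper.

```python
import torch
import torch.nn as nn

class StitchedModel(nn.Module):
    """Frozen bottom of model A -> trainable 1x1-conv stitching layer -> frozen top of model B."""

    def __init__(self, bottom_a, top_b, ch_a, ch_b):
        super().__init__()
        self.bottom_a, self.top_b = bottom_a.eval(), top_b.eval()
        self.stitch = nn.Conv2d(ch_a, ch_b, kernel_size=1)  # the only trainable component
        for p in list(self.bottom_a.parameters()) + list(self.top_b.parameters()):
            p.requires_grad = False

    def forward(self, x):
        with torch.no_grad():
            h = self.bottom_a(x)           # representation from model A
        return self.top_b(self.stitch(h))  # consumed by model B after the learned stitch

# Train only `stitch` on the task loss; the accuracy drop of the stitched model relative to the
# originals (the "stitching penalty") is then used to compare the two representations.
```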

Distributional Generalization: Characterizing Classifiers Beyond Test Error

no code implementations • NeurIPS 2021 • Preetum Nakkiran, Yamini Bansal

We present a new set of empirical properties of interpolating classifiers, including neural networks, kernel machines and decision trees.

Distributional Generalization: A New Kind of Generalization

1 code implementation • 17 Sep 2020 • Preetum Nakkiran, Yamini Bansal

We introduce a new notion of generalization -- Distributional Generalization -- which roughly states that outputs of a classifier at train and test time are close *as distributions*, as opposed to close in just their average error.

Tasks: 2D Object Detection
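
A rough way to probe this notion is to compare the distribution of a classifier's outputs on train versus test inputs, for example via the total variation distance between predicted-label histograms; this diagnostic is an illustration only, not the paper's formal definition, which quantifies closeness with respect to families of tests.

```python
import numpy as np

def predicted_label_tv(train_preds, test_preds, n_classes):
    """Total variation distance between predicted-label distributions on train vs. test inputs."""
    p = np.bincount(np.asarray(train_preds), minlength=n_classes) / len(train_preds)
    q = np.bincount(np.asarray(test_preds), minlength=n_classes) / len(test_preds)
    return 0.5 * float(np.abs(p - q).sum())
```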

Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems

no code implementations • 15 May 2020 • Preetum Nakkiran

The learning rate schedule can significantly affect generalization performance in modern neural networks, but the reasons for this are not yet understood.

Tasks: Regression

Optimal Regularization Can Mitigate Double Descent

no code implementations • ICLR 2021 • Preetum Nakkiran, Prayaag Venkat, Sham Kakade, Tengyu Ma

Recent empirical and theoretical studies have shown that many learning algorithms -- from linear regression to neural networks -- can have test performance that is non-monotonic in quantities such as the sample size and model size.

Tasks: Regression

More Data Can Hurt for Linear Regression: Sample-wise Double Descent

1 code implementation • 16 Dec 2019 • Preetum Nakkiran

In this expository note we describe a surprising phenomenon in overparameterized linear regression, where the dimension exceeds the number of samples: there is a regime where the test risk of the estimator found by gradient descent increases with additional samples.

Tasks: Regression
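
A minimal numpy sketch of the sample-size sweep described above, using the minimum-norm least-squares solution (which gradient descent initialized at zero converges to); the dimension, noise level, and sample sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma, n_test, n_trials = 100, 0.5, 2000, 50
beta = rng.normal(size=d) / np.sqrt(d)        # ground-truth linear model

def avg_test_risk(n):
    risks = []
    for _ in range(n_trials):
        X = rng.normal(size=(n, d))
        y = X @ beta + sigma * rng.normal(size=n)
        beta_hat = np.linalg.pinv(X) @ y      # minimum-norm least-squares estimator
        X_te = rng.normal(size=(n_test, d))
        y_te = X_te @ beta + sigma * rng.normal(size=n_test)
        risks.append(np.mean((X_te @ beta_hat - y_te) ** 2))
    return float(np.mean(risks))

# In the overparameterized regime (n < d), test risk can increase with more samples,
# peaking near the interpolation threshold n = d before falling again for n > d.
for n in [20, 50, 80, 95, 100, 105, 150, 300]:
    print(n, round(avg_test_risk(n), 3))
```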

Deep Double Descent: Where Bigger Models and More Data Hurt

3 code implementations • ICLR 2020 • Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, Ilya Sutskever

We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better.

SGD on Neural Networks Learns Functions of Increasing Complexity

1 code implementation • NeurIPS 2019 • Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Benjamin L. Edelman, Fred Zhang, Boaz Barak

We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks.

Computational Limitations in Robust Classification and Win-Win Results

no code implementations • 4 Feb 2019 • Akshay Degwekar, Preetum Nakkiran, Vinod Vaikuntanathan

We continue the study of statistical/computational tradeoffs in learning robust classifiers, following the recent work of Bubeck, Lee, Price and Razenshteyn, who showed examples of classification tasks where (a) an efficient robust classifier exists, in the small-perturbation regime; (b) a non-robust classifier can be learned efficiently; but (c) it is computationally hard to learn a robust classifier, assuming the hardness of factoring large numbers.

Tasks: Classification, General Classification, +1

Adversarial Robustness May Be at Odds With Simplicity

no code implementations • 2 Jan 2019 • Preetum Nakkiran

In this note, we show that this hypothesis is indeed possible by giving several theoretical examples of classification tasks and sets of "simple" classifiers for which: (1) there exists a simple classifier with high standard accuracy, and also high accuracy under random $\ell_\infty$ noise.

Tasks: Adversarial Robustness, Classification, +2

The Generic Holdout: Preventing False-Discoveries in Adaptive Data Science

no code implementations • 14 Sep 2018 • Preetum Nakkiran, Jarosław Błasiok

In this work, we propose a new framework for adaptive science which exponentially improves on this number of queries under a restricted yet scientifically relevant setting, where the goal of the scientist is to find a single (or a few) true hypotheses about the universe based on the samples.

Tasks: Holdout Set

Predicting Positive and Negative Links with Noisy Queries: Theory & Practice

1 code implementation • 19 Sep 2017 • Charalampos E. Tsourakakis, Michael Mitzenmacher, Kasper Green Larsen, Jarosław Błasiok, Ben Lawson, Preetum Nakkiran, Vasileios Nakos

The *edge sign prediction problem* aims to predict whether an interaction between a pair of nodes will be positive or negative.

Tasks: Clustering
