no code implementations • 8 Feb 2024 • Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar
RaPTr achieves better pre-training loss for BERT and UL2 language models while requiring 20-33% fewer FLOPs compared to standard training, and is competitive with or better than other efficient training methods.
1 code implementation • 3 Aug 2023 • Vedant Gaur, Nikunj Saunshi
Large language models (LLMs) have revolutionized NLP by solving downstream tasks with little to no labeled data.
1 code implementation • 13 Feb 2023 • Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora
Given the downstream task and a model fine-tuned on that task, a simple optimization is used to identify a very small subset of parameters ($\sim 0.01\%$ of model parameters) responsible for $>95\%$ of the model's performance, in the sense that grafting the fine-tuned values for just this tiny subset onto the pre-trained model performs almost as well as the fine-tuned model.
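A minimal sketch of the grafting idea described above, assuming PyTorch models and a hypothetical boolean `mask` selecting the tiny parameter subset (not the authors' released code):

```python
import torch

def graft(pretrained_model, finetuned_model, mask):
    """mask maps parameter names to boolean tensors marking the tiny
    subset (~0.01% of weights) whose fine-tuned values are grafted."""
    ft_state = finetuned_model.state_dict()
    grafted_state = {}
    for name, param in pretrained_model.state_dict().items():
        if name in mask:
            # take fine-tuned values only where the mask is True
            grafted_state[name] = torch.where(mask[name], ft_state[name], param)
        else:
            grafted_state[name] = param
    pretrained_model.load_state_dict(grafted_state)
    return pretrained_model
```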
1 code implementation • 5 Nov 2022 • Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora
Saliency methods compute heat maps that highlight portions of an input that were most important for the label assigned to it by a deep net.
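As an illustration of the general idea (not the specific saliency methods examined in the paper), a minimal vanilla-gradient heat map in PyTorch:

```python
import torch

def saliency_map(model, x):
    # x: a single input of shape (1, C, H, W); heat map is |d score / d input|
    x = x.detach().clone().requires_grad_(True)
    scores = model(x)                       # (1, num_classes)
    scores[0, scores.argmax()].backward()   # gradient of the predicted class score
    return x.grad.abs().amax(dim=1)         # collapse channels -> (1, H, W) heat map
```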
no code implementations • 3 Oct 2022 • Nikunj Saunshi, Arushi Gupta, Mark Braverman, Sanjeev Arora
Influence functions estimate the effect of individual training points on a model's predictions on test data, and were adapted to deep learning by Koh and Liang [2017].
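For reference, Koh and Liang's influence function approximates the effect of upweighting a training point $z$ on the loss at a test point $z_{\text{test}}$ as $\mathcal{I}(z, z_{\text{test}}) = -\nabla_\theta L(z_{\text{test}}, \hat\theta)^\top H_{\hat\theta}^{-1} \nabla_\theta L(z, \hat\theta)$, where $\hat\theta$ are the learned parameters and $H_{\hat\theta}$ is the Hessian of the empirical training loss at $\hat\theta$.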
no code implementations • 28 Feb 2022 • Nikunj Saunshi, Jordan Ash, Surbhi Goel, Dipendra Misra, Cyril Zhang, Sanjeev Arora, Sham Kakade, Akshay Krishnamurthy
Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs.
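A minimal SimCLR-style InfoNCE loss sketch in PyTorch, assuming `z1` and `z2` hold representations of two augmentations of the same batch (illustrative of the objective family, not the paper's code):

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    # normalize so similarities are cosine similarities
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                  # (batch, batch) similarities
    labels = torch.arange(z1.size(0), device=z1.device) # positives on the diagonal
    return F.cross_entropy(logits, labels)
```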
no code implementations • ICLR 2022 • Yi Zhang, Arushi Gupta, Nikunj Saunshi, Sanjeev Arora
Research on generalization bounds for deep networks seeks to give ways to predict test error using just the training dataset and the network parameters.
no code implementations • 29 Sep 2021 • Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora
Saliency methods seek to provide human-interpretable explanations for the output of a machine learning model on a given input.
1 code implementation • 29 Jun 2021 • Nikunj Saunshi, Arushi Gupta, Wei Hu
An effective approach in meta-learning is to utilize multiple "train tasks" to learn a good initialization for model parameters that can help solve unseen "test tasks" with very few samples by fine-tuning from this initialization.
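A minimal sketch of this recipe, assuming a hypothetical `meta_init_model` learned on the train tasks and a handful of labeled samples from a test task:

```python
import copy
import torch

def solve_test_task(meta_init_model, few_shot_x, few_shot_y, steps=50, lr=1e-2):
    model = copy.deepcopy(meta_init_model)      # start from the meta-learned init
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(steps):                      # adapt on very few samples
        opt.zero_grad()
        loss_fn(model(few_shot_x), few_shot_y).backward()
        opt.step()
    return model
```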
no code implementations • ICLR 2021 • Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora
This paper initiates a mathematical study of this phenomenon for the downstream task of text classification by considering the following questions: (1) What is the intuitive connection between the pretraining task of next word prediction and text classification?
no code implementations • NeurIPS 2021 • Jason D. Lee, Qi Lei, Nikunj Saunshi, Jiacheng Zhuo
Self-supervised representation learning learns useful semantic representations by solving auxiliary prediction tasks (known as pretext tasks) that do not require labeled data.
no code implementations • ICML 2020 • Nikunj Saunshi, Yi Zhang, Mikhail Khodak, Sanjeev Arora
In contrast, for the non-convex formulation of a two layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of $\mathcal{O}(1)$, demonstrating a separation from convex meta-learning.
no code implementations • ICML 2020 • Sanjeev Arora, Simon S. Du, Sham Kakade, Yuping Luo, Nikunj Saunshi
We formulate representation learning as a bi-level optimization problem where the "outer" optimization tries to learn the joint representation and the "inner" optimization encodes the imitation learning setup and tries to learn task-specific parameters.
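A minimal sketch of such a bi-level loop under simplifying assumptions (hypothetical `rep`, per-task `heads`, and a behavior-cloning-style inner objective; the inner problem is approximated with a few gradient steps rather than solved exactly):

```python
import torch

def bilevel_step(rep, heads, tasks, inner_steps=5, inner_lr=1e-2, outer_lr=1e-3):
    outer_opt = torch.optim.Adam(rep.parameters(), lr=outer_lr)
    loss_fn = torch.nn.MSELoss()
    outer_loss = 0.0
    for head, (states, expert_actions) in zip(heads, tasks):
        inner_opt = torch.optim.SGD(head.parameters(), lr=inner_lr)
        for _ in range(inner_steps):            # inner: fit task-specific head
            inner_opt.zero_grad()
            loss_fn(head(rep(states)), expert_actions).backward()
            inner_opt.step()
        # accumulate the post-adaptation imitation loss for the outer objective
        outer_loss = outer_loss + loss_fn(head(rep(states)), expert_actions)
    outer_opt.zero_grad()
    outer_loss.backward()                       # outer: update shared representation
    outer_opt.step()
```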
no code implementations • 25 Feb 2019 • Sanjeev Arora, Hrishikesh Khandeparkar, Mikhail Khodak, Orestis Plevrakis, Nikunj Saunshi
This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task comprising a subset of the same set of latent classes.
1 code implementation • ACL 2018 • Mikhail Khodak, Nikunj Saunshi, Yingyu Liang, Tengyu Ma, Brandon Stewart, Sanjeev Arora
Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features.
Ranked #3 on Sentiment Analysis on MPQA
2 code implementations • ICLR 2018 • Sanjeev Arora, Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli
We also show a surprising new property of embeddings such as GloVe and word2vec: they form a good sensing matrix for text that is more efficient than random matrices, the standard sparse recovery tool, which may explain why they lead to better representations in practice.
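A minimal numerical sketch of the sensing-matrix claim, using a random stand-in for the embedding matrix and LASSO from scikit-learn for sparse recovery (illustrative only): a document embedding is modeled as $y = Ax$ for a sparse bag-of-words vector $x$, and $x$ is approximately recovered from $y$.

```python
import numpy as np
from sklearn.linear_model import Lasso

d, vocab = 300, 10_000
A = np.random.randn(d, vocab) / np.sqrt(d)     # stand-in for GloVe/word2vec vectors
x = np.zeros(vocab)
x[np.random.choice(vocab, size=20, replace=False)] = 1.0   # sparse bag of words
y = A @ x                                                   # document embedding

# recover the sparse bag of words from the low-dimensional embedding
x_hat = Lasso(alpha=1e-3, max_iter=10_000).fit(A, y).coef_
```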
6 code implementations • LREC 2018 • Mikhail Khodak, Nikunj Saunshi, Kiran Vodrahalli
We introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for sarcasm research and for training and evaluating systems for sarcasm detection.
Ranked #1 on Sarcasm Detection on SARC (pol-unbal)