Search Results for author: Abhishek Panigrahi

Found 12 papers, 4 papers with code

Efficient Stagewise Pretraining via Progressive Subnetworks

no code implementations • 8 Feb 2024 • Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar

RaPTr achieves better pre-training loss for BERT and UL2 language models while requiring 20-33% fewer FLOPs than standard training, and is competitive with or better than other efficient training methods.

Trainable Transformer in Transformer

1 code implementation • 3 Jul 2023 • Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora

In this work, we propose an efficient construction, Transformer in Transformer (TinT), that allows a transformer to simulate and fine-tune complex models (e.g., pre-trained language models) internally during inference.

Attribute, In-Context Learning

Do Transformers Parse while Predicting the Masked Word?

no code implementations • 14 Mar 2023 • Haoyu Zhao, Abhishek Panigrahi, Rong Ge, Sanjeev Arora

We also show that the Inside-Outside algorithm is optimal for the masked language modeling loss on PCFG-generated data.

Constituency Parsing, Language Modelling
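
For context on the Inside-Outside algorithm mentioned above: its inside pass computes, for every span of a sentence, the total probability that each nonterminal generates that span under a PCFG. The following is a minimal sketch of the inside pass only, assuming a grammar in Chomsky normal form; the toy grammar, its symbols, and its probabilities are hypothetical and not taken from the paper. The outside pass, omitted here, is what combines with these inside scores to score a masked position given its context.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (not from the paper).
# Binary rules: (parent, left_child, right_child) -> probability
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
# Lexical rules: (parent, word) -> probability
LEXICAL = {
    ("NP", "she"): 0.5,
    ("NP", "fish"): 0.5,
    ("V", "eats"): 1.0,
}

def inside(words):
    """Return table with table[(i, j)][A] = P(A =>* words[i:j]) for every span."""
    n = len(words)
    table = defaultdict(lambda: defaultdict(float))
    # Length-1 spans are scored by the lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                table[(i, i + 1)][a] += p
    # Longer spans: sum over split points k and binary rules A -> B C.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    table[(i, j)][a] += p * table[(i, k)][b] * table[(k, j)][c]
    return table

def sentence_prob(words, start="S"):
    """Probability that the start symbol generates the whole sentence."""
    return inside(words)[(0, len(words))][start]

print(sentence_prob(["she", "eats", "fish"]))  # → 0.25
```

The table is filled bottom-up by span length, so every sub-span score is available before it is used; this is the same dynamic program as CKY parsing, with sums in place of maxima.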

Task-Specific Skill Localization in Fine-tuned Language Models

1 code implementation • 13 Feb 2023 • Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora

Given the downstream task and a model fine-tuned on that task, a simple optimization is used to identify a very small subset of parameters ($\sim 0.01\%$ of model parameters) responsible for $>95\%$ of the model's performance, in the sense that grafting the fine-tuned values for just this tiny subset onto the pre-trained model performs almost as well as the fine-tuned model.

Continual Learning
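
The grafting step described above can be sketched with NumPy on flat parameter vectors: fine-tuned values are copied onto the pre-trained weights only where a binary mask is set. This is a minimal sketch under stated assumptions. The paper finds the mask with an optimization that is not reproduced here; `top_change_mask` (keep the parameters that moved most during fine-tuning) is a hypothetical stand-in, and the toy vectors are made up.

```python
import numpy as np

def graft(pretrained, finetuned, mask):
    """Take fine-tuned values where mask is True, pre-trained values elsewhere."""
    return np.where(mask, finetuned, pretrained)

def top_change_mask(pretrained, finetuned, fraction):
    """Hypothetical stand-in for the paper's mask optimization:
    select the `fraction` of parameters with the largest |finetuned - pretrained|."""
    delta = np.abs(finetuned - pretrained)
    k = max(1, int(round(fraction * delta.size)))
    mask = np.zeros(delta.shape, dtype=bool)
    mask[np.argsort(delta)[-k:]] = True  # indices of the k largest changes
    return mask

# Toy example: only the first 5 of 10,000 parameters change during "fine-tuning".
rng = np.random.default_rng(0)
theta_pre = rng.normal(size=10_000)
theta_ft = theta_pre.copy()
theta_ft[:5] += 1.0
mask = top_change_mask(theta_pre, theta_ft, fraction=0.0005)  # 0.05% of parameters
grafted = graft(theta_pre, theta_ft, mask)
print(int(mask.sum()), bool(np.allclose(grafted, theta_ft)))  # → 5 True
```

In a real model one would evaluate the grafted parameters on the downstream task and compare against the fully fine-tuned model; the toy vectors here only show the mechanics of the graft.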

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

1 code implementation • 20 May 2022 • Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora

Approximating Stochastic Gradient Descent (SGD) as a Stochastic Differential Equation (SDE) has allowed researchers to enjoy the benefits of studying a continuous optimization trajectory while carefully preserving the stochasticity of SGD.
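
The SGD-as-SDE approximation in the sentence above can be illustrated on a one-dimensional quadratic. The sketch assumes a hypothetical loss $L(x) = a x^2/2$ with additive Gaussian gradient noise of variance $\sigma^2$. It compares SGD with learning rate $\eta$ against an Euler-Maruyama simulation of the classic SGD SDE $dX_t = -\nabla L(X_t)\,dt + \sqrt{\eta}\,\sigma\,dW_t$, where SGD step $k$ corresponds to SDE time $t = k\eta$. It covers only the plain-SGD case, not the adaptive-algorithm SDEs named in the paper's title.

```python
import numpy as np

def sgd_chain(a, sigma, eta, steps, n_chains, rng):
    """SGD on L(x) = a*x**2/2 with gradient noise N(0, sigma**2); many chains in parallel."""
    x = np.zeros(n_chains)
    for _ in range(steps):
        grad = a * x + sigma * rng.standard_normal(n_chains)
        x = x - eta * grad
    return x

def sde_chain(a, sigma, eta, t_end, dt, n_chains, rng):
    """Euler-Maruyama for dX = -a*X dt + sqrt(eta)*sigma dW."""
    x = np.zeros(n_chains)
    for _ in range(int(round(t_end / dt))):
        dw = np.sqrt(dt) * rng.standard_normal(n_chains)
        x = x - a * x * dt + np.sqrt(eta) * sigma * dw
    return x

rng = np.random.default_rng(0)
a, sigma, eta = 1.0, 1.0, 0.01
# 1000 SGD steps with eta = 0.01 correspond to SDE time t = 10.
x_sgd = sgd_chain(a, sigma, eta, steps=1000, n_chains=20_000, rng=rng)
x_sde = sde_chain(a, sigma, eta, t_end=10.0, dt=0.01, n_chains=20_000, rng=rng)
# Both stationary variances should be close to eta * sigma**2 / (2a).
print(x_sgd.var(), x_sde.var(), eta * sigma**2 / (2 * a))
```

For this loss the SDE is an Ornstein-Uhlenbeck process with stationary variance $\eta\sigma^2/(2a)$, while SGD's exact stationary variance is $\eta\sigma^2/(2a - \eta a^2)$. The two therefore agree to first order in $\eta$, which is the sense in which the SDE preserves SGD's stochasticity.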

Understanding Gradient Descent on Edge of Stability in Deep Learning

no code implementations • 19 May 2022 • Sanjeev Arora, Zhiyuan Li, Abhishek Panigrahi

This paper mathematically analyzes a new mechanism of implicit regularization in the Edge of Stability (EoS) phase, whereby GD updates, due to the non-smooth loss landscape, turn out to evolve along a deterministic flow on the manifold of minimum loss.

Learning and Generalization in RNNs

no code implementations • NeurIPS 2021 • Abhishek Panigrahi, Navin Goyal

In contrast to previous work, which could only handle functions of sequences that are sums of functions of individual tokens, we allow general functions.

Non-Gaussianity of Stochastic Gradient Noise

no code implementations • 21 Oct 2019 • Abhishek Panigrahi, Raghav Somani, Navin Goyal, Praneeth Netrapalli

What enables Stochastic Gradient Descent (SGD) to achieve better generalization than Gradient Descent (GD) in Neural Network training?

Effect of Activation Functions on the Training of Overparametrized Neural Nets

no code implementations • ICLR 2020 • Abhishek Panigrahi, Abhishek Shetty, Navin Goyal

In the present paper, we provide theoretical results about the effect of the activation function on the training of highly overparametrized 2-layer neural networks.

Small Data Image Classification

Word2Sense: Sparse Interpretable Word Embeddings

no code implementations • ACL 2019 • Abhishek Panigrahi, Harsha Vardhan Simhadri, Chiranjib Bhattacharyya

We present an unsupervised method to generate Word2Sense word embeddings that are interpretable: each dimension of the embedding space corresponds to a fine-grained sense, and the non-negative value of the embedding along the j-th dimension represents the relevance of the j-th sense to the word.

Word Embeddings, Word Similarity
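
The interpretability property described above means a word's dominant senses can be read directly off its embedding by ranking dimensions. The sketch below assumes a made-up five-sense inventory and a hand-written non-negative vector for "bank". Both are hypothetical; real Word2Sense vectors and sense labels come from the paper's unsupervised method.

```python
import numpy as np

# Hypothetical sense inventory and embedding (not produced by Word2Sense).
SENSES = ["finance", "river", "sports", "music", "cooking"]
bank = np.array([0.61, 0.35, 0.0, 0.0, 0.04])  # non-negative, mostly zeros

def top_senses(vec, senses, k=2):
    """Return up to k (sense, weight) pairs with the largest weights, skipping zeros."""
    order = np.argsort(vec)[::-1][:k]  # dimension indices, largest weight first
    return [(senses[j], float(vec[j])) for j in order if vec[j] > 0]

print(top_senses(bank, SENSES))  # → [('finance', 0.61), ('river', 0.35)]
```

Because the values are non-negative, a larger coordinate can only mean more relevance. A signed dense embedding offers no such direct reading.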

DeepTagRec: A Content-cum-User based Tag Recommendation Framework for Stack Overflow

1 code implementation • 10 Mar 2019 • Suman Kalyan Maity, Abhishek Panigrahi, Sayan Ghosh, Arundhati Banerjee, Pawan Goyal, Animesh Mukherjee

In this paper, we develop a content-cum-user based deep learning framework DeepTagRec to recommend appropriate question tags on Stack Overflow.

TAG

Analysis on Gradient Propagation in Batch Normalized Residual Networks

no code implementations • ICLR 2018 • Abhishek Panigrahi, Yueru Chen, C.-C. Jay Kuo

In this work, we mathematically analyze how batch normalization (BN), which is believed to play a critical role in addressing the vanishing/exploding gradient problem, affects gradient backpropagation in residual network training.