Search Results for author: Navin Goyal

Found 30 papers, 11 papers with code

Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically

no code implementations • 25 Apr 2024 • Kabir Ahuja, Vidhisha Balachandran, Madhur Panwar, Tianxing He, Noah A. Smith, Navin Goyal, Yulia Tsvetkov

Transformers trained on natural language data have been shown to learn its hierarchical structure and to generalize to sentences with unseen syntactic structures without any explicitly encoded structural bias.

Inductive Bias • Language Modelling

In-Context Learning through the Bayesian Prism

1 code implementation • 8 Jun 2023 • Madhur Panwar, Kabir Ahuja, Navin Goyal

One of the main discoveries in this line of research has been that for several function classes, such as linear regression, transformers successfully generalize to new functions in the class.

Bayesian Inference • In-Context Learning +4
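
The Bayesian view above suggests a concrete reference point: for linear regression with a Gaussian prior, the Bayes-optimal in-context predictor is the ridge/posterior-mean estimator. A minimal sketch (my construction for illustration, not the paper's code; dimensions and noise scale are arbitrary):

```python
import numpy as np

# Prior w ~ N(0, I); labels y = <w, x> + N(0, s^2) noise.
# The Bayes-optimal prediction from k in-context examples is the
# posterior mean, i.e. ridge regression with regularizer s^2.
rng = np.random.default_rng(0)
d, k, s = 8, 16, 0.1
w = rng.normal(size=d)                    # task drawn from the prior
X = rng.normal(size=(k, d))               # in-context inputs
y = X @ w + s * rng.normal(size=k)        # in-context labels

x_query = rng.normal(size=d)
w_post = np.linalg.solve(X.T @ X + s**2 * np.eye(d), X.T @ y)
y_bayes = x_query @ w_post                # compare a transformer's in-context
print(y_bayes)                            # prediction against this baseline
```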

A Theory of Emergent In-Context Learning as Implicit Structure Induction

no code implementations • 14 Mar 2023 • Michael Hahn, Navin Goyal

Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations.

In-Context Learning

Towards a Mathematics Formalisation Assistant using Large Language Models

no code implementations • 14 Nov 2022 • Ayush Agrawal, Siddhartha Gadgil, Navin Goyal, Ashvni Narayanan, Anand Tadipatri

Mathematics formalisation is the task of translating mathematics (i.e., definitions, theorem statements, proofs) written in natural language, as found in books and papers, into a formal language that can then be checked for correctness by a program.

Language Modelling • Large Language Model
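
As a toy illustration of the task's input/output gap (the example is mine; the paper targets the Lean proof assistant and its Mathlib library), the informal statement "the sum of two even natural numbers is even" formalises as:

```lean
import Mathlib.Algebra.Group.Even

-- Informal: "The sum of two even natural numbers is even."
theorem even_add_even (m n : ℕ) (hm : Even m) (hn : Even n) :
    Even (m + n) := by
  obtain ⟨a, ha⟩ := hm     -- m = a + a
  obtain ⟨b, hb⟩ := hn     -- n = b + b
  exact ⟨a + b, by omega⟩  -- m + n = (a + b) + (a + b)
```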

When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks

1 code implementation • 23 Oct 2022 • Ankur Sikarwar, Arkil Patel, Navin Goyal

On analyzing the task, we find that identifying the target location in the grid world is the main challenge for the models.

Revisiting the Compositional Generalization Abilities of Neural Sequence Models

1 code implementation • ACL 2022 • Arkil Patel, Satwik Bhattamishra, Phil Blunsom, Navin Goyal

Compositional generalization is a fundamental trait in humans, allowing us to effortlessly combine known phrases to form novel sentences.

Learning and Generalization in Overparameterized Normalizing Flows

1 code implementation • 19 Jun 2021 • Kulin Shah, Amit Deshpande, Navin Goyal

In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with a sufficiently small learning rate and suitable initialization.

Density Estimation
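
For reference, a univariate normalizing flow fits an invertible, increasing map f and scores data with the change-of-variables identity (standard textbook formula; notation mine):

```latex
\log p_X(x) \;=\; \log p_Z\big(f(x)\big) \;+\; \log\bigl|f'(x)\bigr|,
\qquad Z = f(X) \sim \mathcal{N}(0,1).
```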

Learning and Generalization in RNNs

no code implementations • NeurIPS 2021 • Abhishek Panigrahi, Navin Goyal

In contrast to previous work, which could only handle functions of sequences that are sums of functions of the individual tokens, we allow general functions of the sequence.
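
In symbols (notation mine): earlier analyses covered only additive targets of the form below, whereas this paper's guarantees allow general functions of the whole sequence.

```latex
f(x_1, \ldots, x_T) \;=\; \sum_{t=1}^{T} g(x_t)
```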

Analyzing the Nuances of Transformers' Polynomial Simplification Abilities

no code implementations • 29 Apr 2021 • Vishesh Agarwal, Somak Aditya, Navin Goyal

To understand Transformers' abilities in such tasks in a fine-grained manner, we deviate from traditional end-to-end settings, and explore a step-wise polynomial simplification task.
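
For instance (an example of my own, in the spirit of the task), a step-wise simplification first multiplies out factors and then collects like terms:

```latex
3x \cdot (2x) + 4x^{2} + x \;\longrightarrow\; 6x^{2} + 4x^{2} + x \;\longrightarrow\; 10x^{2} + x.
```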

Are NLP Models really able to Solve Simple Math Word Problems?

3 code implementations • NAACL 2021 • Arkil Patel, Satwik Bhattamishra, Navin Goyal

Since existing solvers achieve high performance on benchmark datasets for elementary-level math word problems (MWPs) containing one-unknown arithmetic word problems, such problems are often considered "solved", with the bulk of research attention moving to more complex MWPs.

Math • Math Word Problem Solving +1
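
For illustration, a one-unknown arithmetic MWP of the kind discussed above looks like the following (the example is mine, not drawn from the paper's SVAMP benchmark):

```text
Problem:  Jack had 8 pencils. He gave 3 pencils to Rose.
          How many pencils does Jack have now?
Equation: x = 8 - 3        Answer: 5
```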

Do Transformers Understand Polynomial Simplification?

no code implementations • 1 Jan 2021 • Vishesh Agarwal, Somak Aditya, Navin Goyal

For a polynomial that is not necessarily in this normal form, a sequence of simplification steps is applied to reach the fully simplified (i.e., normal-form) polynomial.

Learning and Generalization in Univariate Overparameterized Normalizing Flows

no code implementations • 1 Jan 2021 • Kulin Shah, Amit Deshpande, Navin Goyal

In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using Stochastic Gradient Descent (SGD).

Density Estimation

On the Practical Ability of Recurrent Neural Networks to Recognize Hierarchical Languages

1 code implementation • COLING 2020 • Satwik Bhattamishra, Kabir Ahuja, Navin Goyal

We find that while recurrent models generalize nearly perfectly if the lengths of the training and test strings are from the same range, they perform poorly if the test strings are longer.
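
A minimal probe in this spirit (my sketch; the length ranges and bracket vocabulary are illustrative, not the paper's exact setup) samples well-nested Dyck strings, with test strings drawn from a longer, unseen length range:

```python
import random

def dyck(n_pairs, brackets=("()", "[]")):
    """Sample a well-nested string over two bracket types with n_pairs pairs."""
    out, stack, opened = [], [], 0
    while opened < n_pairs or stack:
        if opened < n_pairs and (not stack or random.random() < 0.5):
            b = random.choice(brackets)
            out.append(b[0]); stack.append(b[1]); opened += 1
        else:
            out.append(stack.pop())   # close the most recent open bracket
    return "".join(out)

train = [dyck(random.randint(2, 25)) for _ in range(10_000)]
test = [dyck(random.randint(26, 50)) for _ in range(1_000)]  # longer than training
```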

On the Ability and Limitations of Transformers to Recognize Formal Languages

1 code implementation • EMNLP 2020 • Satwik Bhattamishra, Kabir Ahuja, Navin Goyal

Our analysis also provides insights into the role of the self-attention mechanism in modeling certain behaviors, and into the influence of positional encoding schemes on the model's learning and generalization abilities.

Robust Identifiability in Linear Structural Equation Models of Causal Inference

no code implementations • 14 Jul 2020 • Karthik Abinav Sankararaman, Anand Louis, Navin Goyal

First, for a large and well-studied class of LSEMs, namely "bow-free" models, we provide a sufficient condition on the model parameters under which robust identifiability holds, thereby removing the restriction on paths required by prior work.

Causal Inference
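
For reference, the standard LSEM parameterisation (notation mine): with edge-weight matrix Λ and error covariance Ω, the observed covariance Σ is the rational map below; identifiability asks when (Λ, Ω) can be recovered from Σ, and "bow-free" means no pair of variables carries both a directed edge (Λᵢⱼ ≠ 0) and a correlated error (Ωᵢⱼ ≠ 0).

```latex
X = \Lambda^{\top} X + \varepsilon, \quad \operatorname{Cov}(\varepsilon) = \Omega
\;\;\Longrightarrow\;\;
\Sigma = (I - \Lambda)^{-\top}\, \Omega\, (I - \Lambda)^{-1}.
```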

Non-Gaussianity of Stochastic Gradient Noise

no code implementations • 21 Oct 2019 • Abhishek Panigrahi, Raghav Somani, Navin Goyal, Praneeth Netrapalli

What enables Stochastic Gradient Descent (SGD) to achieve better generalization than Gradient Descent (GD) in Neural Network training?
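
One way to probe this question empirically (an illustrative sketch of my own, not the paper's protocol) is to collect minibatch gradient noise for one coordinate and run a normality test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X, y = rng.normal(size=(4096, 10)), rng.normal(size=4096)
w = np.zeros(10)

def grad(Xb, yb, w):
    """Gradient of the least-squares loss 0.5 * ||Xb @ w - yb||^2 / len(yb)."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

g_full = grad(X, y, w)
noise = []
for _ in range(500):
    idx = rng.choice(len(X), size=32, replace=False)   # one minibatch
    noise.append(grad(X[idx], y[idx], w)[0] - g_full[0])
print(stats.shapiro(noise))  # a tiny p-value suggests non-Gaussian noise
```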

Effect of Activation Functions on the Training of Overparametrized Neural Nets

no code implementations • ICLR 2020 • Abhishek Panigrahi, Abhishek Shetty, Navin Goyal

In the present paper, we provide theoretical results on the effect of the activation function on the training of highly overparametrized 2-layer neural networks.

Small Data Image Classification

Universality Patterns in the Training of Neural Networks

no code implementations • 17 May 2019 • Raghav Somani, Navin Goyal, Prateek Jain, Praneeth Netrapalli

This paper proposes and demonstrates a surprising pattern in the training of neural networks: there is a one-to-one relation between the values of any pair of losses (such as cross-entropy, mean squared error, and 0/1 error).

Stability of Linear Structural Equation Models of Causal Inference

no code implementations • 16 May 2019 • Karthik Abinav Sankararaman, Anand Louis, Navin Goyal

First, we prove that, under a sufficient condition, parameter recovery is stable for a certain sub-class of LSEMs that are bow-free (Brito and Pearl, 2002).

Causal Inference • Sociology

Non-Gaussian Component Analysis using Entropy Methods

no code implementations • 13 Jul 2018 • Navin Goyal, Abhishek Shetty

Non-Gaussian component analysis (NGCA) is also related to dimensionality reduction and to other data analysis problems such as ICA.

Dimensionality Reduction

Depth separation and weight-width trade-offs for sigmoidal neural networks

no code implementations • ICLR 2018 • Amit Deshpande, Navin Goyal, Sushrut Karmalkar

We show a similar separation between the expressive power of depth-2 and depth-3 sigmoidal neural networks over a large class of input distributions, as long as the weights are polynomially bounded.

Learnability of Learned Neural Networks

no code implementations • ICLR 2018 • Rahul Anand Sharma, Navin Goyal, Monojit Choudhury, Praneeth Netrapalli

This paper explores the simplicity of learned neural networks under various settings: learned on real vs. random data, with varying size/architecture, and with large vs. small minibatch sizes.

Heavy-Tailed Analogues of the Covariance Matrix for ICA

no code implementations • 22 Feb 2017 • Joseph Anderson, Navin Goyal, Anupama Nandi, Luis Rademacher

Like the current state of the art, the new algorithm is based on the centroid body (a first-moment analogue of the covariance matrix).
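
Concretely (standard definitions, notation mine): for a zero-mean random vector X, the centroid body is the convex body whose support function is a first absolute moment, in place of the second moment that defines the covariance quadratic form; this is what makes it usable under heavy tails, where second moments may not exist.

```latex
h_{\Gamma X}(\theta) \;=\; \mathbb{E}\,\bigl|\langle X, \theta\rangle\bigr|
\qquad\text{vs.}\qquad
\theta^{\top} \Sigma\, \theta \;=\; \mathbb{E}\,\langle X, \theta\rangle^{2}.
```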

Heavy-tailed Independent Component Analysis

no code implementations • 2 Sep 2015 • Joseph Anderson, Navin Goyal, Anupama Nandi, Luis Rademacher

Independent component analysis (ICA) is the problem of efficiently recovering a matrix $A \in \mathbb{R}^{n\times n}$ from i.i.d. observations of $X = AS$, where $S$ is a random vector with independent coordinates.

The More, the Merrier: the Blessing of Dimensionality for Learning Large Gaussian Mixtures

no code implementations • 12 Nov 2013 • Joseph Anderson, Mikhail Belkin, Navin Goyal, Luis Rademacher, James Voss

The problem of learning this map can be efficiently solved using some recent results on tensor decompositions and Independent Component Analysis (ICA), thus giving an algorithm for recovering the mixture.

Fourier PCA and Robust Tensor Decomposition

1 code implementation • 25 Jun 2013 • Navin Goyal, Santosh Vempala, Ying Xiao

Fourier PCA is Principal Component Analysis of a matrix obtained from higher order derivatives of the logarithm of the Fourier transform of a distribution. We make this method algorithmic by developing a tensor decomposition method for a pair of tensors sharing the same vectors in rank-$1$ decompositions.

Tensor Decomposition
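
A minimal numerical sketch of the object being diagonalized (my construction; variable names and the choice of evaluation point u are illustrative): estimate the Hessian of log E[exp(i⟨u, x⟩)] from samples and take its eigendecomposition.

```python
import numpy as np

def fourier_log_hessian(X, u):
    """Empirical Hessian of log E[exp(i <u, x>)] from samples X (n x d)."""
    w = np.exp(1j * (X @ u))                      # e^{i<u,x>} per sample
    phi = w.mean()                                # characteristic function at u
    g = (1j * X * w[:, None]).mean(axis=0)        # gradient of phi
    H = (-(X[:, :, None] * X[:, None, :]) * w[:, None, None]).mean(axis=0)
    return H / phi - np.outer(g, g) / phi**2      # Hessian of log phi

rng = np.random.default_rng(0)
S = rng.laplace(size=(20_000, 3))                 # independent non-Gaussian sources
A = rng.normal(size=(3, 3))                       # unknown mixing matrix
D = fourier_log_hessian(S @ A.T, 0.1 * rng.normal(size=3))
eigvals, eigvecs = np.linalg.eig(D)               # the "PCA" step
```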

Efficient learning of simplices

no code implementations • 9 Nov 2012 • Joseph Anderson, Navin Goyal, Luis Rademacher

We also show a direct connection between the problem of learning a simplex and ICA: a simple randomized reduction from learning a simplex to ICA, sketched below.
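
A sketch of that reduction, as I understand it (scaling constants illustrative): if P is uniform on the standard simplex and R ~ Gamma(n, 1) is independent of P, then R·P has i.i.d. Exp(1) coordinates, so rescaling simplex samples by an independent Gamma radius yields an ICA instance.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))                  # columns: unknown simplex vertices

E = rng.exponential(size=(10_000, n))
P = E / E.sum(axis=1, keepdims=True)         # uniform samples on the simplex
X = P @ A.T                                  # observed samples from the simplex

r = rng.gamma(shape=n, scale=1.0, size=(len(X), 1))
Y = r * X                                    # Y = A s with s i.i.d. Exp(1):
                                             # standard ICA now recovers A
```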

Thompson Sampling for Contextual Bandits with Linear Payoffs

1 code implementation • 15 Sep 2012 • Shipra Agrawal, Navin Goyal

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems.

Multi-Armed Bandits • Thompson Sampling
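
A minimal sketch of Thompson Sampling in the linear-payoff setting (following the standard Gaussian-posterior formulation; the exploration scale v here is an arbitrary constant, not a tuned value from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, v = 5, 0.5
B = np.eye(d)                                 # posterior precision
f = np.zeros(d)                               # sum of reward-weighted contexts

def choose_arm(contexts):
    """contexts: (n_arms, d). Sample a parameter, then act greedily on it."""
    mu_hat = np.linalg.solve(B, f)            # posterior mean
    mu_tilde = rng.multivariate_normal(mu_hat, v**2 * np.linalg.inv(B))
    return int(np.argmax(contexts @ mu_tilde))

def update(x, reward):
    """x: context of the arm actually played."""
    B[:] = B + np.outer(x, x)
    f[:] = f + reward * x
```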
