Search Results for author: Atri Rudra

Found 20 papers, 16 papers with code

Simple linear attention language models balance the recall-throughput tradeoff

1 code implementation • 28 Feb 2024 • Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré

In this work, we explore whether we can improve language model efficiency (e.g., by reducing memory consumption) without compromising on recall.

Language Modelling • Text Generation

Zoology: Measuring and Improving Recall in Efficient Language Models

2 code implementations • 8 Dec 2023 • Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré

To close the gap between synthetics and real language, we develop a new formalization of the task called multi-query associative recall (MQAR) that better reflects actual language.
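
For illustration, a rough sketch of what a single MQAR-style instance could look like: a sequence of key-value pairs followed by several queried keys, where the model must recall the value bound to each queried key. The letter/digit vocabulary and sampling scheme below are illustrative assumptions, not the paper's exact generator.

```python
import random

def mqar_example(num_pairs=4, num_queries=3, seed=0):
    """Build one illustrative multi-query associative recall (MQAR) instance:
    key-value pairs followed by queried keys; the targets are the values
    originally bound to the queried keys."""
    rng = random.Random(seed)
    keys = rng.sample("abcdefgh", num_pairs)                      # letter keys (toy vocabulary)
    values = [str(rng.randrange(10)) for _ in range(num_pairs)]   # digit values
    context = [tok for kv in zip(keys, values) for tok in kv]
    queried = rng.sample(keys, num_queries)                       # multiple keys are queried
    targets = [values[keys.index(k)] for k in queried]
    return context + queried, targets

tokens, targets = mqar_example()
print(tokens)    # key-value context followed by the queried keys
print(targets)   # values the model should recall for each query
```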

Hungry Hungry Hippos: Towards Language Modeling with State Space Models

3 code implementations • 28 Dec 2022 • Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré

First, we use synthetic language modeling tasks to understand the gap between SSMs and attention.

Ranked #2 on Language Modelling on The Pile (Test perplexity metric)

Coreference Resolution +5

Arithmetic Circuits, Structured Matrices and (not so) Deep Learning

no code implementations • 24 Jun 2022 • Atri Rudra

This survey presents a necessarily incomplete (and biased) overview of results at the intersection of arithmetic circuit complexity, structured matrices and deep learning.

How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections

1 code implementation • 24 Jun 2022 • Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, Christopher Ré

Linear time-invariant state space models (SSMs) are a classical model family from engineering and statistics that has recently been shown to be very promising in machine learning through the Structured State Space sequence model (S4).

Long-range modeling
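
For context, a linear time-invariant SSM maps an input u(t) to an output y(t) through a latent state x(t) via x'(t) = A x(t) + B u(t), y(t) = C x(t); after discretization it becomes a plain linear recurrence. The numpy sketch below runs that recurrence with a bilinear discretization, as used in the S4 line of work; the random matrices here merely stand in for the structured basis-projection (HiPPO) initializations the paper actually analyzes.

```python
import numpy as np

def ssm_scan(A_bar, B_bar, C, u):
    """Discretized LTI state space recurrence:
    x_k = A_bar x_{k-1} + B_bar u_k,   y_k = C x_k,   over a 1-D input sequence u."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:
        x = A_bar @ x + B_bar * u_k
        ys.append(C @ x)
    return np.array(ys)

rng = np.random.default_rng(0)
N, L, dt = 16, 100, 0.1
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))    # stand-in for a structured (HiPPO) matrix
B, C = rng.standard_normal(N), rng.standard_normal(N)
# bilinear (Tustin) discretization of the continuous-time system
A_bar = np.linalg.solve(np.eye(N) - dt / 2 * A, np.eye(N) + dt / 2 * A)
B_bar = np.linalg.solve(np.eye(N) - dt / 2 * A, dt * B)
y = ssm_scan(A_bar, B_bar, C, rng.standard_normal(L))
print(y.shape)   # one output per time step
```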

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

9 code implementations • 27 May 2022 • Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré

We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method.
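
To fix ideas, here is a plain numpy sketch of the tiling and online-softmax bookkeeping behind IO-aware exact attention: keys and values are processed one block at a time while running row maxima and normalizers are maintained, so the full attention matrix is never materialized. This is only an illustration of the idea, not the fused CUDA kernel and not the block-sparse variant.

```python
import numpy as np

def tiled_attention(Q, K, V, block=64):
    """Exact softmax attention computed one key/value block at a time using the
    online-softmax trick (running row maxima and normalizers), so no L x L score
    matrix is ever formed."""
    L, d = Q.shape
    out = np.zeros_like(V)
    row_max = np.full(L, -np.inf)
    row_sum = np.zeros(L)
    for start in range(0, L, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)                  # scores for this block only
        new_max = np.maximum(row_max, S.max(axis=1))
        correction = np.exp(row_max - new_max)     # rescale previously accumulated results
        P = np.exp(S - new_max[:, None])
        row_sum = row_sum * correction + P.sum(axis=1)
        out = out * correction[:, None] + P @ Vb
        row_max = new_max
    return out / row_sum[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
scores = Q @ K.T / np.sqrt(32)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
ref = (weights / weights.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), ref)  # matches naive attention exactly
```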

Monarch: Expressive Structured Matrices for Efficient and Accurate Training

1 code implementation • 1 Apr 2022 • Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré

To address these issues, we propose a class of matrices (Monarch) that is hardware-efficient (they are parameterized as products of two block-diagonal matrices for better hardware utilization) and expressive (they can represent many commonly used transforms).

Language Modelling • MRI Reconstruction
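
As a rough illustration of that structure, the sketch below forms a matrix from two block-diagonal factors; the perfect-shuffle permutation used to interleave them is an illustrative assumption rather than the paper's exact parameterization.

```python
import numpy as np
from scipy.linalg import block_diag

def monarch_like(n, b, seed=0):
    """Form an n x n matrix from two block-diagonal factors with b x b blocks,
    interleaved by a perfect-shuffle permutation (illustrative choice).
    Parameter count: 2 * n * b instead of n^2 for a dense matrix."""
    rng = np.random.default_rng(seed)
    L = block_diag(*[rng.standard_normal((b, b)) for _ in range(n // b)])
    R = block_diag(*[rng.standard_normal((b, b)) for _ in range(n // b)])
    P = np.eye(n)[np.arange(n).reshape(n // b, b).T.reshape(-1)]  # perfect shuffle
    return P.T @ L @ P @ R

n, b = 16, 4
M = monarch_like(n, b)
print(M.shape, "parameters:", 2 * n * b, "vs dense:", n * n)
```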

Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models

1 code implementation • ICLR 2022 • Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Ré

To address this, our main insight is to optimize over a continuous superset of sparse matrices with a fixed structure known as products of butterfly matrices.

Language Modelling
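
A minimal sketch of that fixed structure, assuming the usual radix-2 layout: a product of log2(n) butterfly factors, each with two nonzeros per row, so the full parameterization has O(n log n) entries instead of n^2. The random entries here are purely illustrative stand-ins for learned values.

```python
import numpy as np

def random_butterfly_factors(n, seed=0):
    """Butterfly factors for n = 2^k: the i-th factor pairs coordinates at stride 2^i
    and mixes each pair with its own 2x2 block, i.e. two nonzeros per row."""
    rng = np.random.default_rng(seed)
    factors, stride = [], 1
    while stride < n:
        F = np.zeros((n, n))
        for start in range(0, n, 2 * stride):
            for j in range(start, start + stride):
                a, b, c, d = rng.standard_normal(4)       # this pair's 2x2 mixing block
                F[j, j], F[j, j + stride] = a, b
                F[j + stride, j], F[j + stride, j + stride] = c, d
        factors.append(F)
        stride *= 2
    return factors

factors = random_butterfly_factors(16)
nnz = sum(int((F != 0).sum()) for F in factors)
print("nonzeros across all factors:", nnz, "vs dense:", 16 * 16)   # 2n*log2(n) = 128 vs 256
```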

Scatterbrain: Unifying Sparse and Low-rank Attention Approximation

1 code implementation • NeurIPS 2021 • Beidi Chen, Tri Dao, Eric Winsor, Zhao Song, Atri Rudra, Christopher Ré

Recent advances in efficient Transformers have exploited either the sparsity or low-rank properties of attention matrices to reduce the computational and memory bottlenecks of modeling long sequences.

Image Generation • Language Modelling

Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers

2 code implementations • NeurIPS 2021 • Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré

Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency.

Computational Efficiency • Memorization +3

Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps

2 code implementations • ICLR 2020 • Tri Dao, Nimit S. Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra, Christopher Ré

Modern neural network architectures use structured linear transformations, such as low-rank matrices, sparse matrices, permutations, and the Fourier transform, to improve inference speed and reduce memory usage compared to general linear maps.

Image Classification • Speech Recognition +1

Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations

1 code implementation • 14 Mar 2019 • Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, Christopher Ré

Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions.

BIG-bench Machine Learning
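
As a concrete instance of how butterfly factorizations capture fast transforms, the sketch below builds the DFT matrix as a product of sparse butterfly and permutation factors via the Cooley-Tukey recursion and checks it against numpy's FFT. This is a reference illustration of the structure, not the paper's learning procedure.

```python
import numpy as np

def dft_butterfly_factors(n):
    """Return sparse factors whose product equals the n x n DFT matrix (n a power of 2),
    following the Cooley-Tukey identity F_n = B_n (I_2 kron F_{n/2}) P_n, where B_n is a
    butterfly factor with two nonzeros per row and P_n the even-odd permutation."""
    if n == 1:
        return [np.ones((1, 1), dtype=complex)]
    half = n // 2
    omega = np.exp(-2j * np.pi / n)
    D = np.diag(omega ** np.arange(half))
    I = np.eye(half)
    butterfly = np.block([[I, D], [I, -D]])
    even_odd = np.eye(n)[np.r_[0:n:2, 1:n:2]]              # even-indexed entries first
    lifted = [np.kron(np.eye(2), f) for f in dft_butterfly_factors(half)]
    return [butterfly] + lifted + [even_odd]

n = 8
F = np.linalg.multi_dot(dft_butterfly_factors(n))
assert np.allclose(F, np.fft.fft(np.eye(n)))               # the product is exactly the DFT
```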

Learning Compressed Transforms with Low Displacement Rank

1 code implementation • NeurIPS 2018 • Anna T. Thomas, Albert Gu, Tri Dao, Atri Rudra, Christopher Ré

The low displacement rank (LDR) framework for structured matrices represents a matrix through two displacement operators and a low-rank residual.

Image Classification • Language Modelling
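
A classical instance of this framework, for illustration: a Toeplitz matrix has displacement rank at most 2 under the down-shift operator (Stein form of displacement shown below), because the displaced matrix is nonzero only in its first row and column. The LDR paper goes further and learns the displacement operators and low-rank factors themselves.

```python
import numpy as np
from scipy.linalg import toeplitz

n = 8
rng = np.random.default_rng(0)
T = toeplitz(rng.standard_normal(n), rng.standard_normal(n))   # a random Toeplitz matrix
Z = np.diag(np.ones(n - 1), -1)            # down-shift displacement operator
residual = T - Z @ T @ Z.T                 # Stein displacement of T
print(np.linalg.matrix_rank(residual))     # 2: the low-rank residual of the LDR view
```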

Hypertree Decompositions Revisited for PGMs

no code implementations • 2 Jul 2018 • Aarthy Shivram Arun, Sai Vikneshwar Mani Jayaraman, Christopher Ré, Atri Rudra

We revisit the classical problem of exact inference on probabilistic graphical models (PGMs).
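
For a sense of the problem, the sketch below runs textbook variable elimination on a tiny chain-structured model and checks it against brute-force enumeration; the paper's contribution concerns hypertree-decomposition-based algorithms, which this toy baseline does not attempt to implement.

```python
import numpy as np

# Exact marginal inference on a chain MRF x1 - x2 - x3 by variable elimination,
# verified against brute-force summation over the full joint.
rng = np.random.default_rng(0)
k = 3                                           # states per variable
psi12, psi23 = rng.random((k, k)), rng.random((k, k))   # pairwise potentials

m12 = psi12.sum(axis=0)                         # eliminate x1: message to x2
m23 = (m12[:, None] * psi23).sum(axis=0)        # eliminate x2: message to x3
p3 = m23 / m23.sum()                            # marginal over x3

joint = psi12[:, :, None] * psi23[None, :, :]   # full (k, k, k) joint, unnormalized
p3_brute = joint.sum(axis=(0, 1))
p3_brute /= p3_brute.sum()
assert np.allclose(p3, p3_brute)
```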
