no code implementations • 16 Oct 2023 • Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Gergely Szilvasy, Rich James, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis
Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion.
no code implementations • 2 Oct 2023 • Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Rich James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, Luke Zettlemoyer, Scott Yih
Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build.
5 code implementations • 29 Sep 2023 • Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis
In this paper, we first demonstrate that attention sinks emerge because of the strong attention scores on initial tokens, which act as a "sink" even when they are not semantically important.
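A minimal sketch of the resulting recipe, assuming illustrative values for `n_sink` and `window` (not the paper's exact configuration): let each position attend to the initial sink tokens plus a sliding window of recent tokens.

```python
import torch

def sink_window_mask(seq_len: int, n_sink: int = 4, window: int = 1024) -> torch.Tensor:
    """Boolean attention mask: position i may attend to the first n_sink
    "sink" tokens plus the most recent `window` tokens, never the future."""
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]             # no attention to future tokens
    recent = (idx[:, None] - idx[None, :]) < window   # sliding window of recent tokens
    sink = idx[None, :] < n_sink                      # always keep the initial tokens
    return causal & (recent | sink)

mask = sink_window_mask(seq_len=8, n_sink=2, window=3)
```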
1 code implementation • 27 Sep 2023 • Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma
We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths. Our ablation experiments suggest that having abundant long texts in the pretraining dataset is not the key to achieving strong performance, and we empirically verify that long-context continual pretraining is more efficient than, and similarly effective to, pretraining from scratch with long sequences.
no code implementations • 17 Sep 2023 • Sean O'Brien, Mike Lewis
We demonstrate that Contrastive Decoding -- a simple, computationally light, and training-free text generation method proposed by Li et al. (2022) -- achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks.
2 code implementations • 11 Aug 2023 • Xian Li, Ping Yu, Chunting Zhou, Timo Schick, Omer Levy, Luke Zettlemoyer, Jason Weston, Mike Lewis
We present a scalable method to build a high-quality instruction-following language model by automatically labelling human-written text with corresponding instructions.
no code implementations • 24 May 2023 • Weijia Shi, Xiaochuang Han, Mike Lewis, Yulia Tsvetkov, Luke Zettlemoyer, Scott Wen-tau Yih
Language models (LMs) often struggle to pay enough attention to the input context, and generate texts that are unfaithful or contain hallucinations.
4 code implementations • 23 May 2023 • Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi
Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2) human evaluation is time-consuming and costly.
5 code implementations • NeurIPS 2023 • Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy
Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences.
no code implementations • NeurIPS 2023 • Lili Yu, Dániel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, Mike Lewis
Autoregressive transformers are spectacular models for short sequences but scale poorly to long sequences such as high-resolution images, podcasts, code, or books.
1 code implementation • 6 May 2023 • Anastasia Razdaibiedina, Yuning Mao, Rui Hou, Madian Khabsa, Mike Lewis, Jimmy Ba, Amjad Almahairi
In this work, we introduce Residual Prompt Tuning - a simple and efficient method that significantly improves the performance and stability of prompt tuning.
1 code implementation • 24 Mar 2023 • Suchin Gururangan, Margaret Li, Mike Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer
Large language models are typically trained densely: all parameters are updated with respect to all inputs.
1 code implementation • 30 Jan 2023 • Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih
We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model.
Ranked #9 on Question Answering on Natural Questions
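A rough sketch of REPLUG's black-box ensembling step. Here `retrieve` and `lm_next_token_logprobs` are hypothetical stand-ins for the tuneable retriever and the frozen LM API; only the mixture logic follows the paper's description.

```python
import numpy as np

def replug_next_token_logprobs(query, retrieve, lm_next_token_logprobs, k=4):
    """Run the black-box LM once per retrieved document and average the
    next-token distributions, weighted by softmax-normalized retrieval scores."""
    docs, scores = retrieve(query, k)                  # top-k documents + similarity scores
    weights = np.exp(scores - np.max(scores))
    weights /= weights.sum()                           # softmax over retrieval scores
    probs = sum(w * np.exp(lm_next_token_logprobs(doc + "\n" + query))
                for w, doc in zip(weights, docs))      # mixture of k LM runs
    return np.log(probs + 1e-12)
```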
2 code implementations • 29 Jan 2023 • Anastasia Razdaibiedina, Yuning Mao, Rui Hou, Madian Khabsa, Mike Lewis, Amjad Almahairi
We introduce Progressive Prompts - a simple and efficient approach for continual learning in language models.
no code implementations • 15 Dec 2022 • Andrew Lee, David Wu, Emily Dinan, Mike Lewis
Despite many recent advancements in language modeling, state-of-the-art language models lack grounding in the real world and struggle with tasks involving complex reasoning.
no code implementations • 5 Dec 2022 • Sweta Agrawal, Chunting Zhou, Mike Lewis, Luke Zettlemoyer, Marjan Ghazvininejad
Large-scale generative models show an impressive ability to perform a wide range of Natural Language Processing (NLP) tasks using in-context learning, where a few examples are used to describe a task to the model.
1 code implementation • 2 Dec 2022 • Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer
Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases.
1 code implementation • 29 Nov 2022 • Tianyi Zhang, Tao Yu, Tatsunori B. Hashimoto, Mike Lewis, Wen-tau Yih, Daniel Fried, Sida I. Wang
Sampling diverse programs from a code language model and reranking with model likelihood is a popular method for code generation, but it is prone to preferring degenerate solutions.
Ranked #22 on Code Generation on MBPP
1 code implementation • Science 2022 • Anton Bakhtin, Noam Brown, Emily Dinan, Gabriele Farina, Colin Flaherty, Daniel Fried, Andrew Goff, Jonathan Gray, Hengyuan Hu, Athul Paul Jacob, Mojtaba Komeili, Karthik Konath, Minae Kwon, Adam Lerer, Mike Lewis, Alexander H. Miller, Sash Mitts, Aditya Renduchintala, Stephen Roller, Dirk Rowe, Weiyan Shi, Joe Spisak, Alexander Wei, David Wu, Hugh Zhang, Markus Zijlstra
Despite much progress in training AI systems to imitate human language, building agents that use language to communicate intentionally with humans in interactive environments remains a major challenge.
no code implementations • 22 Nov 2022 • Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih
To integrate knowledge in a more scalable and modular way, we propose a retrieval-augmented multimodal model, which enables a base multimodal model (generator) to refer to relevant text and images fetched by a retriever from external memory (e.g., documents on the web).
Ranked #7 on Image Captioning on MS COCO
no code implementations • 22 Nov 2022 • Weiyan Shi, Emily Dinan, Adi Renduchintala, Daniel Fried, Athul Paul Jacob, Zhou Yu, Mike Lewis
Existing approaches build separate classifiers to detect nonsense in dialogues.
2 code implementations • 27 Oct 2022 • Xiang Lisa Li, Ari Holtzman, Daniel Fried, Percy Liang, Jason Eisner, Tatsunori Hashimoto, Luke Zettlemoyer, Mike Lewis
We propose contrastive decoding (CD), a reliable decoding approach that optimizes a contrastive objective subject to a plausibility constraint.
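One decoding step of CD can be sketched as follows; this is a simplified reading of the objective, with `alpha` as the plausibility-constraint hyperparameter described in the paper.

```python
import math
import torch

def contrastive_decoding_scores(expert_logits, amateur_logits, alpha=0.1):
    """Keep only tokens the expert finds plausible, then rank them by
    expert-minus-amateur log-probability; argmax gives the CD token."""
    expert_logp = torch.log_softmax(expert_logits, dim=-1)
    amateur_logp = torch.log_softmax(amateur_logits, dim=-1)
    cutoff = math.log(alpha) + expert_logp.max(dim=-1, keepdim=True).values
    scores = expert_logp - amateur_logp
    scores[expert_logp < cutoff] = float("-inf")       # plausibility constraint
    return scores
```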
1 code implementation • 7 Oct 2022 • Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, Mike Lewis
We investigate the ability of language models to perform compositional reasoning tasks where the overall solution depends on correctly composing the answers to sub-problems.
Ranked #4 on Question Answering on Bamboogle
3 code implementations • 15 Aug 2022 • Tim Dettmers, Mike Lewis, Younes Belkada, Luke Zettlemoyer
We develop a procedure for Int8 matrix multiplication for feed-forward and attention projection layers in transformers, which cuts the memory needed for inference by half while retaining full-precision performance.
Ranked #2 on Language Modelling on C4
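A toy illustration of the underlying idea, not the paper's CUDA kernels: quantize the well-behaved hidden dimensions to int8 with absmax scaling, and route dimensions containing emergent outlier features through a higher-precision matmul. The `threshold` mirrors the paper's outlier criterion; everything else here is simplified (e.g., weights would be pre-quantized in practice).

```python
import torch

def int8_matmul_with_outliers(x, w, threshold=6.0):
    """Mixed int8/float matmul sketch: x is (n, d), w is (d, m)."""
    outlier_cols = (x.abs() > threshold).any(dim=0)    # dims with emergent outliers
    x_reg, w_reg = x[:, ~outlier_cols], w[~outlier_cols, :]
    # symmetric absmax quantization to int8
    sx = x_reg.abs().amax(dim=1, keepdim=True) / 127.0
    sw = w_reg.abs().amax(dim=0, keepdim=True) / 127.0
    xq = torch.round(x_reg / sx).to(torch.int8)
    wq = torch.round(w_reg / sw).to(torch.int8)
    out = (xq.to(torch.int32) @ wq.to(torch.int32)).float() * (sx * sw)
    out += x[:, outlier_cols] @ w[outlier_cols, :]     # high-precision path for outliers
    return out
```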
1 code implementation • 5 Aug 2022 • Margaret Li, Suchin Gururangan, Tim Dettmers, Mike Lewis, Tim Althoff, Noah A. Smith, Luke Zettlemoyer
New ELMs are learned by branching from (mixtures of) ELMs in the current set, further training the parameters on data for the new domain, and then merging the resulting model back into the set for future use.
1 code implementation • 21 Jun 2022 • Devendra Singh Sachan, Mike Lewis, Dani Yogatama, Luke Zettlemoyer, Joelle Pineau, Manzil Zaheer
We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
no code implementations • 7 Jun 2022 • Siddharth Dalmia, Dmytro Okhonko, Mike Lewis, Sergey Edunov, Shinji Watanabe, Florian Metze, Luke Zettlemoyer, Abdelrahman Mohamed
We describe LegoNN, a procedure for building encoder-decoder architectures so that their parts can be applied to other tasks without the need for any fine-tuning.
Automatic Speech Recognition (ASR) +3
no code implementations • 9 May 2022 • Mandar Joshi, Terra Blevins, Mike Lewis, Daniel S. Weld, Luke Zettlemoyer
Creating labeled natural language training data is expensive and requires significant human effort.
1 code implementation • 15 Apr 2022 • Devendra Singh Sachan, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, Luke Zettlemoyer
We propose a simple and effective re-ranking method for improving passage retrieval in open question answering.
3 code implementations • 12 Apr 2022 • Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis
Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable renaming.
Ranked #39 on Code Generation on HumanEval (Pass@100 metric)
1 code implementation • 25 Feb 2022 • Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
Large language models (LMs) are able to in-context learn -- perform a new task via inference alone by conditioning on a few input-label pairs (demonstrations) and making predictions for new inputs.
no code implementations • 19 Jan 2022 • Armen Aghajanyan, Bernie Huang, Candace Ross, Vladimir Karpukhin, Hu Xu, Naman Goyal, Dmytro Okhonko, Mandar Joshi, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer
We introduce CM3, a family of causally masked generative models trained over a large corpus of structured multi-modal documents that can contain both text and image tokens.
2 code implementations • NAACL 2022 • Sewon Min, Mike Lewis, Luke Zettlemoyer, Hannaneh Hajishirzi
We introduce MetaICL (Meta-training for In-Context Learning), a new meta-training framework for few-shot learning where a pretrained language model is tuned to do in-context learning on a large set of training tasks.
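The meta-training input can be pictured as a simple concatenation of demonstrations and a query; the formatting below is an illustrative sketch, not the paper's exact template.

```python
def metaicl_sequence(train_examples, query_input, sep="\n"):
    """Concatenate k input-output demonstrations from one task with a
    query input; the model is trained to predict the query's output."""
    demos = sep.join(f"{x} {y}" for x, y in train_examples)
    return f"{demos}{sep}{query_input}"
```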
1 code implementation • NAACL 2022 • Qinyuan Ye, Madian Khabsa, Mike Lewis, Sinong Wang, Xiang Ren, Aaron Jaech
Distilling state-of-the-art transformer models into lightweight student models is an effective way to reduce computation cost at inference time.
no code implementations • NAACL 2022 • Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, Mike Lewis, Angela Fan
Multi-task learning with an unbalanced data distribution skews model learning towards high resource tasks, especially when model capacity is fixed and fully shared across all tasks.
2 code implementations • ICLR 2022 • Tim Dettmers, Mike Lewis, Sam Shleifer, Luke Zettlemoyer
To maintain stability and performance, we combine block-wise quantization with two additional changes: (1) dynamic quantization, a form of non-linear optimization that is precise for both large and small magnitude values, and (2) a stable embedding layer to reduce gradient variance that comes from the highly non-uniform distribution of input tokens in language models.
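A sketch of the block-wise step alone; note the paper pairs it with a non-linear dynamic quantization codebook, whereas this toy version uses plain linear int8 for brevity.

```python
import torch

def blockwise_quantize(t: torch.Tensor, block_size: int = 2048):
    """Block-wise absmax quantization: each block of the flattened tensor
    is scaled by its own maximum, isolating outliers to single blocks and
    letting normalization be computed independently per block."""
    flat = t.flatten()
    pad = (-flat.numel()) % block_size
    flat = torch.nn.functional.pad(flat, (0, pad))
    blocks = flat.view(-1, block_size)
    scales = blocks.abs().amax(dim=1, keepdim=True).clamp_min(1e-12)
    q = torch.round(blocks / scales * 127).to(torch.int8)
    return q, scales  # dequantize with q.float() / 127 * scales
```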
7 code implementations • ICLR 2022 • Ofir Press, Noah A. Smith, Mike Lewis
Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question has yet to be answered: how does a model achieve extrapolation at inference time for sequences that are longer than it saw during training?
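This entry is the ALiBi paper, whose answer can be sketched in a few lines: bias each attention score by a head-specific slope times the query-key distance, with no positional embeddings. The slope formula follows the paper's geometric sequence for a power-of-two number of heads; shapes are illustrative.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Return the (heads, seq, seq) additive bias for causal attention."""
    slopes = torch.tensor([2 ** (-8 * (h + 1) / n_heads) for h in range(n_heads)])
    dist = torch.arange(seq_len)[None, :] - torch.arange(seq_len)[:, None]
    dist = dist.clamp(max=0)                 # future positions get 0 (masked elsewhere)
    return slopes[:, None, None] * dist      # more distant keys get a larger penalty
```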
2 code implementations • NAACL 2022 • Suchin Gururangan, Mike Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer
We introduce a new domain expert mixture (DEMix) layer that enables conditioning a language model (LM) on the domain of the input text.
1 code implementation • ACL 2022 • Sewon Min, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
We introduce a noisy channel approach for language model prompting in few-shot text classification.
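The channel direction can be sketched in a couple of lines: score each verbalized label by the probability the LM assigns to the *input* given that label, rather than the label given the input. `lm_logprob(prefix, continuation)` is a hypothetical LM-scoring helper, not an API from the paper.

```python
def channel_classify(x: str, labels: dict, lm_logprob) -> str:
    """Channel prompting: argmax_y p(x | verbalizer(y))."""
    return max(labels, key=lambda y: lm_logprob(labels[y], x))

# e.g. labels = {"positive": "It was great.", "negative": "It was terrible."}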
no code implementations • ICLR 2022 • Armen Aghajanyan, Dmytro Okhonko, Mike Lewis, Mandar Joshi, Hu Xu, Gargi Ghosh, Luke Zettlemoyer
We introduce HTLM, a hyper-text language model trained on a large-scale web crawl.
Ranked #1 on Table-to-Text Generation on DART
1 code implementation • Findings (ACL) 2022 • Robin Jia, Mike Lewis, Luke Zettlemoyer
We propose a pre-training objective based on question answering (QA) for learning general-purpose contextual representations, motivated by the intuition that the representation of a phrase in a passage should encode all questions that the phrase can answer in context.
no code implementations • NAACL 2021 • Athul Paul Jacob, Mike Lewis, Jacob Andreas
When intelligent agents communicate to accomplish shared goals, how do these goals shape the agents' language?
1 code implementation • 30 Mar 2021 • Mike Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal, Luke Zettlemoyer
Sparse layers can dramatically improve the efficiency of training and inference by routing each token to specialized expert modules that contain only a small fraction of the model parameters.
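This entry is the BASE Layers paper, which routes tokens via a balanced assignment. A toy sketch for the square case of n tokens and n experts, using an off-the-shelf linear-assignment solver in place of the paper's scalable auction algorithm:

```python
import torch
from scipy.optimize import linear_sum_assignment

def base_layer_assign(token_emb: torch.Tensor, expert_emb: torch.Tensor):
    """Assign tokens to experts by maximizing token-expert affinity under
    a hard balance constraint (every expert gets the same token count)."""
    scores = token_emb @ expert_emb.T                    # (n_tokens, n_experts)
    row, col = linear_sum_assignment(-scores.numpy())    # maximize total affinity
    return col  # col[i] = expert index routed token i
```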
1 code implementation • ACL 2021 • Ofir Press, Noah A. Smith, Mike Lewis
Increasing the input length has been a driver of progress in language modeling with transformers.
Ranked #26 on Language Modelling on WikiText-103
no code implementations • ACL 2021 • Michael Schlichtkrull, Vladimir Karpukhin, Barlas Oğuz, Mike Lewis, Wen-tau Yih, Sebastian Riedel
Structured information is an important knowledge source for automatic verification of factual claims.
5 code implementations • ICLR 2021 • Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis
We introduce $k$-nearest-neighbor machine translation ($k$NN-MT), which predicts tokens with a nearest neighbor classifier over a large datastore of cached examples, using representations from a neural translation model for similarity search.
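A sketch of the interpolation step, assuming the neighbor distances and their target tokens have already been retrieved from the datastore; `lam` and the softmax temperature `T` are tuning knobs, not the paper's fixed values.

```python
import torch

def knn_interpolate(lm_logprobs, knn_dists, knn_token_ids, vocab_size, lam=0.5, T=10.0):
    """Mix a retrieval distribution over neighbor target tokens with the
    base model's next-token distribution. knn_token_ids is a LongTensor."""
    knn_probs = torch.zeros(vocab_size)
    weights = torch.softmax(-knn_dists / T, dim=0)     # closer neighbors weigh more
    knn_probs.scatter_add_(0, knn_token_ids, weights)  # aggregate per target token
    return torch.log(lam * knn_probs + (1 - lam) * torch.exp(lm_logprobs))
```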
no code implementations • EMNLP 2020 • Armen Aghajanyan, Jean Maillard, Akshat Shrivastava, Keith Diedrick, Mike Haeger, Haoran Li, Yashar Mehdad, Ves Stoyanov, Anuj Kumar, Mike Lewis, Sonal Gupta
In this paper, we propose a semantic representation for such task-oriented conversational systems that can represent concepts such as co-reference and context carryover, enabling comprehensive understanding of queries in a session.
1 code implementation • EMNLP 2020 • Victor Zhong, Mike Lewis, Sida I. Wang, Luke Zettlemoyer
We propose Grounded Adaptation for Zero-shot Executable Semantic Parsing (GAZP) to adapt an existing semantic parser to new environments (e.g., new database schemas).
Ranked #6 on Text-To-SQL on SParC
2 code implementations • NeurIPS 2020 • Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, Luke Zettlemoyer
The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks.
9 code implementations • NeurIPS 2020 • Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela
Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks.
Ranked #4 on Question Answering on WebQuestions
2 code implementations • ACL 2020 • Alex Wang, Kyunghyun Cho, Mike Lewis
QAGS is based on the intuition that if we ask questions about a summary and its source, we will receive similar answers if the summary is factually consistent with the source.
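The pipeline can be sketched with two hypothetical model wrappers, `gen_questions` and `answer`; note the paper compares answers with a softer similarity measure than the exact match used here for brevity.

```python
def qags_score(summary: str, source: str, gen_questions, answer) -> float:
    """Generate questions from the summary, answer them against both the
    summary and the source, and score consistency by answer agreement."""
    qs = gen_questions(summary)
    agree = [answer(q, summary) == answer(q, source) for q in qs]
    return sum(agree) / max(len(agree), 1)
```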
5 code implementations • 22 Jan 2020 • Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks.
no code implementations • 9 Nov 2019 • Siddharth Dalmia, Abdel-rahman Mohamed, Mike Lewis, Florian Metze, Luke Zettlemoyer
Inspired by modular software design principles of independence, interchangeability, and clarity of interface, we introduce a method for enforcing encoder-decoder modularity in seq2seq models without sacrificing the overall model quality or its full differentiability.
5 code implementations • ICLR 2020 • Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis
Applying this augmentation to a strong Wikitext-103 LM, with neighbors drawn from the original training set, our $k$NN-LM achieves a new state-of-the-art perplexity of 15.79, a 2.9 point improvement with no additional training.
Ranked #10 on Language Modelling on WikiText-103
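The interpolation at the core of $k$NN-LM, as described in the paper, with $\lambda$ tuned on held-out data:

```latex
p(y \mid x) = \lambda \, p_{\mathrm{kNN}}(y \mid x) + (1 - \lambda) \, p_{\mathrm{LM}}(y \mid x)
```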
no code implementations • IJCNLP 2019 • Panupong Pasupat, Sonal Gupta, Karishma Mandyam, Rushin Shah, Mike Lewis, Luke Zettlemoyer
We propose a semantic parser for parsing compositional utterances into Task Oriented Parse (TOP), a tree representation that has intents and slots as labels of nesting tree nodes.
42 code implementations • ACL 2020 • Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdel-rahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token.
Ranked #3 on Open-Domain Question Answering on ELI5
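The in-filling scheme can be sketched as follows; the span-start probability and other constants here are illustrative, not the paper's exact corruption schedule.

```python
import numpy as np

def text_infill(tokens, mask="<mask>", lam=3.0, p_start=0.1,
                rng=np.random.default_rng(0)):
    """BART-style text infilling: sample spans with Poisson(lam) lengths
    and replace each with a single mask token (a length-0 span inserts a
    mask without removing anything)."""
    out, i = [], 0
    while i < len(tokens):
        if rng.random() < p_start:
            out.append(mask)
            i += int(rng.poisson(lam))   # skip the masked span (may be 0 tokens)
        else:
            out.append(tokens[i])
            i += 1
    return out
```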
58 code implementations • 26 Jul 2019 • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.
Ranked #1 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (Wasserstein Distance (WD) metric, using extra training data)
5 code implementations • 4 Jun 2019 • Sean Vasquez, Mike Lewis
Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps.
1 code implementation • NeurIPS 2019 • Hengyuan Hu, Denis Yarats, Qucheng Gong, Yuandong Tian, Mike Lewis
We explore using latent natural language instructions as an expressive and compositional representation of complex actions for hierarchical decision making.
no code implementations • ICLR 2019 • Mike Lewis, Angela Fan
Discriminative question answering models can overfit to superficial biases in datasets, because their loss function saturates when any clue makes the answer likely.
no code implementations • 15 Feb 2019 • Arash Einolghozati, Panupong Pasupat, Sonal Gupta, Rushin Shah, Mrinal Mohit, Mike Lewis, Luke Zettlemoyer
Semantic parsing using hierarchical representations has recently been proposed for task oriented dialog with promising results [Gupta et al. 2018].
no code implementations • ACL 2019 • Angela Fan, Mike Lewis, Yann Dauphin
Writers generally rely on plans or sketches to write long stories, but most current language models generate word by word from left to right.
no code implementations • NAACL 2019 • Sebastian Schuster, Sonal Gupta, Rushin Shah, Mike Lewis
We use this data set to evaluate three different cross-lingual transfer methods: (1) translating the training data, (2) using cross-lingual pre-trained embeddings, and (3) a novel method of using a multilingual machine translation encoder as contextual word representations.
no code implementations • EMNLP 2018 • Sonal Gupta, Rushin Shah, Mrinal Mohit, Anuj Kumar, Mike Lewis
Task oriented dialog systems typically first parse user utterances to semantic frames comprised of intents and slots.
no code implementations • EMNLP 2018 • Spandana Gella, Mike Lewis, Marcus Rohrbach
Video content on social media platforms constitutes a major part of the communication between people, as it allows everyone to share their stories.
no code implementations • EMNLP 2018 • Nitish Gupta, Mike Lewis
Answering compositional questions requiring multi-step reasoning is challenging.
1 code implementation • 10 Aug 2018 • Akshat Agarwal, Swaminathan Gurumurthy, Vasu Sharma, Mike Lewis, Katia Sycara
The task of conducting visually grounded dialog involves learning goal-oriented cooperative dialog between autonomous agents who exchange information about a scene through several rounds of questions and answers in natural language.
7 code implementations • ACL 2018 • Angela Fan, Mike Lewis, Yann Dauphin
We explore story generation: creative systems that can build coherent and fluent passages of text about a topic.
1 code implementation • ICML 2018 • Denis Yarats, Mike Lewis
End-to-end models for goal-oriented dialogue are challenging to train, because linguistic and strategic aspects are entangled in latent state vectors.
no code implementations • EMNLP 2017 • Mike Lewis, Denis Yarats, Yann Dauphin, Devi Parikh, Dhruv Batra
Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions.
4 code implementations • EMNLP 2017 • Kenton Lee, Luheng He, Mike Lewis, Luke Zettlemoyer
We introduce the first end-to-end coreference resolution model and show that it significantly outperforms all previous work without using a syntactic parser or hand-engineered mention detector.
Ranked #15 on Coreference Resolution on CoNLL 2012
no code implementations • ACL 2017 • Alane Suhr, Mike Lewis, James Yeh, Yoav Artzi
We present a new visual reasoning language dataset, containing 92,244 pairs of examples of natural statements grounded in synthetic images with 3,962 unique sentences.
1 code implementation • ACL 2017 • Luheng He, Kenton Lee, Mike Lewis, Luke Zettlemoyer
We introduce a new deep learning model for semantic role labeling (SRL) that significantly improves the state of the art, along with detailed analyses to reveal its strengths and limitations.
Ranked #2 on Predicate Detection on CoNLL 2005
2 code implementations • 16 Jun 2017 • Mike Lewis, Denis Yarats, Yann N. Dauphin, Devi Parikh, Dhruv Batra
Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions.
1 code implementation • EMNLP 2016 • Kenton Lee, Mike Lewis, Luke Zettlemoyer
We introduce the first global recursive neural parsing model with optimality guarantees during decoding.
no code implementations • TACL 2014 • Mike Lewis, Mark Steedman
Current supervised parsers are limited by the size of their labelled training data, making improving them with unlabelled data an important goal.
no code implementations • TACL 2013 • Mike Lewis, Mark Steedman
We introduce a new approach to semantics which combines the benefits of distributional and formal logical semantics.