no code implementations • ICLR 2019 • Ryan Y. Benmalek, Madian Khabsa, Suma Desu, Claire Cardie, Michele Banko
In this paper we introduce the Scratchpad Encoder, a novel addition to the sequence-to-sequence (seq2seq) framework, and explore its effectiveness in generating natural language questions from a given logical form.
1 code implementation • 7 Dec 2023 • Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, Madian Khabsa
We introduce Llama Guard, an LLM-based input-output safeguard model geared towards Human-AI conversation use cases.
1 code implementation • 7 Dec 2023 • Jaehyung Kim, Yuning Mao, Rui Hou, Hanchao Yu, Davis Liang, Pascale Fung, Qifan Wang, Fuli Feng, Lifu Huang, Madian Khabsa
Under a unified evaluation of fine-tuned LMs that incorporates four representative perspectives of model robustness, we demonstrate the effectiveness of RoAST compared to state-of-the-art fine-tuning methods on six different types of LMs, indicating its usefulness in practice.
no code implementations • 13 Nov 2023 • Suyu Ge, Chunting Zhou, Rui Hou, Madian Khabsa, Yi-Chia Wang, Qifan Wang, Jiawei Han, Yuning Mao
Specifically, an adversarial LLM and a target LLM interact iteratively: the adversarial LLM aims to generate challenging prompts that elicit unsafe responses from the target LLM, while the target LLM is fine-tuned with safety-aligned data on these adversarial prompts.
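A minimal sketch of the described loop, in which every function is a hypothetical stub standing in for the two LLMs and the safety judge (this is not the authors' implementation):

```python
# Sketch of the iterative adversarial/target LLM interplay described above.
# All model calls are placeholder stubs; a real setup would use two LLMs and
# a safety classifier or reward model instead of these functions.
import random

def adversarial_generate(adv_model, n_prompts=4):
    """Stand-in for the adversarial LLM proposing challenging prompts."""
    return [f"adversarial prompt {random.randint(0, 999)}" for _ in range(n_prompts)]

def target_respond(target_model, prompt):
    """Stand-in for the target LLM answering a prompt."""
    return f"response to: {prompt}"

def is_unsafe(response):
    """Stand-in for a safety judge deciding whether a response is unsafe."""
    return random.random() < 0.3

def safety_finetune(target_model, examples):
    """Stand-in for fine-tuning the target LLM on safety-aligned data."""
    return target_model  # a real implementation would update the weights

adv_model, target_model = "adversarial-llm", "target-llm"
for iteration in range(3):
    prompts = adversarial_generate(adv_model)
    unsafe_prompts = [p for p in prompts
                      if is_unsafe(target_respond(target_model, p))]
    # Pair the prompts that elicited unsafe responses with safe reference
    # answers, then fine-tune the target LLM on this safety-aligned data.
    safety_data = [(p, "safe, aligned answer") for p in unsafe_prompts]
    target_model = safety_finetune(target_model, safety_data)
```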
no code implementations • 29 Sep 2023 • Xiaotian Han, Hanqing Zeng, Yu Chen, Shaoliang Nie, Jingzhou Liu, Kanika Narang, Zahra Shakeri, Karthik Abinav Sankararaman, Song Jiang, Madian Khabsa, Qifan Wang, Xia Hu
We establish this equivalence mathematically by demonstrating that graph convolution networks (GCN) and simplified graph convolution (SGC) can be expressed as a form of Mixup.
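The rough intuition, in notation of my own that is not necessarily the paper's exact formulation: Mixup forms a convex combination of two training examples, while one step of SGC/GCN aggregation with a row-normalized adjacency forms a convex combination of a node's own features and its neighbors' features.

```latex
% Illustrative notation only, not necessarily the paper's formulation.
\[
\tilde{x} \;=\; \lambda\, x_i \;+\; (1-\lambda)\, x_j
\qquad\text{(Mixup)}
\]
\[
h_v \;=\; \sum_{u \in \mathcal{N}(v)\cup\{v\}} \hat{A}_{vu}\, x_u,
\qquad \hat{A}_{vu}\ge 0,\;\; \textstyle\sum_u \hat{A}_{vu}=1
\qquad\text{(one SGC/GCN aggregation step, row-normalized, no nonlinearity)}
\]
```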
1 code implementation • 27 Sep 2023 • Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma
We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths. Our ablation experiments suggest that having abundant long texts in the pretraining dataset is not the key to strong performance, and we empirically verify that long-context continual pretraining is more efficient than, and similarly effective to, pretraining from scratch with long sequences.
1 code implementation • 31 Aug 2023 • Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa
We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs).
14 code implementations • 18 Jul 2023 • Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Ranked #2 on Question Answering on PubChemQA
1 code implementation • 6 May 2023 • Anastasia Razdaibiedina, Yuning Mao, Rui Hou, Madian Khabsa, Mike Lewis, Jimmy Ba, Amjad Almahairi
In this work, we introduce Residual Prompt Tuning, a simple and efficient method that significantly improves the performance and stability of prompt tuning.
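The core idea, roughly, is to reparameterize soft prompt embeddings through a shallow network with a skip connection rather than optimizing them directly. A minimal PyTorch-style sketch of that idea, with illustrative dimensions and layer sizes that are not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class ResidualPrompt(nn.Module):
    """Soft prompt reparameterized through a small MLP with a skip connection."""

    def __init__(self, prompt_len=10, dim=768, hidden=256):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self):
        # Residual reparameterization: the MLP output is added back to the
        # raw prompt embeddings before they are prepended to the model input.
        return self.prompt + self.mlp(self.prompt)

prompt = ResidualPrompt()
print(prompt().shape)  # torch.Size([10, 768])
```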
no code implementations • 28 Apr 2023 • Yuchen Liu, Natasha Ong, Kaiyan Peng, Bo Xiong, Qifan Wang, Rui Hou, Madian Khabsa, Kaiyue Yang, David Liu, Donald S. Williamson, Hanchao Yu
Our model encodes different views of the input signal and builds several channel-resolution feature stages that process these views at different resolutions in parallel.
no code implementations • 1 Apr 2023 • Chenbin Pan, Rui Hou, Hanchao Yu, Qifan Wang, Senem Velipasalar, Madian Khabsa
Whether they process videos at a fixed resolution from start to finish or incorporate pooling and down-scaling strategies, existing video transformers process the entire video content throughout the network without special handling of its large portions of redundant information.
2 code implementations • 29 Jan 2023 • Anastasia Razdaibiedina, Yuning Mao, Rui Hou, Madian Khabsa, Mike Lewis, Amjad Almahairi
We introduce Progressive Prompts, a simple and efficient approach for continual learning in language models.
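Roughly, the approach learns a new soft prompt for each incoming task and concatenates it with the frozen prompts of previously learned tasks, while the underlying LM stays frozen. A minimal PyTorch sketch of that bookkeeping, with class and method names of my own:

```python
import torch
import torch.nn as nn

class ProgressivePrompts(nn.Module):
    """Per-task soft prompts; prompts from earlier tasks are frozen and kept."""

    def __init__(self, prompt_len=10, dim=768):
        super().__init__()
        self.prompt_len, self.dim = prompt_len, dim
        self.prompts = nn.ParameterList()

    def start_new_task(self):
        for p in self.prompts:          # freeze prompts from previous tasks
            p.requires_grad_(False)
        self.prompts.append(
            nn.Parameter(torch.randn(self.prompt_len, self.dim) * 0.02)
        )

    def forward(self):
        # Concatenate all prompts learned so far; only the newest one trains.
        return torch.cat(list(self.prompts), dim=0)

pp = ProgressivePrompts()
pp.start_new_task()   # task 1
pp.start_new_task()   # task 2: the task-1 prompt is now frozen
print(pp().shape)     # torch.Size([20, 768])
```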
2 code implementations • 25 Jan 2023 • Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa
Large multilingual language models typically rely on a single vocabulary shared across 100+ languages.
no code implementations • 10 Dec 2022 • Siddharth Verma, Yuchen Lu, Rui Hou, Hanchao Yu, Nicolas Ballas, Madian Khabsa, Amjad Almahairi
Masked Language Modeling (MLM) has proven to be an essential component of Vision-Language (VL) pretraining.
no code implementations • 25 May 2022 • Suzanna Sia, Anton Belyy, Amjad Almahairi, Madian Khabsa, Luke Zettlemoyer, Lambert Mathias
Evaluating an explanation's faithfulness is desirable for many reasons, such as trust, interpretability, and diagnosing the sources of a model's errors.
no code implementations • 27 Dec 2021 • Ajinkya Tejankar, Maziar Sanjabi, Bichen Wu, Saining Xie, Madian Khabsa, Hamed Pirsiavash, Hamed Firooz
In this paper, we focus on teasing out what parts of the language supervision are essential for training zero-shot image classification models.
2 code implementations • NAACL 2022 • Belinda Z. Li, Jane Yu, Madian Khabsa, Luke Zettlemoyer, Alon Halevy, Jacob Andreas
When a neural language model (LM) is adapted to perform a new task, what aspects of the task predict the eventual performance of the model?
1 code implementation • NAACL 2022 • Qinyuan Ye, Madian Khabsa, Mike Lewis, Sinong Wang, Xiang Ren, Aaron Jaech
Distilling state-of-the-art transformer models into lightweight student models is an effective way to reduce computation cost at inference time.
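For context, a generic soft-label knowledge-distillation objective of the kind such student training typically builds on (standard KD with a temperature; not necessarily the specific recipe used in this work):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic KD objective: soft-target KL term plus the usual hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random logits for a 3-class task.
s, t = torch.randn(8, 3), torch.randn(8, 3)
y = torch.randint(0, 3, (8,))
print(distillation_loss(s, t, y).item())
```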
1 code implementation • ACL 2022 • Yuning Mao, Lambert Mathias, Rui Hou, Amjad Almahairi, Hao Ma, Jiawei Han, Wen-tau Yih, Madian Khabsa
Recent parameter-efficient language model tuning (PELT) methods manage to match the performance of fine-tuning with much fewer trainable parameters and perform especially well when training data is limited.
3 code implementations • 29 Apr 2021 • Sinong Wang, Han Fang, Madian Khabsa, Hanzi Mao, Hao Ma
Large pre-trained language models (LMs) have demonstrated remarkable ability as few-shot learners.
Ranked #1 on Topic Classification on OS
no code implementations • EMNLP 2021 • Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Wen-tau Yih, Xiang Ren, Madian Khabsa
Current NLP models are predominantly trained through a two-stage "pre-train then fine-tune" pipeline.
1 code implementation • NAACL 2021 • Nayeon Lee, Belinda Z. Li, Sinong Wang, Pascale Fung, Hao Ma, Wen-tau Yih, Madian Khabsa
In this paper, we introduce UnifiedM2, a general-purpose misinformation model that jointly models multiple domains of misinformation with a single, unified setup.
no code implementations • NAACL 2021 • Nayeon Lee, Yejin Bang, Andrea Madotto, Madian Khabsa, Pascale Fung
Through experiments, we empirically verify the plausibility of the rather surprising usage of the perplexity score in the context of fact-checking and highlight the strength of our few-shot methodology by comparing it to strong fine-tuning-based baseline models.
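A minimal sketch of the underlying signal: scoring a claim by language-model perplexity and thresholding it. This shows the general recipe only, not the paper's exact prompting or threshold-selection procedure, and GPT-2 is just a stand-in model:

```python
# Lower perplexity means the claim looks more "expected" to the LM, which can
# be thresholded as a weak factuality signal.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text):
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids).loss  # mean token NLL
    return torch.exp(loss).item()

claim = "The Eiffel Tower is located in Paris."
print(perplexity(claim))  # compare against a threshold tuned on held-out data
```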
no code implementations • 31 Dec 2020 • Zhuofeng Wu, Sinong Wang, Jiatao Gu, Madian Khabsa, Fei Sun, Hao Ma
Pre-trained language models have proven uniquely powerful at capturing implicit language features.
Ranked #5 on Question Answering on Quora Question Pairs
no code implementations • 31 Dec 2020 • Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Wen-tau Yih, Xiang Ren, Madian Khabsa
In this way, our policy packs task-relevant knowledge into the parameters of a language model.
no code implementations • ACL 2020 • Sinong Wang, Madian Khabsa, Hao Ma
Pretraining NLP models with variants of Masked Language Model (MLM) objectives has recently led to significant improvements on many tasks.
no code implementations • 15 Jun 2020 • Sinong Wang, Madian Khabsa, Hao Ma
Pretraining NLP models with variants of Masked Language Model (MLM) objectives has recently led to significant improvements on many tasks.
15 code implementations • 8 Jun 2020 • Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications.
no code implementations • WS 2020 • Nayeon Lee, Belinda Z. Li, Sinong Wang, Wen-tau Yih, Hao Ma, Madian Khabsa
Recent work has suggested that language models (LMs) store both common-sense and factual knowledge learned from pre-training data.
no code implementations • ACL 2019 • Ryan Benmalek, Madian Khabsa, Suma Desu, Claire Cardie, Michele Banko
We introduce the Scratchpad Mechanism, a novel addition to the sequence-to-sequence (seq2seq) neural network architecture and demonstrate its effectiveness in improving the overall fluency of seq2seq models for natural language generation tasks.
1 code implementation • 12 Jun 2019 • Ryan Y. Benmalek, Madian Khabsa, Suma Desu, Claire Cardie, Michele Banko
We introduce the Scratchpad Mechanism, a novel addition to the sequence-to-sequence (seq2seq) neural network architecture and demonstrate its effectiveness in improving the overall fluency of seq2seq models for natural language generation tasks.
no code implementations • 22 Apr 2018 • Xiao Yang, Miaosen Wang, Wei Wang, Madian Khabsa, Ahmed Awadallah, Daniel Kifer, C. Lee Giles
We frame this task as a binary (relevant/irrelevant) classification problem and present an adversarial training framework to alleviate the label imbalance issue.
no code implementations • 26 Dec 2017 • Chu-Cheng Lin, Dongyeop Kang, Michael Gamon, Madian Khabsa, Ahmed Hassan Awadallah, Patrick Pantel
Emails in the workplace are often intentional calls to action for their recipients.