1 code implementation • 27 Sep 2023 • Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma
We also examine the impact of various design choices in the pretraining process, including the data mix and the training curriculum of sequence lengths -- our ablation experiments suggest that having abundant long texts in the pretraining dataset is not the key to achieving strong performance, and we empirically verify that long-context continual pretraining is more efficient than, and similarly effective to, pretraining from scratch with long sequences.
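To make the idea concrete, here is a minimal sketch of continual pretraining with a staged sequence-length curriculum: train on shorter sequences first, then continue on longer ones. This is illustrative only, not the paper's recipe; the toy model, vocabulary size, stage lengths, and step counts are all placeholder assumptions.

```python
import torch
import torch.nn as nn

def length_curriculum_batches(tokens, stages, batch_size=2):
    """Yield (inputs, targets) batches, one stage of sequence length at a time.

    tokens: 1-D LongTensor holding a pre-tokenized corpus stream.
    stages: list of (seq_len, num_steps) pairs, ordered short -> long.
    """
    offset = 0
    for seq_len, num_steps in stages:
        for _ in range(num_steps):
            span = batch_size * (seq_len + 1)
            if offset + span > tokens.numel():
                offset = 0  # wrap around the stream
            chunk = tokens[offset:offset + span].view(batch_size, seq_len + 1)
            offset += span
            yield chunk[:, :-1], chunk[:, 1:]  # next-token prediction

VOCAB = 1000
# Toy stand-in for a short-context checkpoint; in continual pretraining this
# would be a pretrained LM whose positional scheme is adapted to longer inputs.
model = nn.Sequential(nn.Embedding(VOCAB, 64), nn.Linear(64, VOCAB))
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

corpus = torch.randint(0, VOCAB, (500_000,))  # fake token stream
stages = [(1024, 5), (8192, 5)]               # short stage, then long stage

for inputs, targets in length_curriculum_batches(corpus, stages):
    logits = model(inputs)
    loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB),
                                       targets.reshape(-1))
    optim.zero_grad()
    loss.backward()
    optim.step()
```

In a real run, the point of the curriculum is efficiency: most optimizer steps happen at the cheap short length, and only the final stage pays the quadratic attention cost of long sequences.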
14 code implementations • 18 Jul 2023 • Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Ranked #2 on Question Answering on PubChemQA
1 code implementation • 23 May 2023 • Prajjwal Bhargava, Rohan Chitnis, Alborz Geramifard, Shagun Sodhani, Amy Zhang
Three popular algorithms for offline RL are Conservative Q-Learning (CQL), Behavior Cloning (BC), and Decision Transformer (DT), drawn from the Q-learning, imitation learning, and sequence-modeling families, respectively.
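For intuition, the sketch below contrasts the objectives these families optimize over logged transitions, assuming discrete actions; the networks, shapes, and the alpha coefficient are illustrative assumptions, not the paper's experimental setup.

```python
import torch
import torch.nn.functional as F

def bc_loss(policy_logits, actions):
    """Behavior Cloning: supervised learning on logged (state, action) pairs."""
    return F.cross_entropy(policy_logits, actions)

def cql_loss(q_values, actions, td_targets, alpha=1.0):
    """Conservative Q-Learning: a standard TD error plus a penalty that
    pushes down Q-values for actions the dataset did not take."""
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    td_error = F.mse_loss(q_taken, td_targets)
    conservative = (torch.logsumexp(q_values, dim=1) - q_taken).mean()
    return td_error + alpha * conservative

# Decision Transformer instead treats a trajectory as a sequence of
# (return-to-go, state, action) tokens and trains a transformer with the
# same next-token-style objective used for language models.

# Smoke test with random data (batch of 8, 4 discrete actions):
logits = torch.randn(8, 4)
q_vals = torch.randn(8, 4)
acts = torch.randint(0, 4, (8,))
targets = torch.randn(8)
print(bc_loss(logits, acts).item(), cql_loss(q_vals, acts, targets).item())
```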
no code implementations • 10 Mar 2023 • Prajjwal Bhargava, Pooyan Amini, Shahin Shayandeh, Chinnadhurai Sankar
As large dialogue models become commonplace in practice, the problems of high compute requirements for training and inference and of a large memory footprint still persist.
1 code implementation • 22 Oct 2022 • Prajjwal Bhargava, Vincent Ng
We present DiscoSense, a benchmark for commonsense reasoning via understanding a wide variety of discourse connectives.
Ranked #8 on Sentence Completion on HellaSwag
no code implementations • 28 Jan 2022 • Prajjwal Bhargava, Vincent Ng
While commonsense knowledge acquisition and reasoning have traditionally been core research topics in the knowledge representation and reasoning community, recent years have seen a surge of interest in the natural language processing community in developing pre-trained models and testing their ability to address a variety of newly designed commonsense reasoning and generation tasks.
1 code implementation • EMNLP (insights) 2021 • Prajjwal Bhargava, Aleksandr Drozd, Anna Rogers
Much of the recent progress in NLU has been shown to be due to models learning dataset-specific heuristics.
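As a toy illustration of such a heuristic (an assumption for exposition, not the paper's code), an NLI "model" can exploit lexical overlap: predict entailment whenever every hypothesis word appears in the premise, which scores well on standard data but fails on adversarial sets like HANS.

```python
def lexical_overlap_heuristic(premise: str, hypothesis: str) -> str:
    """Predict 'entailment' iff all hypothesis words occur in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().split())
    return "entailment" if hypothesis_words <= premise_words else "non-entailment"

print(lexical_overlap_heuristic("the doctor visited the lawyer",
                                "the doctor visited the lawyer"))  # entailment
print(lexical_overlap_heuristic("the doctor visited the lawyer",
                                "the lawyer visited the doctor"))  # also "entailment"!
```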
2 code implementations • ACL 2020 • Prajjwal Bhargava
The use of transformers has grown from learning language semantics to forming meaningful visiolinguistic representations.
1 code implementation • 28 Sep 2019 • Prajjwal Bhargava
We address the problem of incremental learning in object detection on the India Driving Dataset (IDD).
Ranked #2 on Object Detection on BDD100K val
1 code implementation • 20 Aug 2018 • Prajjwal Bhargava
In this paper, we propose a single model that can be used for multiple Person Re-Identification tasks, providing state-of-the-art results on a variety of them while still achieving considerable accuracy.
Ranked #69 on Person Re-Identification on DukeMTMC-reID