Search Results for author: Susan Zhang

Found 9 papers, 4 papers with code

Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

1 code implementation • 5 Sep 2023 • Lili Yu, Bowen Shi, Ramakanth Pasunuru, Benjamin Muller, Olga Golovneva, Tianlu Wang, Arun Babu, Binh Tang, Brian Karrer, Shelly Sheynin, Candace Ross, Adam Polyak, Russell Howes, Vasu Sharma, Puxin Xu, Hovhannes Tamoyan, Oron Ashual, Uriel Singer, Shang-Wen Li, Susan Zhang, Richard James, Gargi Ghosh, Yaniv Taigman, Maryam Fazel-Zarandi, Asli Celikyilmaz, Luke Zettlemoyer, Armen Aghajanyan

CM3Leon is also a general-purpose model that can do both text-to-image and image-to-text generation, allowing us to introduce self-contained contrastive decoding methods that produce high-quality outputs.

Ranked #2 on Text-to-Image Generation on MS COCO

Decoder, Language Modelling, +3 more

320 GitHub stars

LIMA: Less Is More for Alignment

5 code implementations • NeurIPS 2023 • Chunting Zhou, PengFei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy

Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences.

Language Modelling, reinforcement-learning

2,533 GitHub stars

A Theory on Adam Instability in Large-Scale Machine Learning

no code implementations • 19 Apr 2023 • Igor Molybog, Peter Albert, Moya Chen, Zachary DeVito, David Esiobu, Naman Goyal, Punit Singh Koura, Sharan Narang, Andrew Poulton, Ruan Silva, Binh Tang, Diana Liskovich, Puxin Xu, Yuchen Zhang, Melanie Kambadur, Stephen Roller, Susan Zhang

We present a theory for the previously unexplained divergent behavior noticed in the training of large language models.

Language Modelling

Effective Theory of Transformers at Initialization

no code implementations • 4 Apr 2023 • Emily Dinan, Sho Yaida, Susan Zhang

We perform an effective-theory analysis of forward-backward signal propagation in wide and deep Transformers, i.e., residual neural networks with multi-head self-attention blocks and multilayer perceptron blocks.

Scaling Laws for Generative Mixed-Modal Language Models

no code implementations • 10 Jan 2023 • Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer

To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modalities and model sizes ranging from 8 million to 30 billion parameters, trained on 5-100 billion tokens.

OPT: Open Pre-trained Transformer Language Models

7 code implementations • 2 May 2022 • Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, Luke Zettlemoyer

Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.

Ranked #2 on Stereotypical Bias Analysis on CrowS-Pairs

Decoder, Hate Speech Detection, +2 more

6,388 GitHub stars

Dota 2 with Large Scale Deep Reinforcement Learning

1 code implementation • 13 Dec 2019 • Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique Pondé de Oliveira Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, Jie Tang, Filip Wolski, Susan Zhang

On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game.

Dota 2, reinforcement-learning, +1 more

399 GitHub stars

Long-Term Planning and Situational Awareness in OpenAI Five

no code implementations • 13 Dec 2019 • Jonathan Raiman, Susan Zhang, Filip Wolski

Understanding how knowledge about the world is represented within model-free deep reinforcement learning methods is a major challenge given the black box nature of its learning process within high-dimensional observation and action spaces.

Dota 2

Neural Network Surgery with Sets

no code implementations • 13 Dec 2019 • Jonathan Raiman, Susan Zhang, Christy Dennison

The cost to train machine learning models has been increasing exponentially, making exploration and research into the correct features and architecture a costly or intractable endeavor at scale.

Dota 2
