GPT-2

Introduced by Radford et al. in Language Models are Unsupervised Multitask Learners

GPT-2 is a Transformer architecture that was notable for its size (1.5 billion parameters) at the time of its release. The model is pretrained on the WebText dataset, which contains text from 45 million web links. It largely follows the previous GPT architecture, with some modifications:

  • Layer normalization is moved to the input of each sub-block, similar to a pre-activation residual network, and an additional layer normalization is added after the final self-attention block.

  • A modified initialization is used that accounts for the accumulation on the residual path as model depth grows. Weights of residual layers are scaled at initialization by a factor of $1/\sqrt{N}$, where $N$ is the number of residual layers.

  • The vocabulary is expanded to 50,257 tokens. The context size is expanded from 512 to 1024 tokens, and a larger batch size of 512 is used.
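
The first two modifications can be illustrated with a minimal pure-Python sketch. It is not the original implementation: a single linear map stands in for each attention or feed-forward sub-block, layer normalization has no learned gain or bias, and the helper names (`layer_norm`, `pre_ln_sublayer`, `residual_scale`, `tiny_stack`), the base standard deviation of 0.02, and the choice to count two residual layers per block are all assumptions made for illustration.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x, w):
    """Multiply vector x (length d_in) by matrix w (d_in rows, d_out columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def residual_scale(n_residual_layers):
    """Scaling factor 1/sqrt(N) applied to residual-layer weights at init."""
    return 1.0 / math.sqrt(n_residual_layers)


def init_weight(d_in, d_out, rng, std=0.02, scale=1.0):
    """Gaussian init; std=0.02 is an assumed base value, scale is 1/sqrt(N)."""
    return [[rng.gauss(0.0, std) * scale for _ in range(d_out)] for _ in range(d_in)]


def pre_ln_sublayer(x, w):
    """Compute x + f(LayerNorm(x)): normalization sits at the sub-block input."""
    update = linear(layer_norm(x), w)
    return [xi + ui for xi, ui in zip(x, update)]


def tiny_stack(x, n_blocks, d_model, seed=0):
    """Run x through a toy pre-LN stack, then apply the extra final layer norm."""
    rng = random.Random(seed)
    # Assumption: each block contributes two residual layers (attention, MLP).
    n_residual = 2 * n_blocks
    scale = residual_scale(n_residual)
    for _ in range(n_residual):
        w = init_weight(d_model, d_model, rng, scale=scale)
        x = pre_ln_sublayer(x, w)
    return layer_norm(x)


if __name__ == "__main__":
    print(residual_scale(4))
    print(tiny_stack([0.5, -1.0, 2.0, 0.0], n_blocks=2, d_model=4))
```

Because the residual path itself is never normalized in this layout, the sum of sub-block outputs grows with depth; shrinking the initial weights by $1/\sqrt{N}$ is one way to keep that sum at a comparable scale.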

Source: Language Models are Unsupervised Multitask Learners

Latest Papers

PAPER DATE
Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering
Jeroen Offerijns, Suzan Verberne, Tessa Verhoef
2020-10-19
Decoding Methods for Neural Narrative Generation
Alexandra DeLucia, Aaron Mueller, Xiang Lisa Li, João Sedoc
2020-10-14
The workweek is the best time to start a family -- A Study of GPT-2 Based Claim Generation
Shai Gretz, Yonatan Bilu, Edo Cohen-Karlik, Noam Slonim
2020-10-13
Meta-Context Transformers for Domain-Specific Response Generation
Debanjana Kar, Suranjana Samanta, Amar Prakash Azad
2020-10-12
Incremental Processing in the Age of Non-Incremental Encoders: An Empirical Assessment of Bidirectional Models for Incremental NLU
Brielen Madureira, David Schlangen
2020-10-11
Investigating African-American Vernacular English in Transformer-Based Text Generation
Sophie Groenwold, Lily Ou, Aesha Parekh, Samhita Honnavalli, Sharon Levy, Diba Mirza, William Yang Wang
2020-10-06
GenAug: Data Augmentation for Finetuning Text Generators
Steven Y. Feng, Varun Gangal, Dongyeop Kang, Teruko Mitamura, Eduard Hovy
2020-10-05
Inquisitive Question Generation for High Level Text Comprehension
Wei-Jen Ko, Te-Yuan Chen, Yiyan Huang, Greg Durrett, Junyi Jessy Li
2020-10-04
Examining the rhetorical capacities of neural language models
Zining Zhu, Chuer Pan, Mohamed Abdalla, Frank Rudzicz
2020-10-01
Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions
Peter A. Jansen
2020-09-29
The design and implementation of Language Learning Chatbot with XAI using Ontology and Transfer Learning
Nuobei Shi, Qin Zeng, Raymond Lee
2020-09-29
On Data Augmentation for Extreme Multi-label Classification
Danqing Zhang, Tao Li, Haiyang Zhang, Bing Yin
2020-09-22
Prior Art Search and Reranking for Generated Patent Text
Jieh-Sheng Lee, Jieh Hsiang
2020-09-19
The Radicalization Risks of GPT-3 and Advanced Neural Language Models
Kris McGuffie, Alex Newhouse
2020-09-15
Dialogue Response Ranking Training with Large-Scale Human Feedback Data
Xiang Gao, Yizhe Zhang, Michel Galley, Chris Brockett, Bill Dolan
2020-09-15
Critical Thinking for Language Models
Gregor Betz
2020-09-15
GeDi: Generative Discriminator Guided Sequence Generation
Ben Krause, Akhilesh Deepak Gotmare, Bryan McCann, Nitish Shirish Keskar, Shafiq Joty, Richard Socher, Nazneen Fatema Rajani
2020-09-14
Brain2Word: Decoding Brain Activity for Language Generation
Nicolas Affolter, Beni Egressy, Damian Pascual, Roger Wattenhofer
2020-09-10
Modern Methods for Text Generation
Dimas Munoz Montesinos
2020-09-10
Improving Language Generation with Sentence Coherence Objective
Ruixiao Sun, Jie Yang, Mehrdad Yousefzadeh
2020-09-07
Black Box to White Box: Discover Model Characteristics Based on Strategic Probing
Josh Kalin, Matthew Ciolino, David Noever, Gerry Dozier
2020-09-07
Comparative Evaluation of Pretrained Transfer Learning Models on Automatic Short Answer Grading
Sasi Kiran Gaddipati, Deebul Nair, Paul G. Plöger
2020-09-02
DAVE: Deriving Automatically Verilog from English
Hammond Pearce, Benjamin Tan, Ramesh Karri
2020-08-27
Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study
Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew Tomkins
2020-08-17
Narrative Interpolation for Generating and Understanding Stories
Su Wang, Greg Durrett, Katrin Erk
2020-08-17
Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size
Davis Yoshida, Allyson Ettinger, Kevin Gimpel
2020-08-16
Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems
Andrea Madotto
2020-08-14
Navigating Language Models with Synthetic Agents
Philip Feldman
2020-08-10
Trojaning Language Models for Fun and Profit
Xinyang Zhang, Zheng Zhang, Ting Wang
2020-08-01
Multi-node Bert-pretraining: Cost-efficient Approach
Jiahuang Lin, Xin Li, Gennady Pekhimenko
2020-08-01
TweepFake: about Detecting Deepfake Tweets
Tiziano Fagni, Fabrizio Falchi, Margherita Gambini, Antonio Martella, Maurizio Tesconi
2020-07-31
Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining
TJ Tsai, Kevin Ji
2020-07-29
Generative Pretraining from Pixels
Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever
2020-07-17
Deep Transformer based Data Augmentation with Subword Units for Morphologically Rich Online ASR
Balázs Tarján, György Szaszák, Tibor Fegyó, Péter Mihajlik
2020-07-14
The Go Transformer: Natural Language Modeling for Game Play
Matthew Ciolino, David Noever, Josh Kalin
2020-07-07
You Autocomplete Me: Poisoning Vulnerabilities in Neural Code Completion
Roei Schuster, Congzheng Song, Eran Tromer, Vitaly Shmatikov
2020-07-05
Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation
Bo Pang, Erik Nijkamp, Wenjuan Han, Linqi Zhou, Yixian Liu, Kewei Tu
2020-07-01
Roles and Utilization of Attention Heads in Transformer-based Neural Language Models
Jae-young Jo, Sung-Hyon Myaeng
2020-07-01
On-The-Fly Information Retrieval Augmentation for Language Models
Hai Wang, David McAllester
2020-07-01
LSTM and GPT-2 Synthetic Speech Transfer Learning for Speaker Recognition to Overcome Data Scarcity
Jordan J. Bird, Diego R. Faria, Anikó Ekárt, Cristiano Premebida, Pedro P. S. Ayrosa
2020-07-01
The Summary Loop: Learning to Write Abstractive Summaries Without Examples
Philippe Laban, Andrew Hsi, John Canny, Marti A. Hearst
2020-07-01
Knowledge-Aware Language Model Pretraining
Corby Rosset, Chenyan Xiong, Minh Phan, Xia Song, Paul Bennett, Saurabh Tiwary
2020-06-29
Progressive Generation of Long Text
Bowen Tan, Zichao Yang, Maruan Al-Shedivat, Eric P. Xing, Zhiting Hu
2020-06-28
Video-Grounded Dialogues with Pretrained Generation Language Models
Hung Le, Steven C. H. Hoi
2020-06-27
A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19
David Oniani, Yanshan Wang
2020-06-19
Unsupervised Paraphrase Generation using Pre-trained Language Models
Chaitra Hegde, Shrikumar Patil
2020-06-09
Few-Shot Generative Conversational Query Rewriting
Shi Yu, Jiahua Liu, Jingqin Yang, Chenyan Xiong, Paul Bennett, Jianfeng Gao, Zhiyuan Liu
2020-06-09
Automatic Text Summarization of COVID-19 Medical Research Articles using BERT and GPT-2
Virapat Kieuvongngam, Bowen Tan, Yiming Niu
2020-06-03
Emergence of Separable Manifolds in Deep Language Representations
Jonathan Mamou, Hang Le, Miguel Del Rio, Cory Stephenson, Hanlin Tang, Yoon Kim, SueYeon Chung
2020-06-01
First Neural Conjecturing Datasets and Experiments
Josef Urban, Jan Jakubův
2020-05-29
Creative Artificial Intelligence -- Algorithms vs. humans in an incentivized writing competition
Nils Köbis, Luca Mossink
2020-05-20
Large Scale Multi-Actor Generative Dialog Modeling
Alex Boyd, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro
2020-05-13
Distributional Discrepancy: A Metric for Unconditional Text Generation
Ping Cai, Xingyuan Chen, Peng Jin, Hongjun Wang, Tianrui Li
2020-05-04
Transformer-based End-to-End Question Generation
Luis Enrico Lopez, Diane Kathryn Cruz, Jan Christian Blaise Cruz, Charibeth Cheng
2020-05-03
A Simple Language Model for Task-Oriented Dialogue
Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socher
2020-05-02
A Controllable Model of Grounded Response Generation
Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf, Bill Dolan
2020-05-01
POINTER: Constrained Text Generation via Insertion-based Generative Pre-training
Yizhe Zhang, Guoyin Wang, Chunyuan Li, Zhe Gan, Chris Brockett, Bill Dolan
2020-05-01
Evaluation Metrics for Headline Generation Using Deep Pre-Trained Embeddings
Abdul Moeed, Yang An, Gerhard Hagerer, Georg Groh
2020-05-01
PlotMachines: Outline-Conditioned Generation with Dynamic Plot State Tracking
Hannah Rashkin, Asli Celikyilmaz, Yejin Choi, Jianfeng Gao
2020-04-30
GePpeTto Carves Italian into a Language Model
Lorenzo De Mattei, Michele Cafagna, Felice Dell'Orletta, Malvina Nissim, Marco Guerini
2020-04-29
LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning
Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu
2020-04-27
Assessing Discourse Relations in Language Generation from Pre-trained Language Models
Wei-Jen Ko, Junyi Jessy Li
2020-04-26
A Tailored Pre-Training Model for Task-Oriented Dialog Generation
Jing Gu, Qingyang Wu, Chongruo Wu, Weiyan Shi, Zhou Yu
2020-04-24
Mirror Ritual: An Affective Interface for Emotional Self-Reflection
Nina Rajcic, Jon McCormack
2020-04-21
StereoSet: Measuring stereotypical bias in pretrained language models
Moin Nadeem, Anna Bethke, Siva Reddy
2020-04-20
Generating Counter Narratives against Online Hate Speech: Data and Strategies
Serra Sinem Tekiroglu, Yi-Ling Chung, Marco Guerini
2020-04-08
Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity
Hamza Harkous, Isabel Groves, Amir Saffari
2020-04-08
TextGAIL: Generative Adversarial Imitation Learning for Text Generation
Qingyang Wu, Lei Li, Zhou Yu
2020-04-07
DARE: Data Augmented Relation Extraction with GPT-2
Yannis Papanikolaou, Andrea Pierleoni
2020-04-06
Sparse Text Generation
Pedro Henrique Martins, Zita Marinho, André F. T. Martins
2020-04-06
Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
Chunyuan Li, Xiang Gao, Yuan Li, Xiujun Li, Baolin Peng, Yizhe Zhang, Jianfeng Gao
2020-04-05
Generating Rationales in Visual Question Answering
Hammad A. Ayyubi, Md. Mehrab Tanjim, Julian J. McAuley, Garrison W. Cottrell
2020-04-04
Generating Major Types of Chinese Classical Poetry in a Uniformed Framework
Jinyi Hu, Maosong Sun
2020-03-13
RecipeGPT: Generative Pre-training Based Cooking Recipe Generation and Evaluation System
Helena H. Lee, Ke Shu, Palakorn Achananuparp, Philips Kokoh Prasetyo, Yue Liu, Ee-Peng Lim, Lav R. Varshney
2020-03-05
Hybrid Generative-Retrieval Transformers for Dialogue Domain Adaptation
Igor Shalyminov, Alessandro Sordoni, Adam Atkinson, Hannes Schulz
2020-03-03
Training Question Answering Models From Synthetic Data
Raul Puri, Ryan Spring, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro
2020-02-22
Transformer on a Diet
Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alexander J. Smola
2020-02-14
CBAG: Conditional Biomedical Abstract Generation
Justin Sybrandt, Ilya Safro
2020-02-13
Introducing Aspects of Creativity in Automatic Poetry Generation
Brendan Bena, Jugal Kalita
2020-02-06
Fine-Tuning a Transformer-Based Language Model to Avoid Generating Non-Normative Text
Xiangyu Peng, Siyan Li, Spencer Frazier, Mark Riedl
2020-01-23
PatentTransformer-2: Controlling Patent Text Generation by Structural Metadata
Jieh-Sheng Lee, Jieh Hsiang
2020-01-11
Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation
Anonymous
2020-01-01
Alternating Recurrent Dialog Model with Large-Scale Pre-Trained Language Models
Anonymous
2020-01-01
Personalized Patent Claim Generation and Measurement
Jieh-Sheng Lee
2019-12-07
Paraphrasing with Large Language Models
Sam Witteveen, Martin Andrews
2019-11-21
Unsupervised Natural Question Answering with a Small Model
Martin Andrews, Sam Witteveen
2019-11-19
Attending to Entities for Better Text Understanding
Pengxiang Cheng, Katrin Erk
2019-11-11
INSET: Sentence Infilling with INter-SEntential Transformer
Yichen Huang, Yizhe Zhang, Oussama Elachqar, Yu Cheng
2019-11-10
Grounded Conversation Generation as Guided Traverses in Commonsense Knowledge Graphs
Houyu Zhang, Zhenghao Liu, Chenyan Xiong, Zhiyuan Liu
2019-11-07
Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds
Tassilo Klein, Moin Nabi
2019-11-06
Assessing Social and Intersectional Biases in Contextualized Word Representations
Yi Chern Tan, L. Elisa Celis
2019-11-04
Selecting, Planning, and Rewriting: A Modular Approach for Data-to-Document Generation and Translation
Lesly Miculicich, Marc Marone, Hany Hassan
2019-11-01
GEM: Generative Enhanced Model for adversarial attacks
Piotr Niewinski, Maria Pszona, Maria Janicka
2019-11-01
Masked Language Model Scoring
Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff
2019-10-31
Multilingual Question Answering from Formatted Text applied to Conversational Agents
Wissam Siblini, Charlotte Pasqual, Axel Lavielle, Cyril Cauchois
2019-10-10
Alternating Roles Dialog Model with Large-scale Pre-trained Language Models
Qingyang Wu, Yichi Zhang, Yu Li, Zhou Yu
2019-10-09
Towards Understanding of Medical Randomized Controlled Trials by Conclusion Generation
Alexander Te-Wei Shieh, Yung-Sung Chuang, Shang-Yu Su, Yun-Nung Chen
2019-10-03
TMLab: Generative Enhanced Model (GEM) for adversarial attacks
Piotr Niewinski, Maria Pszona, Maria Janicka
2019-10-01
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro
2019-09-17
How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings
Kawin Ethayarajh
2019-09-02
Neural Language Model for Automated Classification of Electronic Medical Records at the Emergency Room. The Significant Benefit of Unsupervised Generative Pre-training
Binbin Xu, Cédric Gil-Jardiné, Frantz Thiessard, Eric Tellier, Marta Avalos, Emmanuel Lagarde
2019-08-30
Measuring Patent Claim Generation by Span Relevancy
Jieh-Sheng Lee, Jieh Hsiang
2019-08-26
Release Strategies and the Social Impacts of Language Models
Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, Gretchen Krueger, Jong Wook Kim, Sarah Kreps, Miles McCain, Alex Newhouse, Jason Blazakis, Kris McGuffie, Jasmine Wang
2019-08-24
Universal Adversarial Triggers for Attacking and Analyzing NLP
Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh
2019-08-20
Noisy Channel for Low Resource Grammatical Error Correction
Simon Flachs, Ophélie Lacroix, Anders Søgaard
2019-08-01
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Sascha Rothe, Shashi Narayan, Aliaksei Severyn
2019-07-29
DLGNet: A Transformer-based Model for Dialogue Response Generation
Oluwatobi Olabiyi, Erik T. Mueller
2019-07-26
Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection
David Ifeoluwa Adelani, Haotian Mai, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
2019-07-22
Patent Claim Generation by Fine-Tuning OpenAI GPT-2
Jieh-Sheng Lee, Jieh Hsiang
2019-07-01
One Epoch Is All You Need
Aran Komatsuzaki
2019-06-16
A Multiscale Visualization of Attention in the Transformer Model
Jesse Vig
2019-06-12
Analyzing the Structure of Attention in a Transformer Language Model
Jesse Vig, Yonatan Belinkov
2019-06-07
Visualizing Attention in Transformer-Based Language Representation Models
Jesse Vig
2019-04-04
Language Models are Unsupervised Multitask Learners
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
2019-02-14

Categories