GPT-3

Introduced by Brown et al. in Language Models are Few-Shot Learners

GPT-3 is an autoregressive transformer model with 175 billion parameters. It uses the same model and architecture as GPT-2, including the modified initialization, pre-normalization, and reversible tokenization, with the exception that GPT-3 uses alternating dense and locally banded sparse attention patterns in the layers of the transformer, similar to the Sparse Transformer.

Source: Language Models are Few-Shot Learners
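To make the attention description concrete, below is a minimal Python (NumPy) sketch of what "alternating dense and locally banded" causal attention masks could look like. The even/odd layer schedule and the `bandwidth` value are illustrative assumptions; the paper only says the patterns alternate in a manner similar to the Sparse Transformer, without specifying the exact schedule.

```python
import numpy as np

def dense_mask(seq_len: int) -> np.ndarray:
    # Standard causal mask: each position attends to all earlier positions.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def banded_mask(seq_len: int, bandwidth: int) -> np.ndarray:
    # Locally banded causal mask: each position attends only to the
    # `bandwidth` most recent positions (including itself).
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < bandwidth)

def layer_mask(layer_idx: int, seq_len: int, bandwidth: int = 128) -> np.ndarray:
    # Assumed schedule for illustration: even layers dense, odd layers banded.
    if layer_idx % 2 == 0:
        return dense_mask(seq_len)
    return banded_mask(seq_len, bandwidth)

# Example: inspect the pattern for a toy sequence length.
print(layer_mask(layer_idx=1, seq_len=8, bandwidth=3).astype(int))
```

In a full model, such a boolean mask would be applied to each layer's attention logits before the softmax; the banded layers reduce the cost of attention over long sequences while the dense layers preserve global information flow.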

Latest Papers

Persistent Anti-Muslim Bias in Large Language Models
Abubakar Abid, Maheen Farooqi, James Zou
2021-01-14
How Multipurpose Are Language Models?
Anonymous
2021-01-01
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao, Adam Fisch, Danqi Chen
2020-12-31
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
2020-12-31
Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment
Julien Launay, Iacopo Poli, Kilian Müller, Gustave Pariente, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan
2020-12-11
CPM: A Large-scale Generative Chinese Pre-trained Language Model
Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun
2020-12-01
Do Fine-tuned Commonsense Language Models Really Generalize?
Mayank Kejriwal, Ke Shen
2020-11-18
COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs
Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, Yejin Choi
2020-10-12
Toward a Thermodynamics of Meaning
Jonathan Scott Enderle
2020-09-24
It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
Timo Schick, Hinrich Schütze
2020-09-15
The Radicalization Risks of GPT-3 and Advanced Neural Language Models
Kris McGuffie, Alex Newhouse
2020-09-15
Unit Test Case Generation with Transformers
Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, Shao Kun Deng, Neel Sundaresan
2020-09-11
Measuring Massive Multitask Language Understanding
Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt
2020-09-07
Discrete Word Embedding for Logical Natural Language Understanding
Masataro Asai, Zilu Tang
2020-08-26
Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems
Andrea Madotto, Zihan Liu, Zhaojiang Lin, Pascale Fung
2020-08-14
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
2020-05-28