Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples...
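Almost every result in the table below is reported in the Few-Shot setting, with one Zero-Shot entry. In these settings the pre-trained model receives no gradient updates. Instead, K solved examples of the task (none, for zero-shot) are written into the prompt as plain text, followed by the unsolved query. The model's completion is then scored, or, for multiple-choice tasks, the likelihood it assigns to each candidate completion is compared. The minimal Python sketch below shows how such a prompt might be assembled. The function name `build_few_shot_prompt` and the `Q:`/`A:` layout are illustrative assumptions, not the paper's actual templates, which varied by task.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    task_description: str,
    demonstrations: Sequence[Tuple[str, str]],
    query: str,
    input_label: str = "Q",
    output_label: str = "A",
) -> str:
    """Concatenate an instruction, K solved examples, and one unsolved query.

    Hypothetical helper for illustration: no weights are updated; the
    demonstrations are simply placed in the context, and the model is
    expected to continue the text after the final output label.
    """
    lines = [task_description.strip(), ""]
    for source, target in demonstrations:
        # Each demonstration uses exactly the same layout as the final query.
        lines.append(f"{input_label}: {source}")
        lines.append(f"{output_label}: {target}")
        lines.append("")
    # The query is left unanswered so the model's continuation is the answer.
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")
    return "\n".join(lines)


if __name__ == "__main__":
    demos = [("What is the capital of France?", "Paris"), ("What is 2 + 3?", "5")]
    print(build_few_shot_prompt("Answer the question.", demos, "Who wrote Hamlet?"))
```

Passing an empty `demonstrations` list yields the zero-shot variant of the same prompt. Keeping the demonstrations and the query in one identical format is a deliberate choice here, so that the query reads as one more instance of the pattern the model has just seen.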


Results from the Paper


Task                              Dataset                     Model                  Metric           Value    Global rank
Common Sense Reasoning            ARC (Challenge)             GPT-3 175B (Few-Shot)  Accuracy         51.5     1
Common Sense Reasoning            ARC (Easy)                  GPT-3 175B (Few-Shot)  Accuracy         70.1     1
Question Answering                BoolQ                       GPT-3 175B (Few-Shot)  Accuracy         76.4     2
Natural Language Inference        CommitmentBank              GPT-3 175B (Few-Shot)  F1               52       2
Natural Language Inference        CommitmentBank              GPT-3 175B (Few-Shot)  Accuracy         75.6     1
Question Answering                COPA                        GPT-3 175B (Few-Shot)  Accuracy         92       2
Question Answering                CoQA                        GPT-3 175B (Few-Shot)  Overall          85       1
Question Answering                DROP Test                   GPT-3 175B (Few-Shot)  F1               36.5     1
Sentence Completion               HellaSwag                   GPT-3 175B (Few-Shot)  Accuracy         79.3     1
Multi-Task Learning               Hendrycks Test              GPT-3 (few-shot)       Accuracy (%)     43.9     2
Language Modelling                LAMBADA                     GPT-3 175B (Few-Shot)  Accuracy         86.4     1
Language Modelling                LAMBADA                     GPT-3 175B (Few-Shot)  Perplexity       1.92     1
Question Answering                MultiRC                     GPT-3 175B (Few-Shot)  F1a              75.4     2
Question Answering                Natural Questions           GPT-3 175B (Few-Shot)  Accuracy         29.9     1
Common Sense Reasoning            OpenBookQA                  GPT-3 175B (Few-Shot)  Accuracy         65.4     1
Language Modelling                Penn Treebank (Word Level)  GPT-3 (Zero-Shot)      Test perplexity  20.5     1
Language Modelling                Penn Treebank (Word Level)  GPT-3 (Zero-Shot)      Params           175000M  1
Common Sense Reasoning            PIQA                        GPT-3 175B (Few-Shot)  Accuracy         82.8     1
Question Answering                QuAC                        GPT-3 175B (Few-Shot)  F1               44.3     2
Question Answering                RACE                        GPT-3 175B (Few-Shot)  RACE-m           58.1     3
Question Answering                RACE                        GPT-3 175B (Few-Shot)  RACE-h           46.8     3
Question Answering                ReCoRD                      GPT-3 175B (Few-Shot)  F1               91.1     3
Question Answering                ReCoRD                      GPT-3 175B (Few-Shot)  Accuracy         90.2     1
Natural Language Inference        RTE                         GPT-3 175B (Few-Shot)  Accuracy         69.0     15
Question Answering                SQuAD2.0                    GPT-3 175B (Few-Shot)  F1               69.8     195
Question Answering                Story Cloze Test            GPT-3 175B (Few-Shot)  Accuracy         87.7     1
Question Answering                TriviaQA                    GPT-3 175B (Few-Shot)  Accuracy         71.2     1
Question Answering                WebQuestions                GPT-3 175B (Few-Shot)  Accuracy         41.5     1
Coreference Resolution            Winograd Schema Challenge   GPT-3 175B (Few-Shot)  Accuracy         80.1     1
Unsupervised Machine Translation  WMT2014 English-French      GPT-3 175B (Few-Shot)  BLEU             32.6     4
Unsupervised Machine Translation  WMT2014 French-English      GPT-3 175B (Few-Shot)  BLEU             39.2     1
Unsupervised Machine Translation  WMT2016 English-German      GPT-3 175B (Few-Shot)  BLEU             29.7     1
Unsupervised Machine Translation  WMT2016 English-Romanian    GPT-3 175B (Few-Shot)  BLEU             21       1
Unsupervised Machine Translation  WMT2016 German-English      GPT-3 175B (Few-Shot)  BLEU             40.6     1
Unsupervised Machine Translation  WMT2016 Romanian-English    GPT-3 175B (Few-Shot)  BLEU             39.5     1
Word Sense Disambiguation         Words in Context            GPT-3 175B (Few-Shot)  Accuracy         49.4     2
Coreference Resolution            WSC                         GPT-3 175B (Few-Shot)  Accuracy         80.1     1
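Two of the metrics in the table have short standard definitions that are easy to make concrete. Perplexity (LAMBADA, Penn Treebank) is the exponential of the average negative log-likelihood per predicted token. The F1 reported for the reading-comprehension rows (DROP, QuAC, ReCoRD, SQuAD2.0) is commonly a SQuAD-style token-overlap score between a predicted and a reference answer. CommitmentBank's F1 and MultiRC's F1a are classification-style variants and are not covered here. The sketch below assumes natural-log probabilities and SQuAD's answer normalization. Each benchmark's official scorer differs in details (DROP's, for example, also handles numbers and multi-span answers), so this is illustrative rather than a reimplementation of any official script.

```python
import math
import re
import string
from collections import Counter
from typing import Iterable, List, Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) over natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("perplexity needs at least one scored token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def normalize_answer(text: str) -> List[str]:
    """SQuAD-style normalization: lowercase, drop punctuation and English articles, split."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold_answers: Iterable[str]) -> float:
    """Token-overlap F1 between a prediction and the best-matching gold answer."""
    pred_tokens = normalize_answer(prediction)
    best = 0.0
    for gold in gold_answers:
        gold_tokens = normalize_answer(gold)
        if not pred_tokens or not gold_tokens:
            # SQuAD 2.0 convention: an empty answer only matches an empty answer.
            score = float(pred_tokens == gold_tokens)
        else:
            # Counter intersection keeps the minimum count of each shared token.
            overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
            if overlap == 0:
                score = 0.0
            else:
                precision = overlap / len(pred_tokens)
                recall = overlap / len(gold_tokens)
                score = 2 * precision * recall / (precision + recall)
        best = max(best, score)
    return best


if __name__ == "__main__":
    # Four tokens, each predicted with probability 0.5, give a perplexity of about 2.
    print(perplexity([math.log(0.5)] * 4))
    # The prediction covers 2 of 4 gold tokens: precision 1.0, recall 0.5, F1 = 2/3.
    print(token_f1("the Eiffel Tower", ["Eiffel Tower in Paris"]))
```

Inverting the perplexity formula gives a feel for the numbers. LAMBADA's reported perplexity of 1.92 corresponds to an average negative log-likelihood of ln(1.92) ≈ 0.65 nats per predicted unit, under whatever tokenization the evaluation used.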

Methods used in the Paper