We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled. We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, a greater than 7% improvement over Gopher.
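The equal-scaling rule can be sketched numerically. Under the widely used approximation that training compute is C ≈ 6·N·D FLOPs (N parameters, D tokens), holding the token-to-parameter ratio fixed gives N ∝ √C and D ∝ √C, so doubling compute scales both by √2 and quadrupling compute doubles both. The ratio of roughly 20 tokens per parameter below is an assumption inferred from Chinchilla's reported configuration (70B parameters, ~1.4T tokens), not a formula quoted from the paper:

```python
# Sketch of compute-optimal allocation, assuming C ≈ 6 * N * D FLOPs
# and a fixed token-to-parameter ratio r = D / N (Chinchilla-style).
import math

def compute_optimal(flops, tokens_per_param=20.0):
    """Return (params, tokens) spending `flops` with D/N = tokens_per_param."""
    # Solve 6 * N * (r * N) = C  =>  N = sqrt(C / (6 * r)), D = r * N.
    n = math.sqrt(flops / (6.0 * tokens_per_param))
    d = tokens_per_param * n
    return n, d

# Chinchilla-scale budget: 6 * 70e9 * 1.4e12 ≈ 5.76e23 FLOPs.
n, d = compute_optimal(5.76e23)
print(f"params ≈ {n / 1e9:.0f}B, tokens ≈ {d / 1e12:.1f}T")
```

With this budget the sketch recovers roughly 70B parameters and 1.4T tokens, and multiplying the budget by 4 doubles both quantities, matching the "scale model size and tokens equally" prescription.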

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| General Knowledge | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | 1 |
| GRE Reading Comprehension | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 53.1 | 1 |
| Figure Of Speech Detection | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 63.3 | 1 |
| Fantasy Reasoning | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 69 | 1 |
| English Proverbs | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 82.4 | 1 |
| Human Organs Senses Multiple Choice | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | 1 |
| Mathematical Induction | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 47.3 | 2 |
| Presuppositions As NLI | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 49.9 | 1 |
| Physical Intuition | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 79 | 1 |
| Metaphor Boolean | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 93.1 | 1 |
| Logical Args | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 56.2 | 2 |
| Evaluating Information Essentiality | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 17.6 | 1 |
| Epistemic Reasoning | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 60.6 | 1 |
| Entailed Polarity | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 94 | 1 |
| Analytic Entailment | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 67.1 | 1 |
| Similarities Abstraction | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 87 | 1 |
| Sentence Ambiguity | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 71.7 | 1 |
| Misconceptions | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 65.3 | 1 |
| Moral Permissibility | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 57.3 | 1 |
| Dark Humor Detection | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 66.2 | 2 |
| Understanding Fables | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 60.3 | 1 |
| Timedial | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.8 | 1 |
| Riddle Sense | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | 1 |
| Irony Identification | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 73.0 | 1 |
| Empirical Judgments | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 67.7 | 1 |
| Discourse Marker Prediction | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 13.1 | 1 |
| Crass AI | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 75.0 | 3 |
| Crash Blossom | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 47.6 | 2 |
| Odd One Out | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 70.9 | 1 |
| Analogical Similarity | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 38.1 | 1 |
| Identify Odd Metaphor | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.8 | 1 |
| Physics MC | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 65.5 | 1 |
| Question Selection | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.6 | 1 |
| Phrase Relatedness | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 94 | 1 |
| Nonsense Words Grammar | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 78 | 1 |
| Movie Dialog Same Or Different | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 54.5 | 1 |
| LAMBADA | BIG-bench | Chinchilla-70B (zero-shot) | Accuracy | 77.4 | 1 |
| Intent Recognition | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 92.8 | 1 |
| Implicit Relations | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 49.4 | 1 |
| Implicatures | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 75 | 1 |
| Word Sense Disambiguation | BIG-bench (Anachronisms) | Chinchilla-70B (few-shot, k=5) | Accuracy | 69.1 | 1 |
| Common Sense Reasoning | BIG-bench (Causal Judgment) | Chinchilla-70B (few-shot, k=5) | Accuracy | 57.4 | 4 |
| Common Sense Reasoning | BIG-bench (Date Understanding) | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.3 | 5 |
| Common Sense Reasoning | BIG-bench (Disambiguation QA) | Chinchilla-70B (few-shot, k=5) | Accuracy | 54.7 | 4 |
| Logical Reasoning | BIG-bench (Formal Fallacies Syllogisms Negation) | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | 7 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Hyperbaton) | Chinchilla-70B (few-shot, k=5) | Accuracy | 54.2 | 8 |
| Common Sense Reasoning | BIG-bench (Known Unknowns) | Chinchilla-70B (few-shot, k=5) | Accuracy | 65.2 | 2 |
| Logical Reasoning | BIG-bench (Logical Fallacy Detection) | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | 1 |
| Common Sense Reasoning | BIG-bench (Logical Sequence) | Chinchilla-70B (few-shot, k=5) | Accuracy | 64.1 | 1 |
| Logical Reasoning | BIG-bench (Logic Grid Puzzle) | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | 1 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Movie Recommendation) | Chinchilla-70B (few-shot, k=5) | Accuracy | 75.6 | 8 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Navigate) | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.6 | 4 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Novel Concepts) | Chinchilla-70B (few-shot, k=5) | Accuracy | 65.6 | 2 |
| Logical Reasoning | BIG-bench (Penguins In A Table) | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | 3 |
| Logical Reasoning | BIG-bench (Reasoning About Colored Objects) | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | 3 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Ruin Names) | Chinchilla-70B (few-shot, k=5) | Accuracy | 47.1 | 8 |
| Sarcasm Detection | BIG-bench (SNARKS) | Chinchilla-70B (few-shot, k=5) | Accuracy | 58.6 | 7 |
| Common Sense Reasoning | BIG-bench (Sports Understanding) | Chinchilla-70B (few-shot, k=5) | Accuracy | 71 | 4 |
| Logical Reasoning | BIG-bench (StrategyQA) | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | 2 |
| Logical Reasoning | BIG-bench (Temporal Sequences) | Chinchilla-70B (few-shot, k=5) | Accuracy | 32.0 | 5 |
| Common Sense Reasoning | BIG-bench (Winowhy) | Chinchilla-70B (few-shot, k=5) | Accuracy | 62.5 | 2 |
| Question Answering | BoolQ | Chinchilla 70B (0-shot) | Accuracy | 83.7 | 20 |
| Sentence Completion | HellaSwag | Chinchilla 70B (0-shot) | Accuracy | 80.8 | 38 |
| Language Modelling | LAMBADA | Chinchilla (Zero-Shot) | Accuracy | 77.7 | 16 |
| Multi-task Language Understanding | MMLU | Chinchilla 70B (5-shot) | Average (%) | 67.5 | 38 |
| Mathematical Reasoning | MMLU (Mathematics) | Chinchilla (5-shot) | Accuracy | 35.7 | 4 |
| Question Answering | Natural Questions | Chinchilla (few-shot, k=64) | EM | 35.5 | 21 |
| Question Answering | PIQA | Chinchilla 70B (0-shot) | Accuracy | 81.8 | 20 |
| Question Answering | SIQA | Chinchilla (zero-shot) | Accuracy | 51.3 | 15 |
| Common Sense Reasoning | WinoGrande | Chinchilla 70B (0-shot) | Accuracy | 74.9 | 24 |