We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled. We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, a greater than 7% improvement over Gopher.
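The equal-scaling rule can be sketched numerically. Under the widely used approximation that training compute is C ≈ 6·N·D FLOPs (N parameters, D tokens), holding the token-to-parameter ratio fixed gives N ∝ √C and D ∝ √C, so doubling compute scales both by √2 and quadrupling compute doubles both. The ratio of roughly 20 tokens per parameter below is an assumption inferred from Chinchilla's reported configuration (70B parameters, ~1.4T tokens), not a formula quoted from the paper:

```python
# Sketch of compute-optimal allocation, assuming C ≈ 6 * N * D FLOPs
# and a fixed token-to-parameter ratio r = D / N (Chinchilla-style).
import math

def compute_optimal(flops, tokens_per_param=20.0):
    """Return (params, tokens) spending `flops` with D/N = tokens_per_param."""
    # Solve 6 * N * (r * N) = C  =>  N = sqrt(C / (6 * r)), D = r * N.
    n = math.sqrt(flops / (6.0 * tokens_per_param))
    d = tokens_per_param * n
    return n, d

# Chinchilla-scale budget: 6 * 70e9 * 1.4e12 ≈ 5.76e23 FLOPs.
n, d = compute_optimal(5.76e23)
print(f"params ≈ {n / 1e9:.0f}B, tokens ≈ {d / 1e12:.1f}T")
```

With this budget the sketch recovers roughly 70B parameters and 1.4T tokens, and multiplying the budget by 4 doubles both quantities, matching the "scale model size and tokens equally" prescription.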

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| General Knowledge | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 94.3 | 1 |
| GRE Reading Comprehension | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 53.1 | 1 |
| Figure Of Speech Detection | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 63.3 | 1 |
| Fantasy Reasoning | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 69 | 1 |
| English Proverbs | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 82.4 | 1 |
| Human Organs Senses Multiple Choice | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | 1 |
| Mathematical Induction | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 47.3 | 2 |
| Presuppositions As NLI | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 49.9 | 1 |
| Physical Intuition | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 79 | 1 |
| Metaphor Boolean | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 93.1 | 1 |
| Logical Args | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 56.2 | 2 |
| Evaluating Information Essentiality | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 17.6 | 1 |
| Epistemic Reasoning | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 60.6 | 1 |
| Entailed Polarity | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 94 | 1 |
| Analytic Entailment | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 67.1 | 1 |
| Similarities Abstraction | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 87 | 1 |
| Sentence Ambiguity | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 71.7 | 1 |
| Misconceptions | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 65.3 | 1 |
| Moral Permissibility | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 57.3 | 1 |
| Dark Humor Detection | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 66.2 | 2 |
| Understanding Fables | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 60.3 | 1 |
| Timedial | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.8 | 1 |
| Riddle Sense | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 85.7 | 1 |
| Irony Identification | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 73.0 | 1 |
| Empirical Judgments | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 67.7 | 1 |
| Discourse Marker Prediction | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 13.1 | 1 |
| Crass AI | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 75.0 | 3 |
| Crash Blossom | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 47.6 | 2 |
| Odd One Out | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 70.9 | 1 |
| Analogical Similarity | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 38.1 | 1 |
| Identify Odd Metaphor | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.8 | 1 |
| Physics MC | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 65.5 | 1 |
| Question Selection | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.6 | 1 |
| Phrase Relatedness | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 94 | 1 |
| Nonsense Words Grammar | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 78 | 1 |
| Movie Dialog Same Or Different | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 54.5 | 1 |
| LAMBADA | BIG-bench | Chinchilla-70B (zero-shot) | Accuracy | 77.4 | 1 |
| Intent Recognition | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 92.8 | 1 |
| Implicit Relations | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 49.4 | 1 |
| Implicatures | BIG-bench | Chinchilla-70B (few-shot, k=5) | Accuracy | 75 | 1 |
| Word Sense Disambiguation | BIG-bench (Anachronisms) | Chinchilla-70B (few-shot, k=5) | Accuracy | 69.1 | 1 |
| Common Sense Reasoning | BIG-bench (Causal Judgment) | Chinchilla-70B (few-shot, k=5) | Accuracy | 57.4 | 4 |
| Common Sense Reasoning | BIG-bench (Date Understanding) | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.3 | 5 |
| Common Sense Reasoning | BIG-bench (Disambiguation QA) | Chinchilla-70B (few-shot, k=5) | Accuracy | 54.7 | 4 |
| Logical Reasoning | BIG-bench (Formal Fallacies Syllogisms Negation) | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.1 | 7 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Hyperbaton) | Chinchilla-70B (few-shot, k=5) | Accuracy | 54.2 | 8 |
| Common Sense Reasoning | BIG-bench (Known Unknowns) | Chinchilla-70B (few-shot, k=5) | Accuracy | 65.2 | 2 |
| Logical Reasoning | BIG-bench (Logical Fallacy Detection) | Chinchilla-70B (few-shot, k=5) | Accuracy | 72.1 | 1 |
| Common Sense Reasoning | BIG-bench (Logical Sequence) | Chinchilla-70B (few-shot, k=5) | Accuracy | 64.1 | 1 |
| Logical Reasoning | BIG-bench (Logic Grid Puzzle) | Chinchilla-70B (few-shot, k=5) | Accuracy | 44 | 1 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Movie Recommendation) | Chinchilla-70B (few-shot, k=5) | Accuracy | 75.6 | 8 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Navigate) | Chinchilla-70B (few-shot, k=5) | Accuracy | 52.6 | 4 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Novel Concepts) | Chinchilla-70B (few-shot, k=5) | Accuracy | 65.6 | 2 |
| Logical Reasoning | BIG-bench (Penguins In A Table) | Chinchilla-70B (few-shot, k=5) | Accuracy | 48.7 | 3 |
| Logical Reasoning | BIG-bench (Reasoning About Colored Objects) | Chinchilla-70B (few-shot, k=5) | Accuracy | 59.7 | 3 |
| Multiple Choice Question Answering (MCQA) | BIG-bench (Ruin Names) | Chinchilla-70B (few-shot, k=5) | Accuracy | 47.1 | 8 |
| Sarcasm Detection | BIG-bench (SNARKS) | Chinchilla-70B (few-shot, k=5) | Accuracy | 58.6 | 7 |
| Common Sense Reasoning | BIG-bench (Sports Understanding) | Chinchilla-70B (few-shot, k=5) | Accuracy | 71 | 4 |
| Logical Reasoning | BIG-bench (StrategyQA) | Chinchilla-70B (few-shot, k=5) | Accuracy | 68.3 | 2 |
| Logical Reasoning | BIG-bench (Temporal Sequences) | Chinchilla-70B (few-shot, k=5) | Accuracy | 32.0 | 5 |
| Common Sense Reasoning | BIG-bench (Winowhy) | Chinchilla-70B (few-shot, k=5) | Accuracy | 62.5 | 2 |
| Question Answering | BoolQ | Chinchilla 70B (0-shot) | Accuracy | 83.7 | 20 |
| Sentence Completion | HellaSwag | Chinchilla 70B (0-shot) | Accuracy | 80.8 | 38 |
| Language Modelling | LAMBADA | Chinchilla (Zero-Shot) | Accuracy | 77.7 | 16 |
| Multi-task Language Understanding | MMLU | Chinchilla 70B (5-shot) | Average (%) | 67.5 | 38 |
| Mathematical Reasoning | MMLU (Mathematics) | Chinchilla (5-shot) | Accuracy | 35.7 | 4 |
| Question Answering | Natural Questions | Chinchilla (few-shot, k=64) | EM | 35.5 | 21 |
| Question Answering | PIQA | Chinchilla 70B (0-shot) | Accuracy | 81.8 | 20 |
| Question Answering | SIQA | Chinchilla (zero-shot) | Accuracy | 51.3 | 15 |
| Common Sense Reasoning | WinoGrande | Chinchilla 70B (0-shot) | Accuracy | 74.9 | 24 |