The Claude 3 Model Family: Opus, Sonnet, Haiku
We introduce Claude 3, a new family of large multimodal models – Claude 3 Opus, our most capable offering, Claude 3 Sonnet, which provides a combination of skills and speed, and Claude 3 Haiku, our fastest and least expensive model. All new models have vision capabilities that enable them to process and analyze image data. The Claude 3 family demonstrates strong performance across benchmark evaluations and sets a new standard on measures of reasoning, math, and coding. Claude 3 Opus achieves state-of-the-art results on evaluations like GPQA [1], MMLU [2], MMMU [3] and many more. Claude 3 Haiku performs as well or better than Claude 2 [4] on most pure-text tasks, while Sonnet and Opus significantly outperform it. Additionally, these models exhibit improved fluency in non-English languages, making them more versatile for a global audience. In this report, we provide an in-depth analysis of our evaluations, focusing on core capabilities, safety, societal impacts, and the catastrophic risk assessments we committed to in our Responsible Scaling Policy.
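The benchmark rows below follow the prompting setups named in the report (for example, 0-shot chain-of-thought for GSM8K and 5-shot for MMLU). As a rough illustration of how such a 0-shot chain-of-thought query can be issued against a Claude 3 model, here is a minimal sketch using the public Anthropic Python SDK; the model ID, prompt wording, and answer-extraction regex are illustrative assumptions, not the evaluation harness behind the reported scores.

```python
# Sketch: 0-shot chain-of-thought query to a Claude 3 model via the Anthropic
# Python SDK. Prompt template and answer parsing are illustrative assumptions.
import re
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask_gsm8k_style(question: str, model: str = "claude-3-opus-20240229") -> str | None:
    """Send one grade-school math question and extract the final numeric answer."""
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"{question}\n\nThink step by step, then give the final "
                       f"numeric answer on its own line as 'Answer: <number>'.",
        }],
    )
    text = response.content[0].text
    match = re.search(r"Answer:\s*(-?[\d,\.]+)", text)
    return match.group(1).replace(",", "") if match else None

print(ask_gsm8k_style("A baker makes 12 trays of 8 cookies and sells 70. How many are left?"))
```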
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Arithmetic Reasoning | GSM8K | Claude 3 Haiku (0-shot chain-of-thought) | Accuracy | 88.9 | # 26
Arithmetic Reasoning | GSM8K | Claude 3 Opus (0-shot chain-of-thought) | Accuracy | 95.0 | # 10
Arithmetic Reasoning | GSM8K | Claude 3 Sonnet (0-shot chain-of-thought) | Accuracy | 92.3 | # 19
Code Generation | HumanEval | Claude 3 Sonnet (0-shot) | Pass@1 | 73.0 | # 19
Code Generation | HumanEval | Claude 3 Opus (0-shot) | Pass@1 | 84.9 | # 9
Code Generation | HumanEval | Claude 3 Haiku (0-shot) | Pass@1 | 75.9 | # 15
Code Generation | MBPP | Claude 3 Opus | Accuracy | 86.4 | # 4
Code Generation | MBPP | Claude 3 Haiku | Accuracy | 80.4 | # 9
Code Generation | MBPP | Claude 3 Sonnet | Accuracy | 79.4 | # 12
Multi-task Language Understanding | MMLU | Claude 3 Haiku (5-shot) | Average (%) | 75.2 | # 18
Multi-task Language Understanding | MMLU | Claude 3 Haiku (5-shot, CoT) | Average (%) | 76.7 | # 15
Multi-task Language Understanding | MMLU | Claude 3 Sonnet (5-shot) | Average (%) | 79.0 | # 10
Multi-task Language Understanding | MMLU | Claude 3 Sonnet (5-shot, CoT) | Average (%) | 81.5 | # 7
Multi-task Language Understanding | MMLU | Claude 3 Opus (5-shot) | Average (%) | 86.8 | # 3
Multi-task Language Understanding | MMLU | Claude 3 Opus (5-shot, CoT) | Average (%) | 88.2 | # 2
Common Sense Reasoning | WinoGrande | Claude 3 Opus (5-shot) | Accuracy | 88.5 | # 6
Common Sense Reasoning | WinoGrande | Claude 3 Sonnet (5-shot) | Accuracy | 75.1 | # 23
Common Sense Reasoning | WinoGrande | Claude 3 Haiku (5-shot) | Accuracy | 74.2 | # 25
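The HumanEval Pass@1 figures above are the fraction of problems for which a sampled completion passes the unit tests; when several samples per problem are drawn, scores are typically computed with the unbiased pass@k estimator introduced with HumanEval. A minimal sketch of that estimator follows, with made-up per-problem sample counts for illustration.

```python
# Unbiased pass@k estimator: given n samples per problem of which c pass the
# unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over all problems.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n passes, given c passing."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-problem results: (samples drawn, samples that passed)
results = [(10, 7), (10, 0), (10, 10), (10, 3)]
score = sum(pass_at_k(n, c, k=1) for n, c in results) / len(results)
print(f"pass@1 = {score:.3f}")  # mean over problems; for k=1 this reduces to c/n
```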