The Claude 3 Model Family: Opus, Sonnet, Haiku

Preprint 2024 · Anthropic

We introduce Claude 3, a new family of large multimodal models: Claude 3 Opus, our most capable offering; Claude 3 Sonnet, which provides a combination of skills and speed; and Claude 3 Haiku, our fastest and least expensive model. All new models have vision capabilities that enable them to process and analyze image data. The Claude 3 family demonstrates strong performance across benchmark evaluations and sets a new standard on measures of reasoning, math, and coding. Claude 3 Opus achieves state-of-the-art results on evaluations such as GPQA [1], MMLU [2], and MMMU [3], among others. Claude 3 Haiku performs as well as or better than Claude 2 [4] on most pure-text tasks, while Sonnet and Opus significantly outperform it. Additionally, these models exhibit improved fluency in non-English languages, making them more versatile for a global audience. In this report, we provide an in-depth analysis of our evaluations, focusing on core capabilities, safety, societal impacts, and the catastrophic risk assessments we committed to in our Responsible Scaling Policy.
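
Since all three models accept images, a request can interleave image and text content blocks. Below is a minimal sketch using the Anthropic Python SDK's Messages API; the file name `chart.png` and the prompt are placeholders (not from the report), and the dated model identifier shown is the published Opus snapshot name.

```python
import base64
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Encode a local image as base64 for the image content block.
with open("chart.png", "rb") as f:  # placeholder file name
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/png",
                        "data": image_data}},
            {"type": "text", "text": "Summarize the trend shown in this chart."},
        ],
    }],
)
print(message.content[0].text)
```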

Benchmark results ("Global Rank" is the model's position on the corresponding public leaderboard):

Arithmetic Reasoning on GSM8K (0-shot chain-of-thought)

| Model           | Accuracy (%) | Global Rank |
|-----------------|--------------|-------------|
| Claude 3 Opus   | 95.0         | #10         |
| Claude 3 Sonnet | 92.3         | #19         |
| Claude 3 Haiku  | 88.9         | #26         |
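
The GSM8K numbers above are 0-shot chain-of-thought accuracies: the model is asked to reason step by step, and only the final numeric answer is scored. The report does not publish its exact prompt or answer-extraction rule, so the harness below is an illustrative sketch; `extract_final_number`, `gsm8k_accuracy`, and the prompt wording are assumptions, not the authors' evaluation code.

```python
import re

def extract_final_number(completion: str):
    """Take the last number in the completion as the final answer,
    a common GSM8K convention; the report does not state its exact rule."""
    nums = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    return nums[-1].replace(",", "") if nums else None

def gsm8k_accuracy(examples, generate):
    """examples: iterable of (question, gold_answer) pairs.
    generate: callable sending one prompt to the model, returning its text."""
    correct = total = 0
    for question, gold in examples:
        prompt = f"{question}\n\nThink step by step, then give the final numeric answer."
        correct += extract_final_number(generate(prompt)) == str(gold)
        total += 1
    return correct / total
```
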
Code Generation on HumanEval (0-shot)

| Model           | Pass@1 (%) | Global Rank |
|-----------------|------------|-------------|
| Claude 3 Opus   | 84.9       | #9          |
| Claude 3 Haiku  | 75.9       | #15         |
| Claude 3 Sonnet | 73.0       | #19         |

Code Generation on MBPP

| Model           | Accuracy (%) | Global Rank |
|-----------------|--------------|-------------|
| Claude 3 Opus   | 86.4         | #4          |
| Claude 3 Haiku  | 80.4         | #9          |
| Claude 3 Sonnet | 79.4         | #12         |
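
The HumanEval results are reported as Pass@1. The listing does not specify the sampling setup, but Pass@1 is conventionally computed with the unbiased estimator from the HumanEval paper (Chen et al., 2021): draw n samples per problem, count the c that pass the unit tests, and estimate the probability that at least one of k samples passes. With a single sample per problem (n = k = 1), it reduces to the plain fraction of problems solved. A direct transcription of that formula:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k), where n samples
    were drawn for one problem and c of them passed the unit tests."""
    if n - c < k:
        return 1.0  # every size-k draw must contain at least one passing sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# With one sample per problem, pass@1 is just the pass rate:
assert pass_at_k(1, 1, 1) == 1.0
assert pass_at_k(1, 0, 1) == 0.0
```
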
Multi-task Language Understanding on MMLU (average %, 5-shot with and without chain-of-thought)

| Model           | 5-shot | Rank | 5-shot CoT | Rank |
|-----------------|--------|------|------------|------|
| Claude 3 Opus   | 86.8   | #3   | 88.2       | #2   |
| Claude 3 Sonnet | 79.0   | #10  | 81.5       | #7   |
| Claude 3 Haiku  | 75.2   | #18  | 76.7       | #15  |
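
The MMLU scores are 5-shot: each test question is preceded by five solved examples from the same subject's dev split. The exact template used for these runs is not given in the listing, so the sketch below follows a common community convention; `mmlu_prompt` and its formatting are assumptions, not the report's actual harness.

```python
def mmlu_prompt(dev_examples, question, choices):
    """Build a 5-shot MMLU prompt: five solved dev-set examples from the same
    subject, then the test question. dev_examples is a list of
    (question, choices, answer_index) tuples."""
    letters = "ABCD"
    blocks = []
    for q, opts, ans_idx in dev_examples[:5]:
        lettered = "\n".join(f"{l}. {o}" for l, o in zip(letters, opts))
        blocks.append(f"{q}\n{lettered}\nAnswer: {letters[ans_idx]}")
    lettered = "\n".join(f"{l}. {o}" for l, o in zip(letters, choices))
    blocks.append(f"{question}\n{lettered}\nAnswer:")
    return "\n\n".join(blocks)
```
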
Common Sense Reasoning on WinoGrande (5-shot)

| Model           | Accuracy (%) | Global Rank |
|-----------------|--------------|-------------|
| Claude 3 Opus   | 88.5         | #6          |
| Claude 3 Sonnet | 75.1         | #23         |
| Claude 3 Haiku  | 74.2         | #25         |
