The Claude 3 Model Family: Opus, Sonnet, Haiku
We introduce Claude 3, a new family of large multimodal models – Claude 3 Opus, our most capable offering, Claude 3 Sonnet, which provides a combination of skills and speed, and Claude 3 Haiku, our fastest and least expensive model. All new models have vision capabilities that enable them to process and analyze image data. The Claude 3 family demonstrates strong performance across benchmark evaluations and sets a new standard on measures of reasoning, math, and coding. Claude 3 Opus achieves state-of-the-art results on evaluations like GPQA [1], MMLU [2], MMMU [3] and many more. Claude 3 Haiku performs as well or better than Claude 2 [4] on most pure-text tasks, while Sonnet and Opus significantly outperform it. Additionally, these models exhibit improved fluency in non-English languages, making them more versatile for a global audience. In this report, we provide an in-depth analysis of our evaluations, focusing on core capabilities, safety, societal impacts, and the catastrophic risk assessments we committed to in our Responsible Scaling Policy.
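The benchmark rows below follow the prompting setups named in the report (for example, 0-shot chain-of-thought for GSM8K and 5-shot for MMLU). As a rough illustration of how such a 0-shot chain-of-thought query can be issued against a Claude 3 model, here is a minimal sketch using the public Anthropic Python SDK; the model ID, prompt wording, and answer-extraction regex are illustrative assumptions, not the evaluation harness behind the reported scores.

```python
# Sketch: 0-shot chain-of-thought query to a Claude 3 model via the Anthropic
# Python SDK. Prompt template and answer parsing are illustrative assumptions.
import re
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask_gsm8k_style(question: str, model: str = "claude-3-opus-20240229") -> str | None:
    """Send one grade-school math question and extract the final numeric answer."""
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"{question}\n\nThink step by step, then give the final "
                       f"numeric answer on its own line as 'Answer: <number>'.",
        }],
    )
    text = response.content[0].text
    match = re.search(r"Answer:\s*(-?[\d,\.]+)", text)
    return match.group(1).replace(",", "") if match else None

print(ask_gsm8k_style("A baker makes 12 trays of 8 cookies and sells 70. How many are left?"))
```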
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Arithmetic Reasoning | GSM8K | Claude 3 Haiku (0-shot chain-of-thought) | Accuracy | 88.9 | # 26
Arithmetic Reasoning | GSM8K | Claude 3 Opus (0-shot chain-of-thought) | Accuracy | 95.0 | # 10
Arithmetic Reasoning | GSM8K | Claude 3 Sonnet (0-shot chain-of-thought) | Accuracy | 92.3 | # 19
Code Generation | HumanEval | Claude 3 Sonnet (0-shot) | Pass@1 | 73.0 | # 19
Code Generation | HumanEval | Claude 3 Opus (0-shot) | Pass@1 | 84.9 | # 9
Code Generation | HumanEval | Claude 3 Haiku (0-shot) | Pass@1 | 75.9 | # 15
Code Generation | MBPP | Claude 3 Opus | Accuracy | 86.4 | # 4
Code Generation | MBPP | Claude 3 Haiku | Accuracy | 80.4 | # 9
Code Generation | MBPP | Claude 3 Sonnet | Accuracy | 79.4 | # 12
Multi-task Language Understanding | MMLU | Claude 3 Haiku (5-shot) | Average (%) | 75.2 | # 18
Multi-task Language Understanding | MMLU | Claude 3 Haiku (5-shot, CoT) | Average (%) | 76.7 | # 15
Multi-task Language Understanding | MMLU | Claude 3 Sonnet (5-shot) | Average (%) | 79.0 | # 10
Multi-task Language Understanding | MMLU | Claude 3 Sonnet (5-shot, CoT) | Average (%) | 81.5 | # 7
Multi-task Language Understanding | MMLU | Claude 3 Opus (5-shot) | Average (%) | 86.8 | # 3
Multi-task Language Understanding | MMLU | Claude 3 Opus (5-shot, CoT) | Average (%) | 88.2 | # 2
Common Sense Reasoning | WinoGrande | Claude 3 Opus (5-shot) | Accuracy | 88.5 | # 6
Common Sense Reasoning | WinoGrande | Claude 3 Sonnet (5-shot) | Accuracy | 75.1 | # 23
Common Sense Reasoning | WinoGrande | Claude 3 Haiku (5-shot) | Accuracy | 74.2 | # 25
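The HumanEval Pass@1 figures above are the fraction of problems for which a sampled completion passes the unit tests; when several samples per problem are drawn, scores are typically computed with the unbiased pass@k estimator introduced with HumanEval. A minimal sketch of that estimator follows, with made-up per-problem sample counts for illustration.

```python
# Unbiased pass@k estimator: given n samples per problem of which c pass the
# unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over all problems.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n passes, given c passing."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-problem results: (samples drawn, samples that passed)
results = [(10, 7), (10, 0), (10, 10), (10, 3)]
score = sum(pass_at_k(n, c, k=1) for n, c in results) / len(results)
print(f"pass@1 = {score:.3f}")  # mean over problems; for k=1 this reduces to c/n
```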