We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation. Our model leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length at a reduced inference cost. We also provide a model fine-tuned to follow instructions, Mistral 7B – Instruct, that surpasses the Llama 2 13B – Chat model on both human and automated benchmarks. Our models are released under the Apache 2.0 license.
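
To make the sliding window attention idea concrete, the following is a minimal sketch of a causal attention mask restricted to a fixed window. It is an illustration only, not the released implementation: the function name `sliding_window_mask`, the toy sizes, and the convention that the window counts the current token are all assumptions made for this example.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a causal sliding-window mask.

    mask[i][j] is True when query position i may attend to key position j.
    Assumption for this sketch: the window includes the current token, so
    position i sees positions max(0, i - window + 1) through i.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Toy sizes chosen for readability (hypothetical, not the model's settings).
mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row allows at most `window` positions, so the attention cost per token is bounded by the window size rather than by the full sequence length, which is the source of the reduced inference cost the abstract describes.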

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Common Sense Reasoning | ARC (Challenge) | Mistral 7B (0-shot) | Accuracy | 55.5 | # 24 |
| Common Sense Reasoning | ARC (Easy) | Mistral 7B (0-shot) | Accuracy | 80.0 | # 12 |
| Arithmetic Reasoning | GSM8K | Mistral 7B (maj@8) | Accuracy | 52.2 | # 119 |
| Arithmetic Reasoning | GSM8K | Mistral 7B (maj@8) | Parameters (Billion) | 7 | # 10 |
| Sentence Completion | HellaSwag | Mistral 7B (0-shot) | Accuracy | 81.3 | # 37 |
| Code Generation | HumanEval | Mistral 7B (0-shot) | Pass@1 | 30.5 | # 78 |
| Zero-Shot Video Question Answer | IntentQA | Mistral (7B) | Accuracy | 50.4 | # 6 |
| Math Word Problem Solving | MATH | Mistral 7B (maj@4) | Accuracy | 13.1 | # 85 |
| Math Word Problem Solving | MATH | Mistral 7B (maj@4) | Parameters (Billions) | 7 | # 58 |
| Code Generation | MBPP | Mistral 7B (3-shot) | Accuracy | 47.5 | # 55 |
| Multi-task Language Understanding | MMLU | Mistral 7B (5-shot) | Average (%) | 60.1 | # 50 |
| Question Answering | Natural Questions | Mistral 7B (5-shot) | EM | 28.8 | # 29 |
| Zero-Shot Video Question Answer | NExT-GQA | Mistral (7B) | Acc@GQA | 9.2 | # 4 |
| Zero-Shot Video Question Answer | NExT-QA | Mistral (7B) | Accuracy | 51.1 | # 13 |
| Question Answering | PIQA | Mistral 7B (0-shot) | Accuracy | 83.0 | # 11 |
| Question Answering | TriviaQA | Mistral 7B (5-shot) | EM | 69.9 | # 24 |
| Common Sense Reasoning | WinoGrande | Mistral 7B (0-shot) | Accuracy | 75.3 | # 22 |

Methods