We release Code Llama, a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct), with 7B, 13B, 34B and 70B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. The 7B, 13B and 70B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama reaches state-of-the-art performance among open models on several code benchmarks, with scores of up to 67% and 65% on HumanEval and MBPP, respectively. Notably, Code Llama - Python 7B outperforms Llama 2 70B on HumanEval and MBPP, and all our models outperform every other publicly available model on MultiPL-E. We release Code Llama under a permissive license that allows for both research and commercial use.

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Code Generation | HumanEval | Code Llama - Instruct 7B (zero-shot) | Pass@1 | 34.8 | #71 |
| Code Generation | HumanEval | Code Llama 13B (zero-shot) | Pass@1 | 36 | #69 |
| Code Generation | HumanEval | Code Llama - Instruct 34B (zero-shot) | Pass@1 | 41.5 | #59 |
| Code Generation | HumanEval | Code Llama 34B (zero-shot) | Pass@1 | 48.8 | #45 |
| Code Generation | HumanEval | Code Llama - Python 34B (zero-shot) | Pass@1 | 53.7 | #40 |
| Code Generation | HumanEval | Code Llama - Instruct 13B (zero-shot) | Pass@1 | 42.7 | #58 |
| Code Generation | HumanEval | Code Llama - Python 13B (zero-shot) | Pass@1 | 43.3 | #56 |
| Code Generation | HumanEval | Code Llama 7B (zero-shot) | Pass@1 | 33.5 | #74 |
| Code Generation | HumanEval | Code Llama - Python 7B (zero-shot) | Pass@1 | 38.4 | #64 |
| Code Generation | HumanEval | Unnatural Code Llama 34B (zero-shot) | Pass@1 | 62.2 | #31 |
| Code Generation | MBPP | Code Llama - Instruct 7B (3-shot) | Accuracy | 44.4 | #64 |
| Code Generation | MBPP | Code Llama 70B (3-shot) | Accuracy | 62.4 | #30 |
| Code Generation | MBPP | Code Llama 34B (3-shot) | Accuracy | 55 | #41 |
| Code Generation | MBPP | Code Llama 13B (3-shot) | Accuracy | 47 | #58 |
| Code Generation | MBPP | Code Llama 7B (3-shot) | Accuracy | 41.4 | #68 |
| Code Generation | MBPP | GPT-3.5 Turbo | Accuracy | 52.2 | #44 |
| Code Generation | MBPP | Code Llama - Python 13B (3-shot) | Accuracy | 49 | #50 |
| Code Generation | MBPP | Code Llama - Python 70B (3-shot) | Accuracy | 65.5 | #27 |
| Code Generation | MBPP | Code Llama - Python 34B (3-shot) | Accuracy | 56.2 | #39 |
| Code Generation | MBPP | Code Llama - Python 7B (3-shot) | Accuracy | 47.6 | #54 |
| Code Generation | MBPP | Unnatural Code Llama 34B (3-shot) | Accuracy | 61.2 | #34 |
| Code Generation | MBPP | Code Llama - Instruct 70B (3-shot) | Accuracy | 62.2 | #31 |
| Code Generation | MBPP | Code Llama - Instruct 34B (3-shot) | Accuracy | 57 | #38 |
| Code Generation | MBPP | Code Llama - Instruct 13B (3-shot) | Accuracy | 49.4 | #48 |
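
The HumanEval rows above report Pass@1. As a rough illustration of how such a score can be computed, the sketch below uses the unbiased pass@k estimator introduced with the HumanEval benchmark: for a problem with n sampled completions of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k). The function names and the toy per-problem counts are illustrative assumptions; this section does not state how many samples or which decoding settings were used to produce the listed numbers.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over all problems, as a percentage.

    `results` holds one (n, c) pair per problem (hypothetical input format).
    """
    if not results:
        raise ValueError("results must not be empty")
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Toy data, not taken from the paper: three problems, ten samples each.
    toy_results = [(10, 3), (10, 10), (10, 0)]
    print(f"pass@1 = {benchmark_pass_at_k(toy_results, k=1):.1f}%")
```

For k = 1 the estimator reduces to c / n per problem, so with a single greedy completion per problem Pass@1 is simply the percentage of problems solved.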

Methods