TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Code Generation	HumanEval	OctoCoder (GPT-4)	Pass@1	86.6	# 5
Code Generation	HumanEval	OctoGeeX	Pass@1	44.7	# 53
Code Generation	HumanEval	OctoCoder	Pass@1	46.2	# 49

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/octopack-instruction-tuning-code-large/code-generation-on-humaneval)](https://paperswithcode.com/sota/code-generation-on-humaneval?p=octopack-instruction-tuning-code-large)`

OctoPack: Instruction Tuning Code Large Language Models

14 Aug 2023 · Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre

Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with human instructions. We compile CommitPack: 4 terabytes of Git commits across 350 programming languages. We benchmark CommitPack against other natural and synthetic code instructions (xP3x, Self-Instruct, OASST) on the 16B parameter StarCoder model, and achieve state-of-the-art performance among models not trained on OpenAI outputs, on the HumanEval Python benchmark (46.2% pass@1). We further introduce HumanEvalPack, expanding the HumanEval benchmark to a total of 3 coding tasks (Code Repair, Code Explanation, Code Synthesis) across 6 languages (Python, JavaScript, Java, Go, C++, Rust). Our models, OctoCoder and OctoGeeX, achieve the best performance across HumanEvalPack among all permissive models, demonstrating CommitPack's benefits in generalizing to a wider set of languages and natural coding tasks. Code, models and data are freely available at https://github.com/bigcode-project/octopack.
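The pass@1 figures quoted above are usually computed with the unbiased pass@k estimator that is standard for HumanEval-style benchmarks. The following is a minimal sketch of that estimator, not the evaluation code used for this paper; the function names `pass_at_k` and `benchmark_pass_at_k` and the sample counts in the usage lines are illustrative assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of generated samples for the problem
    c: number of those samples that pass all unit tests
    k: number of samples a user is assumed to draw
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every draw of k contains a pass.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples fail)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts: two problems with 20 samples each, 5 and 15 passing.
score = benchmark_pass_at_k([(20, 5), (20, 15)], k=1)
print(f"pass@1 = {100 * score:.1f}%")
```

For k = 1 the estimator reduces to the fraction of passing samples per problem, averaged over all problems, which is how a percentage such as 46.2% should be read.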

Code

bigcode-project/bigcode-evaluation-… (official, 644 stars)

bigcode-project/octopack (official, 387 stars)

Tasks

Code Generation

Code Repair

Datasets

Introduced in the Paper:

HumanEvalPack

CommitPack

CommitPackFT

Used in the Paper:

HumanEval

xP3

Results from the Paper

Ranked #5 on Code Generation on HumanEval

Methods

Repair
