OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Recent work has shown the immense potential of synthetically generated datasets for training large language models (LLMs), especially for acquiring targeted skills. Current large-scale math instruction tuning datasets such as MetaMathQA (Yu et al., 2024) and MAmmoTH (Yue et al., 2024) are constructed using outputs from closed-source LLMs with commercially restrictive licenses. A key factor limiting the use of open-source LLMs in these data generation pipelines has been the wide gap in mathematical skill between the best closed-source LLMs, such as GPT-4, and the best open-source LLMs. Building on the recent progress in open-source LLMs, our novel prompting strategies, and some brute-force scaling, we construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs. The dataset is constructed by synthesizing code-interpreter solutions for GSM8K and MATH, two popular math reasoning benchmarks, using the recently released and permissively licensed Mixtral model. Our best model, OpenMath-CodeLlama-70B, trained on a subset of OpenMathInstruct-1, achieves a score of 84.6% on GSM8K and 50.7% on MATH, which is competitive with the best GPT-distilled models. We release our code, models, and the OpenMathInstruct-1 dataset under a commercially permissive license.
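The construction recipe described above (sample candidate code-interpreter solutions from a teacher model, execute them, and keep those whose final answer matches the reference) can be sketched as below. This is a minimal illustration, not the paper's actual pipeline: the candidate format and the `filter_solutions` helper are hypothetical, and answer normalization in practice is more involved than exact rational comparison.

```python
from fractions import Fraction

def normalize(answer: str) -> Fraction:
    """Canonicalize a numeric answer string so '0.5' and '1/2' compare equal."""
    return Fraction(answer)

def filter_solutions(candidates, reference):
    """Keep sampled (solution_text, predicted_answer) pairs whose executed
    final answer matches the ground-truth reference. This correctness filter
    turns raw teacher-model samples into instruction-tuning data."""
    ref = normalize(reference)
    return [text for text, pred in candidates if normalize(pred) == ref]

# Toy example: two of three sampled solutions reach the right answer.
samples = [
    ("...code ending in 3/6...", "1/2"),
    ("...buggy code...", "2"),
    ("...code ending in 0.5...", "0.5"),
]
kept = filter_solutions(samples, "1/2")
```

Filtering on the executed answer rather than the solution text is what lets a weaker open-source teacher still yield a high-quality dataset: incorrect samples are simply discarded.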
Results from the Paper
Ranked #1 on Math Word Problem Solving on MAWPS (using extra training data)
| Task | Dataset | Model | Metric | Value | Global Rank | Params (B) |
|---|---|---|---|---|---|---|
| Math Word Problem Solving | ASDiv-A | OpenMath-CodeLlama-70B (w/ code) | Execution Accuracy | 84.7 | # 5 | 70 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-70B (w/ code) | Accuracy | 84.6 | # 44 | 70 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-7B (w/ code, SC, k=50) | Accuracy | 84.8 | # 41 | 7 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-7B (w/ code) | Accuracy | 75.9 | # 75 | 7 |
| Arithmetic Reasoning | GSM8K | OpenMath-Mistral-7B (w/ code, SC, k=50) | Accuracy | 86.9 | # 34 | 7 |
| Arithmetic Reasoning | GSM8K | OpenMath-Mistral-7B (w/ code) | Accuracy | 80.2 | # 65 | 7 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-13B (w/ code, SC, k=50) | Accuracy | 86.8 | # 35 | 13 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-13B (w/ code) | Accuracy | 78.8 | # 67 | 13 |
| Arithmetic Reasoning | GSM8K | OpenMath-Llama2-70B (w/ code, SC, k=50) | Accuracy | 90.1 | # 21 | 70 |
| Arithmetic Reasoning | GSM8K | OpenMath-Llama2-70B (w/ code) | Accuracy | 84.7 | # 42 | 70 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-34B (w/ code, SC, k=50) | Accuracy | 88.0 | # 28 | 34 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-70B (w/ code, SC, k=50) | Accuracy | 90.8 | # 20 | 70 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-34B (w/ code) | Accuracy | 80.7 | # 60 | 34 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-70B (w/ code) | Accuracy | 50.7 | # 25 | 70 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-34B (w/ code, SC, k=50) | Accuracy | 60.2 | # 9 | 34 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-70B (w/ code, SC, k=50) | Accuracy | 60.4 | # 8 | 70 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-7B (w/ code, SC, k=50) | Accuracy | 55.6 | # 18 | 7 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-7B (w/ code) | Accuracy | 43.6 | # 45 | 7 |
| Math Word Problem Solving | MATH | OpenMath-Mistral-7B (w/ code, SC, k=50) | Accuracy | 57.2 | # 15 | 7 |
| Math Word Problem Solving | MATH | OpenMath-Mistral-7B (w/ code) | Accuracy | 44.5 | # 43 | 7 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-13B (w/ code, SC, k=50) | Accuracy | 57.6 | # 14 | 13 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-13B (w/ code) | Accuracy | 45.5 | # 38 | 13 |
| Math Word Problem Solving | MATH | OpenMath-Llama2-70B (w/ code, SC, k=50) | Accuracy | 58.3 | # 12 | 70 |
| Math Word Problem Solving | MATH | OpenMath-Llama2-70B (w/ code) | Accuracy | 46.3 | # 37 | 70 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-34B (w/ code) | Accuracy | 48.3 | # 32 | 34 |
| Math Word Problem Solving | MAWPS | OpenMath-CodeLlama-70B (w/ code) | Accuracy (%) | 95.7 | # 1 | 70 |
| Math Word Problem Solving | SVAMP | OpenMath-CodeLlama-70B (w/ code) | Execution Accuracy | 87.8 | # 3 | 70 |
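The rows marked "SC, k=50" use self-consistency decoding: sample 50 solutions per problem and report the majority final answer instead of a single greedy decode. A minimal sketch of the voting step (the sampling itself is out of scope; tie-breaking by first-seen answer is an assumption, not necessarily the paper's choice):

```python
from collections import Counter

def self_consistency(final_answers):
    """Majority vote over the final answers of k sampled solutions.
    Counter.most_common preserves insertion order among equal counts,
    so the earliest-seen answer wins a tie."""
    if not final_answers:
        raise ValueError("need at least one sampled answer")
    return Counter(final_answers).most_common(1)[0][0]
```

This is why the SC rows consistently beat their single-sample counterparts: independent samples rarely agree on the same wrong answer, so the vote concentrates on the correct one.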