OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

15 Feb 2024  ·  Shubham Toshniwal, Ivan Moshkov, Sean Narenthiran, Daria Gitman, Fei Jia, Igor Gitman

Recent work has shown the immense potential of synthetically generated datasets for training large language models (LLMs), especially for acquiring targeted skills. Current large-scale math instruction tuning datasets such as MetaMathQA (Yu et al., 2024) and MAmmoTH (Yue et al., 2024) are constructed using outputs from closed-source LLMs with commercially restrictive licenses. A key reason limiting the use of open-source LLMs in these data generation pipelines has been the wide gap between the mathematical skills of the best closed-source LLMs, such as GPT-4, and the best open-source LLMs. Building on the recent progress in open-source LLMs, our proposed prompting novelty, and some brute-force scaling, we construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs. The dataset is constructed by synthesizing code-interpreter solutions for GSM8K and MATH, two popular math reasoning benchmarks, using the recently released and permissively licensed Mixtral model. Our best model, OpenMath-CodeLlama-70B, trained on a subset of OpenMathInstruct-1, achieves a score of 84.6% on GSM8K and 50.7% on MATH, which is competitive with the best gpt-distilled models. We release our code, models, and the OpenMathInstruct-1 dataset under a commercially permissive license.
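The abstract describes synthesizing code-interpreter solutions and keeping those that lead to the correct answer. The core of such a pipeline can be sketched as an answer-matching filter; the sampling and code-execution steps are assumed to happen upstream, and `filter_solutions`/`is_correct` are hypothetical helper names, not the paper's actual code.

```python
def is_correct(predicted, reference, tol=1e-6):
    """Compare a predicted answer against the reference, numerically when possible."""
    try:
        return abs(float(predicted) - float(reference)) < tol
    except (TypeError, ValueError):
        return str(predicted).strip() == str(reference).strip()

def filter_solutions(candidates, reference):
    """Keep only candidate solutions whose executed answer matches the reference.

    `candidates` is a list of (solution_text, executed_answer) pairs produced by
    sampling solutions from an LLM and running their code in an interpreter.
    """
    return [text for text, answer in candidates if is_correct(answer, reference)]

# Toy usage: three sampled solutions for a problem whose reference answer is 42.
samples = [("solution A", "42"), ("solution B", "41"), ("solution C", "42.0")]
print(filter_solutions(samples, "42"))  # keeps solution A and solution C
```

Filtering by final-answer correctness is what lets a weaker open-source generator like Mixtral still produce a high-quality training set: incorrect samples are simply discarded.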


Results from the Paper


 Ranked #1 on Math Word Problem Solving on MAWPS (using extra training data)

In the table below, "w/ code" denotes code-interpreter solutions, and "SC, k=50" denotes self-consistency, i.e. majority voting over 50 sampled solutions. Parameter-count rows give the model size and its rank on the corresponding leaderboard's size axis.

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Math Word Problem Solving | ASDiv-A | OpenMath-CodeLlama-70B (w/ code) | Execution Accuracy | 84.7 | #5 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-70B (w/ code) | Accuracy | 84.6 | #44 |
| | | | Parameters (Billion) | 70 | #86 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-7B (w/ code, SC, k=50) | Accuracy | 84.8 | #41 |
| | | | Parameters (Billion) | 7 | #10 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-7B (w/ code) | Accuracy | 75.9 | #75 |
| | | | Parameters (Billion) | 7 | #10 |
| Arithmetic Reasoning | GSM8K | OpenMath-Mistral-7B (w/ code, SC, k=50) | Accuracy | 86.9 | #34 |
| | | | Parameters (Billion) | 7 | #10 |
| Arithmetic Reasoning | GSM8K | OpenMath-Mistral-7B (w/ code) | Accuracy | 80.2 | #65 |
| | | | Parameters (Billion) | 7 | #10 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-13B (w/ code, SC, k=50) | Accuracy | 86.8 | #35 |
| | | | Parameters (Billion) | 13 | #53 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-13B (w/ code) | Accuracy | 78.8 | #67 |
| | | | Parameters (Billion) | 13 | #53 |
| Arithmetic Reasoning | GSM8K | OpenMath-Llama2-70B (w/ code, SC, k=50) | Accuracy | 90.1 | #21 |
| | | | Parameters (Billion) | 70 | #86 |
| Arithmetic Reasoning | GSM8K | OpenMath-Llama2-70B (w/ code) | Accuracy | 84.7 | #42 |
| | | | Parameters (Billion) | 70 | #86 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-34B (w/ code, SC, k=50) | Accuracy | 88.0 | #28 |
| | | | Parameters (Billion) | 34 | #72 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-70B (w/ code, SC, k=50) | Accuracy | 90.8 | #20 |
| | | | Parameters (Billion) | 70 | #86 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-34B (w/ code) | Accuracy | 80.7 | #60 |
| | | | Parameters (Billion) | 34 | #72 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-70B (w/ code) | Accuracy | 50.7 | #25 |
| | | | Parameters (Billion) | 70 | #11 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-34B (w/ code, SC, k=50) | Accuracy | 60.2 | #9 |
| | | | Parameters (Billion) | 34 | #26 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-70B (w/ code, SC, k=50) | Accuracy | 60.4 | #8 |
| | | | Parameters (Billion) | 70 | #11 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-7B (w/ code, SC, k=50) | Accuracy | 55.6 | #18 |
| | | | Parameters (Billion) | 7 | #58 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-7B (w/ code) | Accuracy | 43.6 | #45 |
| | | | Parameters (Billion) | 7 | #58 |
| Math Word Problem Solving | MATH | OpenMath-Mistral-7B (w/ code, SC, k=50) | Accuracy | 57.2 | #15 |
| | | | Parameters (Billion) | 7 | #58 |
| Math Word Problem Solving | MATH | OpenMath-Mistral-7B (w/ code) | Accuracy | 44.5 | #43 |
| | | | Parameters (Billion) | 7 | #58 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-13B (w/ code, SC, k=50) | Accuracy | 57.6 | #14 |
| | | | Parameters (Billion) | 13 | #38 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-13B (w/ code) | Accuracy | 45.5 | #38 |
| | | | Parameters (Billion) | 13 | #38 |
| Math Word Problem Solving | MATH | OpenMath-Llama2-70B (w/ code, SC, k=50) | Accuracy | 58.3 | #12 |
| | | | Parameters (Billion) | 70 | #11 |
| Math Word Problem Solving | MATH | OpenMath-Llama2-70B (w/ code) | Accuracy | 46.3 | #37 |
| | | | Parameters (Billion) | 70 | #11 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-34B (w/ code) | Accuracy | 48.3 | #32 |
| | | | Parameters (Billion) | 34 | #26 |
| Math Word Problem Solving | MAWPS | OpenMath-CodeLlama-70B (w/ code) | Accuracy (%) | 95.7 | #1 |
| Math Word Problem Solving | SVAMP | OpenMath-CodeLlama-70B (w/ code) | Execution Accuracy | 87.8 | #3 |
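The "SC, k=50" rows use self-consistency: sample k answers and take the most frequent one. A minimal sketch of that majority vote, with `self_consistency` as a hypothetical helper name (the paper's actual evaluation code may differ):

```python
from collections import Counter

def self_consistency(answers):
    """Majority vote over k sampled final answers.

    Unparseable samples are passed in as None and ignored; ties are broken by
    first occurrence, since Counter preserves insertion order.
    """
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Toy usage: five sampled answers, one of which failed to parse.
votes = ["7", "7", "8", "7", None]
print(self_consistency(votes))  # "7"
```

This is why the SC rows consistently beat their single-sample counterparts: an answer only needs to be the modal outcome across 50 samples, not the outcome of any single greedy decode.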

Methods