OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Recent work has shown the immense potential of synthetically generated datasets for training large language models (LLMs), especially for acquiring targeted skills. Current large-scale math instruction tuning datasets such as MetaMathQA (Yu et al., 2024) and MAmmoTH (Yue et al., 2024) are constructed using outputs from closed-source LLMs with commercially restrictive licenses. A key factor limiting the use of open-source LLMs in these data generation pipelines has been the wide gap in mathematical skill between the best closed-source LLMs, such as GPT-4, and the best open-source LLMs. Building on the recent progress in open-source LLMs, our novel prompting strategies, and some brute-force scaling, we construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs. The dataset is constructed by synthesizing code-interpreter solutions for GSM8K and MATH, two popular math reasoning benchmarks, using the recently released and permissively licensed Mixtral model. Our best model, OpenMath-CodeLlama-70B, trained on a subset of OpenMathInstruct-1, achieves a score of 84.6% on GSM8K and 50.7% on MATH, which is competitive with the best GPT-distilled models. We release our code, models, and the OpenMathInstruct-1 dataset under a commercially permissive license.
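The construction recipe described above (sample candidate code-interpreter solutions from a teacher model, execute them, and keep those whose final answer matches the reference) can be sketched as below. This is a minimal illustration, not the paper's actual pipeline: the candidate format and the `filter_solutions` helper are hypothetical, and answer normalization in practice is more involved than exact rational comparison.

```python
from fractions import Fraction

def normalize(answer: str) -> Fraction:
    """Canonicalize a numeric answer string so '0.5' and '1/2' compare equal."""
    return Fraction(answer)

def filter_solutions(candidates, reference):
    """Keep sampled (solution_text, predicted_answer) pairs whose executed
    final answer matches the ground-truth reference. This correctness filter
    turns raw teacher-model samples into instruction-tuning data."""
    ref = normalize(reference)
    return [text for text, pred in candidates if normalize(pred) == ref]

# Toy example: two of three sampled solutions reach the right answer.
samples = [
    ("...code ending in 3/6...", "1/2"),
    ("...buggy code...", "2"),
    ("...code ending in 0.5...", "0.5"),
]
kept = filter_solutions(samples, "1/2")
```

Filtering on the executed answer rather than the solution text is what lets a weaker open-source teacher still yield a high-quality dataset: incorrect samples are simply discarded.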
Results from the Paper
Ranked #1 on Math Word Problem Solving on MAWPS (using extra training data)
| Task | Dataset | Model | Metric | Value | Global Rank | Params (B) |
|---|---|---|---|---|---|---|
| Math Word Problem Solving | ASDiv-A | OpenMath-CodeLlama-70B (w/ code) | Execution Accuracy | 84.7 | # 5 | 70 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-70B (w/ code) | Accuracy | 84.6 | # 44 | 70 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-7B (w/ code, SC, k=50) | Accuracy | 84.8 | # 41 | 7 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-7B (w/ code) | Accuracy | 75.9 | # 75 | 7 |
| Arithmetic Reasoning | GSM8K | OpenMath-Mistral-7B (w/ code, SC, k=50) | Accuracy | 86.9 | # 34 | 7 |
| Arithmetic Reasoning | GSM8K | OpenMath-Mistral-7B (w/ code) | Accuracy | 80.2 | # 65 | 7 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-13B (w/ code, SC, k=50) | Accuracy | 86.8 | # 35 | 13 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-13B (w/ code) | Accuracy | 78.8 | # 67 | 13 |
| Arithmetic Reasoning | GSM8K | OpenMath-Llama2-70B (w/ code, SC, k=50) | Accuracy | 90.1 | # 21 | 70 |
| Arithmetic Reasoning | GSM8K | OpenMath-Llama2-70B (w/ code) | Accuracy | 84.7 | # 42 | 70 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-34B (w/ code, SC, k=50) | Accuracy | 88.0 | # 28 | 34 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-70B (w/ code, SC, k=50) | Accuracy | 90.8 | # 20 | 70 |
| Arithmetic Reasoning | GSM8K | OpenMath-CodeLlama-34B (w/ code) | Accuracy | 80.7 | # 60 | 34 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-70B (w/ code) | Accuracy | 50.7 | # 25 | 70 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-34B (w/ code, SC, k=50) | Accuracy | 60.2 | # 9 | 34 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-70B (w/ code, SC, k=50) | Accuracy | 60.4 | # 8 | 70 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-7B (w/ code, SC, k=50) | Accuracy | 55.6 | # 18 | 7 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-7B (w/ code) | Accuracy | 43.6 | # 45 | 7 |
| Math Word Problem Solving | MATH | OpenMath-Mistral-7B (w/ code, SC, k=50) | Accuracy | 57.2 | # 15 | 7 |
| Math Word Problem Solving | MATH | OpenMath-Mistral-7B (w/ code) | Accuracy | 44.5 | # 43 | 7 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-13B (w/ code, SC, k=50) | Accuracy | 57.6 | # 14 | 13 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-13B (w/ code) | Accuracy | 45.5 | # 38 | 13 |
| Math Word Problem Solving | MATH | OpenMath-Llama2-70B (w/ code, SC, k=50) | Accuracy | 58.3 | # 12 | 70 |
| Math Word Problem Solving | MATH | OpenMath-Llama2-70B (w/ code) | Accuracy | 46.3 | # 37 | 70 |
| Math Word Problem Solving | MATH | OpenMath-CodeLlama-34B (w/ code) | Accuracy | 48.3 | # 32 | 34 |
| Math Word Problem Solving | MAWPS | OpenMath-CodeLlama-70B (w/ code) | Accuracy (%) | 95.7 | # 1 | 70 |
| Math Word Problem Solving | SVAMP | OpenMath-CodeLlama-70B (w/ code) | Execution Accuracy | 87.8 | # 3 | 70 |
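The rows marked "SC, k=50" use self-consistency decoding: sample 50 solutions per problem and report the majority final answer instead of a single greedy decode. A minimal sketch of the voting step (the sampling itself is out of scope; tie-breaking by first-seen answer is an assumption, not necessarily the paper's choice):

```python
from collections import Counter

def self_consistency(final_answers):
    """Majority vote over the final answers of k sampled solutions.
    Counter.most_common preserves insertion order among equal counts,
    so the earliest-seen answer wins a tie."""
    if not final_answers:
        raise ValueError("need at least one sampled answer")
    return Counter(final_answers).most_common(1)[0][0]
```

This is why the SC rows consistently beat their single-sample counterparts: independent samples rarely agree on the same wrong answer, so the vote concentrates on the correct one.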