MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning

The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and continue reasoning based on the execution output. In this paper, we present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations and, consequently, enhancing their mathematical reasoning abilities. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions, referred to as MathCodeInstruct. Each solution interleaves natural language, code, and execution results. We also introduce a customized supervised fine-tuning and inference approach. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems. Impressively, the MathCoder models achieve state-of-the-art scores among open-source LLMs on the MATH (45.2%) and GSM8K (83.9%) datasets, substantially outperforming other open-source alternatives. Notably, the MathCoder model not only surpasses ChatGPT-3.5 and PaLM-2 on GSM8K and MATH but also outperforms GPT-4 on the competition-level MATH dataset. The dataset and models will be released at https://github.com/mathllm/MathCoder.
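The interleaved solution format described in the abstract (natural language, then code, then the execution result, then further reasoning) can be illustrated with a minimal sketch. This is not the authors' implementation: `run_code`, `solve`, `scripted_model`, and the tuple-based block format are hypothetical names chosen for illustration, and a scripted function stands in for the fine-tuned model. Running generated code with `exec` is unsafe outside a sandbox.

```python
import contextlib
import io


def run_code(code, namespace):
    """Execute a code string and return what it printed, or the error message."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # assumption: trusted code; sandbox this in practice
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def solve(problem, next_block, max_blocks=8):
    """Build a solution that interleaves text, code, and execution results.

    `next_block` is a hypothetical stand-in for the model: it receives the
    transcript so far and returns ("text", str), ("code", str), or None when done.
    """
    transcript = [("problem", problem)]
    namespace = {}  # shared, so later code blocks can reuse earlier variables
    for _ in range(max_blocks):
        block = next_block(transcript)
        if block is None:
            break
        transcript.append(block)
        kind, content = block
        if kind == "code":
            # Feed the execution output back so the next block can use it.
            transcript.append(("execution", run_code(content, namespace)))
    return transcript


def scripted_model(transcript):
    """Toy replacement for a fine-tuned LLM: replays a fixed solution."""
    script = [
        ("text", "Isolate x in 3x + 5 = 20."),
        ("code", "x = (20 - 5) / 3\nprint(x)"),
        ("text", "The answer is 5."),
    ]
    produced = sum(1 for kind, _ in transcript if kind in ("text", "code"))
    return script[produced] if produced < len(script) else None


result = solve("If 3x + 5 = 20, what is x?", scripted_model)
for kind, content in result:
    print(f"[{kind}] {content}")
```

In this sketch the execution output is appended to the transcript before the next block is requested, which mirrors the "continue reasoning based on the execution output" step; a real system would replace `scripted_model` with calls to the language model.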


Results from the Paper


Ranked #4 on Math Word Problem Solving on SVAMP (using extra training data)

Task                       Dataset  Model             Metric Name            Metric Value  Global Rank
Arithmetic Reasoning       GSM8K    MathCoder-L-70B   Accuracy               83.9          #48
Arithmetic Reasoning       GSM8K    MathCoder-L-70B   Parameters (Billion)   70            #86
Arithmetic Reasoning       GSM8K    MathCoder-L-7B    Accuracy               64.2          #101
Arithmetic Reasoning       GSM8K    MathCoder-L-7B    Parameters (Billion)   7             #10
Arithmetic Reasoning       GSM8K    MathCoder-CL-7B   Accuracy               67.8          #97
Arithmetic Reasoning       GSM8K    MathCoder-CL-7B   Parameters (Billion)   7             #10
Arithmetic Reasoning       GSM8K    MathCoder-L-13B   Accuracy               72.6          #88
Arithmetic Reasoning       GSM8K    MathCoder-L-13B   Parameters (Billion)   13            #53
Arithmetic Reasoning       GSM8K    MathCoder-CL-13B  Accuracy               74.1          #81
Arithmetic Reasoning       GSM8K    MathCoder-CL-13B  Parameters (Billion)   13            #53
Arithmetic Reasoning       GSM8K    MathCoder-CL-34B  Accuracy               81.7          #56
Arithmetic Reasoning       GSM8K    MathCoder-CL-34B  Parameters (Billion)   34            #72
Math Word Problem Solving  MATH     MathCoder-CL-34B  Accuracy               45.2          #39
Math Word Problem Solving  MATH     MathCoder-CL-34B  Parameters (Billions)  34            #26
Math Word Problem Solving  MATH     MathCoder-L-7B    Accuracy               23.3          #72
Math Word Problem Solving  MATH     MathCoder-L-7B    Parameters (Billions)  7             #58
Math Word Problem Solving  MATH     MathCoder-L-13B   Accuracy               29.9          #63
Math Word Problem Solving  MATH     MathCoder-L-13B   Parameters (Billions)  13            #38
Math Word Problem Solving  MATH     MathCoder-CL-7B   Accuracy               30.2          #62
Math Word Problem Solving  MATH     MathCoder-CL-7B   Parameters (Billions)  7             #58
Math Word Problem Solving  MATH     MathCoder-CL-13B  Accuracy               35.9          #54
Math Word Problem Solving  MATH     MathCoder-CL-13B  Parameters (Billions)  13            #38
Math Word Problem Solving  MATH     MathCoder-L-34B   Accuracy               45.1          #40
Math Word Problem Solving  MATH     MathCoder-L-34B   Parameters (Billions)  34            #26
Math Word Problem Solving  SVAMP    MathCoder-L-70B   Execution Accuracy     84.9          #4

Methods