ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

29 Sep 2023  ·  Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen

Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics. In this paper, we propose ToRA, a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the use of external tools (e.g., computation libraries and symbolic solvers), thereby combining the analytical prowess of language with the computational efficiency of tools. To train ToRA, we curate interactive tool-use trajectories on mathematical datasets, apply imitation learning on the annotations, and propose output space shaping to further refine the models' reasoning behavior. As a result, ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales, with 13%-19% absolute improvements on average. Notably, ToRA-7B reaches 44.6% on the competition-level dataset MATH, surpassing the best open-source model WizardMath-70B by 22% absolute. ToRA-Code-34B is also the first open-source model to exceed 50% accuracy on MATH, significantly outperforming GPT-4's CoT result and remaining competitive with GPT-4 when it solves problems with programs. Additionally, we conduct a comprehensive analysis of the benefits and remaining challenges of tool interaction for mathematical reasoning, providing valuable insights for future research.
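
The abstract's core idea, interleaving natural-language rationales with program execution whose output is fed back to the model, can be illustrated with a minimal sketch. The functions below (`generate`, `run_code`, `solve`) are hypothetical illustrations, not the paper's released code: `generate` stands in for a ToRA model call and returns a canned trajectory so the loop runs end to end, and the example assumes sympy is installed.

```python
# Minimal sketch of tool-integrated reasoning in the interleaved
# "rationale -> program -> output" style described in the abstract.
import io
import re
import contextlib

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a ToRA model call. A real model would continue
    the prompt with reasoning plus a ```python ...``` block, or with the final
    answer wrapped in \\boxed{}."""
    if "```output" not in prompt:
        return (
            "To find the roots, solve x**2 - 5*x + 6 = 0 with sympy.\n"
            "```python\n"
            "from sympy import symbols, solve\n"
            "x = symbols('x')\n"
            "print(solve(x**2 - 5*x + 6, x))\n"
            "```\n"
        )
    return "The roots are 2 and 3, so the answer is \\boxed{2, 3}."

def run_code(code: str) -> str:
    """Execute a generated program and capture its stdout.
    (No sandboxing here; a real system would isolate execution.)"""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()

def solve(question: str, max_turns: int = 3) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_turns):
        step = generate(prompt)
        prompt += step
        if "\\boxed{" in step:  # final answer reached
            return step
        match = re.search(r"```python\n(.*?)```", step, re.DOTALL)
        if match:  # run the tool call and feed the output back to the model
            output = run_code(match.group(1))
            prompt += f"```output\n{output}\n```\n"
    return prompt

print(solve("What are the roots of x^2 - 5x + 6?"))
```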


Results from the Paper


Ranked #10 on Math Word Problem Solving on MATH (using extra training data)

| Task | Dataset | Model | Accuracy | Accuracy Rank | Parameters (B) | Parameters Rank |
|---|---|---|---|---|---|---|
| Arithmetic Reasoning | GSM8K | ToRA-Code 7B | 72.6 | #88 | 7 | #10 |
| Arithmetic Reasoning | GSM8K | ToRA-Code 13B | 75.8 | #76 | 13 | #53 |
| Arithmetic Reasoning | GSM8K | ToRA-Code 34B | 80.7 | #60 | 34 | #72 |
| Arithmetic Reasoning | GSM8K | ToRA 70B | 84.3 | #46 | 70 | #86 |
| Arithmetic Reasoning | GSM8K | ToRA-Code 34B (SC, k=50) | 85.1 | #40 | 34 | #72 |
| Arithmetic Reasoning | GSM8K | ToRA 70B (SC, k=50) | 88.3 | #26 | 70 | #86 |
| Math Word Problem Solving | MATH | ToRA 7B (w/ code) | 40.1 | #53 | 7 | #58 |
| Math Word Problem Solving | MATH | ToRA 13B (w/ code) | 43.0 | #48 | 13 | #38 |
| Math Word Problem Solving | MATH | ToRA-Code 7B (w/ code) | 44.6 | #42 | 7 | #58 |
| Math Word Problem Solving | MATH | ToRA-Code 13B (w/ code) | 48.1 | #33 | 13 | #38 |
| Math Word Problem Solving | MATH | ToRA 70B (w/ code) | 49.7 | #27 | 70 | #11 |
| Math Word Problem Solving | MATH | ToRA-Code 34B (w/ code) | 50.8 | #24 | 34 | #26 |
| Math Word Problem Solving | MATH | ToRA 70B (w/ code, SC, k=50) | 56.9 | #16 | 70 | #11 |
| Math Word Problem Solving | MATH | ToRA-Code 34B (w/ code, SC, k=50) | 60.0 | #10 | 34 | #26 |

Methods