ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

29 Sep 2023  ·  Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen

Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics. In this paper, we propose ToRA, a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the use of external tools (e.g., computation libraries and symbolic solvers), thereby combining the analytical prowess of language with the computational efficiency of tools. To train ToRA, we curate interactive tool-use trajectories on mathematical datasets, apply imitation learning on the annotations, and propose output space shaping to further refine the models' reasoning behavior. As a result, ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales, with 13%-19% absolute improvements on average. Notably, ToRA-7B reaches 44.6% on the competition-level dataset MATH, surpassing the best open-source model WizardMath-70B by 22% absolute. ToRA-Code-34B is also the first open-source model to exceed 50% accuracy on MATH, significantly outperforming GPT-4's CoT result and remaining competitive with GPT-4 when it solves problems with programs. Additionally, we conduct a comprehensive analysis of the benefits and remaining challenges of tool interaction for mathematical reasoning, providing valuable insights for future research.
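
The abstract's core idea, interleaving natural-language rationales with program execution whose output is fed back to the model, can be illustrated with a minimal sketch. The functions below (`generate`, `run_code`, `solve`) are hypothetical illustrations, not the paper's released code: `generate` stands in for a ToRA model call and returns a canned trajectory so the loop runs end to end, and the example assumes sympy is installed.

```python
# Minimal sketch of tool-integrated reasoning in the interleaved
# "rationale -> program -> output" style described in the abstract.
import io
import re
import contextlib

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a ToRA model call. A real model would continue
    the prompt with reasoning plus a ```python ...``` block, or with the final
    answer wrapped in \\boxed{}."""
    if "```output" not in prompt:
        return (
            "To find the roots, solve x**2 - 5*x + 6 = 0 with sympy.\n"
            "```python\n"
            "from sympy import symbols, solve\n"
            "x = symbols('x')\n"
            "print(solve(x**2 - 5*x + 6, x))\n"
            "```\n"
        )
    return "The roots are 2 and 3, so the answer is \\boxed{2, 3}."

def run_code(code: str) -> str:
    """Execute a generated program and capture its stdout.
    (No sandboxing here; a real system would isolate execution.)"""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()

def solve(question: str, max_turns: int = 3) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_turns):
        step = generate(prompt)
        prompt += step
        if "\\boxed{" in step:  # final answer reached
            return step
        match = re.search(r"```python\n(.*?)```", step, re.DOTALL)
        if match:  # run the tool call and feed the output back to the model
            output = run_code(match.group(1))
            prompt += f"```output\n{output}\n```\n"
    return prompt

print(solve("What are the roots of x^2 - 5x + 6?"))
```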


Results from the Paper


Ranked #10 on Math Word Problem Solving on MATH (using extra training data)

| Task | Dataset | Model | Accuracy | Accuracy Rank | Parameters (B) | Parameters Rank |
|---|---|---|---|---|---|---|
| Arithmetic Reasoning | GSM8K | ToRA-Code 7B | 72.6 | #88 | 7 | #10 |
| Arithmetic Reasoning | GSM8K | ToRA-Code 13B | 75.8 | #76 | 13 | #53 |
| Arithmetic Reasoning | GSM8K | ToRA-Code 34B | 80.7 | #60 | 34 | #72 |
| Arithmetic Reasoning | GSM8K | ToRA 70B | 84.3 | #46 | 70 | #86 |
| Arithmetic Reasoning | GSM8K | ToRA-Code 34B (SC, k=50) | 85.1 | #40 | 34 | #72 |
| Arithmetic Reasoning | GSM8K | ToRA 70B (SC, k=50) | 88.3 | #26 | 70 | #86 |
| Math Word Problem Solving | MATH | ToRA 7B (w/ code) | 40.1 | #53 | 7 | #58 |
| Math Word Problem Solving | MATH | ToRA 13B (w/ code) | 43.0 | #48 | 13 | #38 |
| Math Word Problem Solving | MATH | ToRA-Code 7B (w/ code) | 44.6 | #42 | 7 | #58 |
| Math Word Problem Solving | MATH | ToRA-Code 13B (w/ code) | 48.1 | #33 | 13 | #38 |
| Math Word Problem Solving | MATH | ToRA 70B (w/ code) | 49.7 | #27 | 70 | #11 |
| Math Word Problem Solving | MATH | ToRA-Code 34B (w/ code) | 50.8 | #24 | 34 | #26 |
| Math Word Problem Solving | MATH | ToRA 70B (w/ code, SC, k=50) | 56.9 | #16 | 70 | #11 |
| Math Word Problem Solving | MATH | ToRA-Code 34B (w/ code, SC, k=50) | 60.0 | #10 | 34 | #26 |

Methods