TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Arithmetic Reasoning	GSM8K	MuggleMATH 70B	Accuracy	82.3	# 53
Arithmetic Reasoning	GSM8K	MuggleMATH 70B	Parameters (Billion)	70	# 86
Arithmetic Reasoning	GSM8K	MuggleMATH 7B	Accuracy	69.8	# 93
Arithmetic Reasoning	GSM8K	MuggleMATH 7B	Parameters (Billion)	7	# 10
Arithmetic Reasoning	GSM8K	MuggleMATH 13B	Accuracy	74	# 82
Arithmetic Reasoning	GSM8K	MuggleMATH 13B	Parameters (Billion)	13	# 53
Math Word Problem Solving	MATH	MuggleMATH-70B	Accuracy	42.1	# 50
Math Word Problem Solving	MATH	MuggleMATH-70B	Parameters (Billions)	13	# 38
Math Word Problem Solving	MATH	MuggleMATH-13B	Accuracy	30.7	# 61
Math Word Problem Solving	MATH	MuggleMATH-13B	Parameters (Billions)	13	# 38
Math Word Problem Solving	MATH	MuggleMATH 7B	Accuracy	25.8	# 70
Math Word Problem Solving	MATH	MuggleMATH 7B	Parameters (Billions)	7	# 58

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/query-and-response-augmentation-cannot-help/math-word-problem-solving-on-math)](https://paperswithcode.com/sota/math-word-problem-solving-on-math?p=query-and-response-augmentation-cannot-help)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/query-and-response-augmentation-cannot-help/arithmetic-reasoning-on-gsm8k)](https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k?p=query-and-response-augmentation-cannot-help)`

Query and Response Augmentation Cannot Help Out-of-domain Math Reasoning Generalization

9 Oct 2023 · Chengpeng Li, Zheng Yuan, Hongyi Yuan, Guanting Dong, Keming Lu, Jiancan Wu, Chuanqi Tan, Xiang Wang, Chang Zhou

In math reasoning with large language models (LLMs), augmenting fine-tuning data through query evolution and diverse reasoning paths has been empirically shown to be effective, substantially narrowing the gap between open-source LLMs and cutting-edge proprietary LLMs. In this paper, we investigate such data augmentation for math reasoning and aim to answer: (1) Which data augmentation strategies are more effective? (2) What is the scaling relationship between the amount of augmented data and model performance? (3) Can data augmentation incentivize generalization to out-of-domain mathematical reasoning tasks? To this end, we create a new dataset, AugGSM8K, by complicating and diversifying the queries from GSM8K and sampling multiple reasoning paths. We obtain a series of LLMs called MuggleMath by fine-tuning on subsets of AugGSM8K. MuggleMath achieves a new state of the art on GSM8K (from 54% to 68.4% at the 7B scale, and from 63.9% to 74.0% at the 13B scale). We observe a log-linear relationship between MuggleMath's performance and the amount of augmented data. We also find that MuggleMath generalizes poorly to the out-of-domain MATH benchmark. We attribute this to the difference in query distribution between AugGSM8K and MATH, which suggests that augmentation on a single benchmark does not improve overall math reasoning performance. Code and AugGSM8K will be uploaded to https://github.com/OFA-Sys/gsm8k-ScRel.
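The log-linear relationship mentioned in the abstract can be illustrated with a minimal sketch, assuming the form accuracy = a + b * ln(N), where N is the number of augmented training samples. The function names `fit_log_linear` and `predict` are hypothetical, and the data points below are synthetic placeholders generated from made-up coefficients; none of them are results from the paper.

```python
import math


def fit_log_linear(sizes, accuracies):
    """Ordinary least-squares fit of accuracy = a + b * ln(size)."""
    xs = [math.log(n) for n in sizes]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(accuracies) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracies))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, size):
    """Accuracy predicted by the fitted curve at a given data size."""
    return a + b * math.log(size)


# Synthetic placeholder data (NOT from the paper): points generated
# exactly on the curve 20 + 5 * ln(N) so the fit can be checked.
sizes = [1000, 2000, 4000, 8000]
accuracies = [20.0 + 5.0 * math.log(n) for n in sizes]

a, b = fit_log_linear(sizes, accuracies)

# Under this model, each doubling of the data adds b * ln(2) accuracy points.
gain_per_doubling = b * math.log(2)
```

Under such a fit, equal multiplicative increases in data yield equal additive gains in accuracy, which is why returns diminish as the augmented set grows.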


Code


ofa-sys/gsm8k-screl official

161

Tasks


Arithmetic Reasoning

Data Augmentation

GSM8K

Math

Mathematical Reasoning

Math Word Problem Solving

Datasets

GSM8K

MATH

Results from the Paper


Ranked #50 on Math Word Problem Solving on MATH (using extra training data)


