TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Math Word Problem Solving	ASDiv-A	MMOS-DeepSeekMath-7B(0-shot)	Execution Accuracy	87.6	# 2
Math Word Problem Solving	ASDiv-A	MMOS-CODE-34B(0-shot)	Execution Accuracy	85.1	# 4
Math Word Problem Solving	ASDiv-A	MMOS-CODE-7B(0-shot)	Execution Accuracy	78.6	# 8
Arithmetic Reasoning	GSM8K	MMOS-DeepSeekMath-7B(0-shot,k=50)	Accuracy	87.2	# 32
Arithmetic Reasoning	GSM8K	MMOS-DeepSeekMath-7B(0-shot,k=50)	Parameters (Billion)	7	# 10
Arithmetic Reasoning	GSM8K	MMOS-CODE-7B(0-shot)	Accuracy	73.9	# 84
Arithmetic Reasoning	GSM8K	MMOS-CODE-7B(0-shot)	Parameters (Billion)	7	# 10
Arithmetic Reasoning	GSM8K	MMOS-CODE-34B(0-shot)	Accuracy	80.4	# 65
Arithmetic Reasoning	GSM8K	MMOS-CODE-34B(0-shot)	Parameters (Billion)	34	# 72
Arithmetic Reasoning	GSM8K	MMOS-DeepSeekMath-7B(0-shot)	Accuracy	80.5	# 64
Arithmetic Reasoning	GSM8K	MMOS-DeepSeekMath-7B(0-shot)	Parameters (Billion)	7	# 10
Math Word Problem Solving	MATH	MMOS-CODE-34B(0-shot)	Accuracy	49.5	# 28
Math Word Problem Solving	MATH	MMOS-CODE-34B(0-shot)	Parameters (Billions)	34	# 26
Math Word Problem Solving	MATH	MMOS-DeepSeekMath-7B(0-shot,k=50)	Accuracy	63.7	# 6
Math Word Problem Solving	MATH	MMOS-DeepSeekMath-7B(0-shot,k=50)	Parameters (Billions)	7	# 58
Math Word Problem Solving	MATH	MMOS-DeepSeekMath-7B(0-shot)	Accuracy	55.0	# 19
Math Word Problem Solving	MATH	MMOS-DeepSeekMath-7B(0-shot)	Parameters (Billions)	7	# 58
Math Word Problem Solving	MATH	MMOS-CODE-7B(0-shot)	Accuracy	44.3	# 44
Math Word Problem Solving	MATH	MMOS-CODE-7B(0-shot)	Parameters (Billions)	7	# 58
Math Word Problem Solving	SVAMP	MMOS-DeepSeekMath-7B(0-shot)	Execution Accuracy	79.3	# 6
Math Word Problem Solving	SVAMP	MMOS-CODE-7B(0-shot)	Execution Accuracy	76.4	# 7
Math Word Problem Solving	SVAMP	MMOS-CODE-34B(0-shot)	Execution Accuracy	80.6	# 5
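
The `k=50` suffix on some model names above is not defined on this page. One common reading is majority voting over k sampled answers per problem, and the sketch below assumes that reading. The function names `majority_vote` and `accuracy` are hypothetical, and exact string matching of answers is a simplification.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-empty answer among k sampled answers.

    Assumption: "k=50" means 50 samples per problem with a majority vote.
    Ties go to the answer that was seen first.
    """
    valid = [a.strip() for a in answers if a is not None and a.strip()]
    if not valid:
        return None
    # Counter.most_common keeps first-seen order among equal counts.
    return Counter(valid).most_common(1)[0][0]


def accuracy(predictions: Sequence[Optional[str]],
             references: Sequence[str]) -> float:
    """Percentage of problems whose voted answer matches the reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p is not None and p == r
                  for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Toy samples for two problems (k=5 here, purely illustrative data).
    samples = [["18", "20", "18", None, " 18 "], ["5", "6", "5", "7", "6"]]
    voted = [majority_vote(s) for s in samples]
    print(voted, accuracy(voted, ["18", "6"]))
```

A real evaluation would normalize numeric answers (for example `18` versus `18.0`) before voting; that step is omitted here.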

An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning

23 Feb 2024 · Zui Chen, Yezeng Chen, Jiaqi Han, Zhijie Huang, Ji Qi, Yi Zhou

Large language models (LLMs) are displaying emergent abilities for math reasoning tasks, and there is growing attention on enhancing the ability of open-source LLMs through supervised fine-tuning (SFT). In this paper, we aim to explore a general data strategy for supervised data to help optimize and expand math reasoning ability. Firstly, we determine the ability boundary of reasoning-path augmentation by identifying the minimal optimal set of these paths. Secondly, we validate that different abilities of the model can be cumulatively enhanced by a Mix of Minimal Optimal Sets (MMOS) of the corresponding types of data, and our MMOS models achieve SOTA performance on a series of base models at much lower construction cost. Besides, we point out that GSM-HARD is not really hard and that today's LLMs no longer lack numerical robustness. We also provide an Auto Problem Generator for robustness testing and educational applications. Our code and data are publicly available at https://github.com/cyzhh/MMOS.
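
The abstract mentions an Auto Problem Generator for robustness testing but does not describe it on this page. As a rough illustration of the general idea only, and not the authors' generator, the sketch below fills a word-problem template with random numbers and recomputes the ground-truth answer, so a model can be probed with arbitrarily large operands. `ProblemTemplate`, `generate_variant`, and the apple template are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ProblemTemplate:
    """A word problem with named integer slots and a reference solver (hypothetical)."""
    text: str                            # uses str.format placeholders, e.g. "{a}"
    ranges: Dict[str, Tuple[int, int]]   # inclusive (low, high) range per slot
    solve: Callable[..., int]            # computes the answer from the slot values


def generate_variant(template: ProblemTemplate,
                     rng: random.Random) -> Tuple[str, int]:
    """Sample slot values and return (question text, ground-truth answer)."""
    values = {name: rng.randint(low, high)
              for name, (low, high) in template.ranges.items()}
    return template.text.format(**values), template.solve(**values)


# Illustrative template; wide ranges stress numerical robustness.
APPLES = ProblemTemplate(
    text="Tom has {a} apples and buys {b} more. How many apples does he have now?",
    ranges={"a": (1, 10**6), "b": (1, 10**6)},
    solve=lambda a, b: a + b,
)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the variants are reproducible
    for _ in range(3):
        question, answer = generate_variant(APPLES, rng)
        print(question, "->", answer)
```

Keeping the solver next to the template means every generated variant carries an answer computed by code rather than by hand.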

Code

cyzhh/MMOS official

Tasks

Arithmetic Reasoning

Math Word Problem Solving

Datasets

Introduced in the Paper:

MMOS

Used in the Paper:

GSM8K

MATH

SVAMP

ASDiv

Results from the Paper

Ranked #2 on Math Word Problem Solving on ASDiv-A (using extra training data)

Methods

BASE
