Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Arithmetic Reasoning	GSM8K	MetaMath 70B	Accuracy	82.3	# 53
Arithmetic Reasoning	GSM8K	MetaMath 70B	Parameters (Billion)	70	# 86
Arithmetic Reasoning	GSM8K	MetaMath 7B	Accuracy	66.4	# 99
Arithmetic Reasoning	GSM8K	MetaMath 7B	Parameters (Billion)	7	# 10
Arithmetic Reasoning	GSM8K	MetaMath 13B	Accuracy	71.0	# 92
Arithmetic Reasoning	GSM8K	MetaMath 13B	Parameters (Billion)	13	# 53
Arithmetic Reasoning	GSM8K	MetaMath-Mistral-7B	Accuracy	77.7	# 71
Arithmetic Reasoning	GSM8K	MetaMath-Mistral-7B	Parameters (Billion)	7	# 10
Math Word Problem Solving	MATH	MetaMath 7B	Accuracy	19.4	# 78
Math Word Problem Solving	MATH	MetaMath 7B	Parameters (Billions)	7	# 58
Math Word Problem Solving	MATH	MetaMath 70B	Accuracy	26.0	# 69
Math Word Problem Solving	MATH	MetaMath 70B	Parameters (Billions)	70	# 11
Math Word Problem Solving	MATH	MetaMath 13B	Accuracy	22.5	# 75
Math Word Problem Solving	MATH	MetaMath 13B	Parameters (Billions)	13	# 38
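
As a rough sketch of how the accuracy rows in the table above could be compared programmatically, the following Python snippet parses a tab-separated copy of those rows and ranks the models per dataset. The names `RESULTS_TSV` and `accuracy_by_dataset` are illustrative assumptions, not part of any Papers with Code or MetaMath API, and only the Accuracy rows are copied in.

```python
import csv
import io

# Accuracy rows copied from the results table (tab-separated).
# RESULTS_TSV is a hypothetical name used only for this sketch.
RESULTS_TSV = (
    "Task\tDataset\tModel\tMetric Name\tMetric Value\tGlobal Rank\n"
    "Arithmetic Reasoning\tGSM8K\tMetaMath 70B\tAccuracy\t82.3\t# 53\n"
    "Arithmetic Reasoning\tGSM8K\tMetaMath 7B\tAccuracy\t66.4\t# 99\n"
    "Arithmetic Reasoning\tGSM8K\tMetaMath 13B\tAccuracy\t71.0\t# 92\n"
    "Arithmetic Reasoning\tGSM8K\tMetaMath-Mistral-7B\tAccuracy\t77.7\t# 71\n"
    "Math Word Problem Solving\tMATH\tMetaMath 7B\tAccuracy\t19.4\t# 78\n"
    "Math Word Problem Solving\tMATH\tMetaMath 70B\tAccuracy\t26.0\t# 69\n"
    "Math Word Problem Solving\tMATH\tMetaMath 13B\tAccuracy\t22.5\t# 75\n"
)


def accuracy_by_dataset(tsv_text):
    """Group Accuracy rows by dataset, sorted from highest to lowest accuracy."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped = {}
    for row in reader:
        # Skip any non-accuracy metric rows (e.g. parameter counts).
        if row["Metric Name"] != "Accuracy":
            continue
        entry = (row["Model"], float(row["Metric Value"]))
        grouped.setdefault(row["Dataset"], []).append(entry)
    for entries in grouped.values():
        entries.sort(key=lambda pair: pair[1], reverse=True)
    return grouped


if __name__ == "__main__":
    for dataset, entries in accuracy_by_dataset(RESULTS_TSV).items():
        print(dataset)
        for model, accuracy in entries:
            print(f"  {model}: {accuracy}")
```

Sorting within each dataset makes it easy to see that, among the listed rows, the 70B model leads on both benchmarks and that MetaMath-Mistral-7B sits above the 13B model on GSM8K.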

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/metamath-bootstrap-your-own-mathematical/arithmetic-reasoning-on-gsm8k)](https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k?p=metamath-bootstrap-your-own-mathematical)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/metamath-bootstrap-your-own-mathematical/math-word-problem-solving-on-math)](https://paperswithcode.com/sota/math-word-problem-solving-on-math?p=metamath-bootstrap-your-own-mathematical)`

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

21 Sep 2023 · Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu

Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite this success, most existing open-source LLMs (e.g., LLaMA-2) are still far from satisfactory at solving mathematical problems because of the complex reasoning procedures involved. To bridge this gap, we propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions, rewriting each question from multiple perspectives without extra knowledge, which results in a new dataset called MetaMathQA. We then fine-tune the LLaMA-2 models on MetaMathQA. Experimental results on two popular benchmarks for mathematical reasoning (GSM8K and MATH) demonstrate that MetaMath outperforms a suite of open-source LLMs by a significant margin. Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%, respectively. In particular, MetaMath-70B achieves an accuracy of 82.3% on GSM8K, slightly better than GPT-3.5-Turbo. We release the MetaMathQA dataset, the MetaMath models at different sizes, and the training code for public use.
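
The margins quoted in the abstract can be turned into implied baseline scores with a short calculation. The sketch below assumes the 11.5% and 8.7% figures are absolute gaps in percentage points rather than relative improvements; the abstract does not say which, and the helper `implied_baseline` and the `REPORTED` dictionary are hypothetical names introduced only for illustration.

```python
# Figures quoted in the abstract: MetaMath-7B accuracy (%) and reported margin.
REPORTED = {
    "GSM8K": {"metamath_7b": 66.4, "margin": 11.5},
    "MATH": {"metamath_7b": 19.4, "margin": 8.7},
}


def implied_baseline(accuracy, margin):
    """Return the implied baseline accuracy.

    Assumption: the margin is an absolute gap in percentage points,
    so the baseline is simply accuracy minus margin.
    """
    return round(accuracy - margin, 1)


if __name__ == "__main__":
    for dataset, row in REPORTED.items():
        baseline = implied_baseline(row["metamath_7b"], row["margin"])
        print(f"{dataset}: implied same-size baseline accuracy = {baseline}%")
```

Under that assumption the previous best same-size models would sit at 54.9% on GSM8K and 10.7% on MATH; a relative reading of the margins would give different numbers.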

Code

meta-math/MetaMath official

318

Tasks

Arithmetic Reasoning

GSM8K

Language Modelling

Math

Mathematical Reasoning

Math Word Problem Solving

Natural Language Understanding

Datasets

GSM8K

MATH

Results from the Paper

Ranked #53 on Arithmetic Reasoning on GSM8K (using extra training data)

Methods

Adam • Attention Dropout • BPE • Cosine Annealing • Dense Connections • Dropout • Fixed Factorized Attention • GELU • GPT-3 • Layer Normalization • Linear Layer • Linear Warmup With Cosine Annealing • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Strided Attention • Weight Decay
