TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Arithmetic Reasoning	GSM8K	WizardMath-7B-V1.1	Accuracy	83.2	# 49
Arithmetic Reasoning	GSM8K	WizardMath-7B-V1.1	Parameters (Billion)	7	# 10
Arithmetic Reasoning	GSM8K	WizardMath-7B-V1.0	Accuracy	54.9	# 116
Arithmetic Reasoning	GSM8K	WizardMath-7B-V1.0	Parameters (Billion)	7	# 10
Arithmetic Reasoning	GSM8K	WizardMath-13B-V1.0	Accuracy	63.9	# 102
Arithmetic Reasoning	GSM8K	WizardMath-13B-V1.0	Parameters (Billion)	13	# 53
Arithmetic Reasoning	GSM8K	WizardMath-70B-V1.0	Accuracy	81.6	# 57
Arithmetic Reasoning	GSM8K	WizardMath-70B-V1.0	Parameters (Billion)	70	# 86
Math Word Problem Solving	MATH	WizardMath-7B-V1.1	Accuracy	33.0	# 58
Math Word Problem Solving	MATH	WizardMath-7B-V1.1	Parameters (Billions)	7	# 58
Math Word Problem Solving	MATH	WizardMath-7B-V1.0	Accuracy	10.7	# 89
Math Word Problem Solving	MATH	WizardMath-7B-V1.0	Parameters (Billions)	7	# 58
Math Word Problem Solving	MATH	WizardMath-13B-V1.0	Accuracy	14.0	# 84
Math Word Problem Solving	MATH	WizardMath-13B-V1.0	Parameters (Billions)	13	# 38
Math Word Problem Solving	MATH	WizardMath-70B-V1.0	Accuracy	22.7	# 73
Math Word Problem Solving	MATH	WizardMath-70B-V1.0	Parameters (Billions)	70	# 11
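
To compare model versions directly, the accuracy rows above can be tabulated in a few lines. This is a minimal sketch with values copied by hand from the table; the dictionary layout and the helper names `gain` and `best_model` are illustrative assumptions, not part of the leaderboard or the WizardMath code.

```python
# Accuracy values (percent) copied by hand from the results table above.
# The data layout and helper names are illustrative assumptions.
ACCURACY = {
    "GSM8K": {
        "WizardMath-7B-V1.0": 54.9,
        "WizardMath-7B-V1.1": 83.2,
        "WizardMath-13B-V1.0": 63.9,
        "WizardMath-70B-V1.0": 81.6,
    },
    "MATH": {
        "WizardMath-7B-V1.0": 10.7,
        "WizardMath-7B-V1.1": 33.0,
        "WizardMath-13B-V1.0": 14.0,
        "WizardMath-70B-V1.0": 22.7,
    },
}


def gain(dataset: str, baseline: str, candidate: str) -> float:
    """Accuracy difference in percentage points, rounded to one decimal."""
    scores = ACCURACY[dataset]
    return round(scores[candidate] - scores[baseline], 1)


def best_model(dataset: str) -> str:
    """Name of the listed model with the highest accuracy on the dataset."""
    scores = ACCURACY[dataset]
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for dataset in ACCURACY:
        delta = gain(dataset, "WizardMath-7B-V1.0", "WizardMath-7B-V1.1")
        print(f"{dataset}: 7B V1.1 vs 7B V1.0 = {delta:+.1f} points; "
              f"best listed model = {best_model(dataset)}")
```

On these figures, WizardMath-7B-V1.1 improves on WizardMath-7B-V1.0 by 28.3 points on GSM8K and 22.3 points on MATH, and it is the highest-scoring listed model on both datasets.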

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wizardmath-empowering-mathematical-reasoning/arithmetic-reasoning-on-gsm8k)](https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k?p=wizardmath-empowering-mathematical-reasoning)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wizardmath-empowering-mathematical-reasoning/math-word-problem-solving-on-math)](https://paperswithcode.com/sota/math-word-problem-solving-on-math?p=wizardmath-empowering-mathematical-reasoning)`

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

18 Aug 2023 · Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, JianGuang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang

Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data, without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of Llama-2 by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8K and MATH, we reveal the extraordinary capabilities of our model. WizardMath surpasses all other open-source LLMs by a substantial margin. Furthermore, our model outperforms ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8K, and simultaneously surpasses Text-davinci-002, PaLM-1 and GPT-3 on MATH. More details and model weights are public at https://github.com/nlpxucan/WizardLM and https://huggingface.co/WizardLM.

Code

nlpxucan/wizardlm

8,907

Tasks

Arithmetic Reasoning

GSM8K

Math

Mathematical Reasoning

Math Word Problem Solving

Datasets

GSM8K

MATH

Results from the Paper

Ranked #49 on Arithmetic Reasoning on GSM8K (using extra training data)

Methods

Absolute Position Encodings • Adam • Attention Dropout • BPE • Cosine Annealing • Dense Connections • Dropout • Fixed Factorized Attention • GELU • GPT-3 • GPT-4 • Label Smoothing • Layer Normalization • Linear Layer • Linear Warmup With Cosine Annealing • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Strided Attention • Transformer • Weight Decay
