WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are pre-trained only on large-scale internet data, without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of Llama-2 by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, GSM8k and MATH, we reveal the extraordinary capabilities of our model. WizardMath surpasses all other open-source LLMs by a substantial margin. Furthermore, our model outperforms ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, and simultaneously surpasses Text-davinci-002, PaLM-1 and GPT-3 on MATH. More details and model weights are publicly available at https://github.com/nlpxucan/WizardLM and https://huggingface.co/WizardLM.

Results from the Paper


Ranked #49 on Arithmetic Reasoning on GSM8K (using extra training data)

Task                       Dataset  Model                Accuracy  Global Rank  Parameters (Billions)  Global Rank
Arithmetic Reasoning       GSM8K    WizardMath-7B-V1.1   83.2      #49          7                      #10
Arithmetic Reasoning       GSM8K    WizardMath-7B-V1.0   54.9      #116         7                      #10
Arithmetic Reasoning       GSM8K    WizardMath-13B-V1.0  63.9      #102         13                     #53
Arithmetic Reasoning       GSM8K    WizardMath-70B-V1.0  81.6      #57          70                     #86
Math Word Problem Solving  MATH     WizardMath-7B-V1.1   33.0      #58          7                      #58
Math Word Problem Solving  MATH     WizardMath-7B-V1.0   10.7      #89          7                      #58
Math Word Problem Solving  MATH     WizardMath-13B-V1.0  14.0      #84          13                     #38
Math Word Problem Solving  MATH     WizardMath-70B-V1.0  22.7      #73          70                     #11
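As a quick way to compare the entries listed above, the following minimal Python sketch computes accuracy differences between two models on the same benchmark. The accuracy values are copied from the results above. The `RESULTS` dictionary and the `accuracy_gain` helper are illustrative names introduced here, not part of the paper or the leaderboard.

```python
# Accuracy values copied from the results listed above.
# Keys are (benchmark, model); the structure itself is an illustrative assumption.
RESULTS = {
    ("GSM8K", "WizardMath-7B-V1.1"): 83.2,
    ("GSM8K", "WizardMath-7B-V1.0"): 54.9,
    ("GSM8K", "WizardMath-13B-V1.0"): 63.9,
    ("GSM8K", "WizardMath-70B-V1.0"): 81.6,
    ("MATH", "WizardMath-7B-V1.1"): 33.0,
    ("MATH", "WizardMath-7B-V1.0"): 10.7,
    ("MATH", "WizardMath-13B-V1.0"): 14.0,
    ("MATH", "WizardMath-70B-V1.0"): 22.7,
}


def accuracy_gain(benchmark: str, baseline: str, candidate: str) -> float:
    """Return candidate accuracy minus baseline accuracy, in points."""
    diff = RESULTS[(benchmark, candidate)] - RESULTS[(benchmark, baseline)]
    # Round to one decimal place to match the precision of the reported values.
    return round(diff, 1)


if __name__ == "__main__":
    for benchmark in ("GSM8K", "MATH"):
        gain = accuracy_gain(benchmark, "WizardMath-7B-V1.0", "WizardMath-7B-V1.1")
        print(f"{benchmark}: 7B V1.1 vs. 7B V1.0 = {gain:+.1f} points")
```

On these figures, the 7B V1.1 entry is 28.3 points above the 7B V1.0 entry on GSM8K and 22.3 points above it on MATH.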

Methods