TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Arithmetic Reasoning	GSM8K	Vicuna (SYRELM)	Accuracy	35.2	# 133
Arithmetic Reasoning	GSM8K	Vicuna (SYRELM)	Parameters (Billion)	13	# 53
Math Word Problem Solving	SVAMP	SYRELM (Vicuna 13B)	Execution Accuracy	56.65	# 11
Math Word Problem Solving	SVAMP	SYRELM (GPT-J)	Execution Accuracy	40.1	# 19

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/frugal-lms-trained-to-invoke-symbolic-solvers/math-word-problem-solving-on-svamp)](https://paperswithcode.com/sota/math-word-problem-solving-on-svamp?p=frugal-lms-trained-to-invoke-symbolic-solvers)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/frugal-lms-trained-to-invoke-symbolic-solvers/arithmetic-reasoning-on-gsm8k)](https://paperswithcode.com/sota/arithmetic-reasoning-on-gsm8k?p=frugal-lms-trained-to-invoke-symbolic-solvers)`

Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning

9 Dec 2023 · Subhabrata Dutta, Joykirat Singh, Ishan Pandey, Sunny Manchanda, Soumen Chakrabarti, Tanmoy Chakraborty

Large Language Models (LLMs) exhibit zero-shot mathematical reasoning capacity as a behavior emergent with scale, commonly manifesting as chain-of-thought (CoT) reasoning. However, multiple empirical findings suggest that this prowess is exclusive to LLMs of exorbitant size (beyond 50 billion parameters). Meanwhile, educational neuroscientists suggest that symbolic algebraic manipulation be introduced around the same time as arithmetic word problems to modularize language-to-formulation, symbolic manipulation of the formulation, and endgame arithmetic. In this paper, we start with the hypothesis that much smaller LMs, which are weak at multi-step reasoning, can achieve reasonable arithmetic reasoning if arithmetic word problems are posed as a formalize-then-solve task. In our architecture, which we call SYRELM, the LM serves as a translator that maps natural language arithmetic questions into a formal language (FL) description. A symbolic solver then evaluates the FL expression to obtain the answer. A small frozen LM, equipped with an efficient low-rank adapter, is capable of generating FL expressions that incorporate natural language descriptions of the arithmetic problem (e.g., variable names and their purposes, formal expressions combining variables). We adopt policy-gradient reinforcement learning to train the adapted LM, informed by the non-differentiable symbolic solver. This marks a sharp departure from recent developments in tool-augmented LLMs, in which the external tools (e.g., calculator, Web search) are essentially detached from the learning phase of the LM. SYRELM shows massive improvements over base LMs (e.g., a +30.65 absolute-point improvement in accuracy on the SVAMP dataset using the GPT-J 6B model), while keeping our testbed easy to diagnose, interpret, and within reach of most researchers.
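
The formalize-then-solve split described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the SYRELM implementation: the FL syntax (one `name = expression` per line, with a trailing `#` comment giving the variable's purpose), the function names `solve` and `reward`, and the binary reward are all hypothetical. The solver here is a toy evaluator built on Python's standard `ast` module, and the reward only shows how a non-differentiable solver could supply a scalar signal to a policy-gradient learner.

```python
import ast
import operator

# Binary operators the toy solver accepts.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node, env):
    """Evaluate one arithmetic expression node against known variables."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]  # KeyError if the LM used an undefined variable
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, env)
    raise ValueError("unsupported expression")


def solve(fl_program):
    """Run a hypothetical FL program: one `name = expression` per line."""
    env = {}
    for line in fl_program.strip().splitlines():
        line = line.split("#", 1)[0].strip()  # drop the natural-language comment
        if not line:
            continue
        name, expr = line.split("=", 1)
        env[name.strip()] = _eval(ast.parse(expr.strip(), mode="eval").body, env)
    return env["answer"]  # by convention in this sketch, the result is named `answer`


def reward(fl_program, gold, tol=1e-6):
    """Scalar reward for a policy-gradient update: 1.0 if the solver's answer matches."""
    try:
        return 1.0 if abs(solve(fl_program) - gold) <= tol else 0.0
    except Exception:
        return 0.0  # malformed or unsolvable LM output earns no reward


# Made-up question: "Jack had 8 apples and gave 3 to Jill. How many are left?"
fl = """
apples_at_start = 8   # apples Jack had
apples_given = 3      # apples given to Jill
answer = apples_at_start - apples_given
"""
print(solve(fl))      # 5
print(reward(fl, 5))  # 1.0
```

In this sketch the LM would only be responsible for emitting the text in `fl`; the arithmetic itself is delegated to `solve`, and any output the solver cannot execute simply receives zero reward.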

Code

joykirat18/syrelm official

Tasks

Arithmetic Reasoning

Mathematical Reasoning

Math Word Problem Solving

Datasets

GSM8K

SVAMP

ASDiv

Results from the Paper

Ranked #11 on Math Word Problem Solving on SVAMP (using extra training data)

Methods

BASE
