Automatic Model Selection with Large Language Models for Reasoning

23 May 2023 · James Xu Zhao, Yuxi Xie, Kenji Kawaguchi, Junxian He, Michael Qizhe Xie

Chain-of-Thought (CoT) and Program-Aided Language Models (PAL) represent two distinct reasoning methods, each with its own strengths. CoT employs natural language, offering flexibility and interpretability, while PAL utilizes programming language, yielding more structured and rigorous logic. We introduce a model selection method to combine the best of both worlds by employing a large language model (LLM) to dynamically select between them. Our theoretical analysis underscores the feasibility of this method, which is further corroborated by empirical results. Our proposed method demonstrates significant performance improvements across eight reasoning datasets with Codex, ChatGPT, and GPT-4. Additionally, our method is complementary to self-consistency; when integrated, it can further enhance performance while significantly reducing computation costs. Moreover, we achieve new state-of-the-art results on GSM8K and SVAMP, with respective accuracies of 96.8% and 93.7%. Our code, data and prompts are available at https://github.com/XuZhao0/Model-Selection-Reasoning
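The selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that): the `cot_solver`, `pal_solver`, and `selector` callables are hypothetical stand-ins for LLM calls, the shortcut of skipping selection when both methods agree is an assumption, and the majority vote is one plausible way to combine selection with self-consistency over K samples.

```python
from collections import Counter
from typing import Callable

# Hypothetical interfaces: each solver maps a question to a final answer string;
# the selector sees both candidate answers and returns "cot" or "pal".
Solver = Callable[[str], str]
Selector = Callable[[str, str, str], str]


def select_answer(question: str, cot_solver: Solver, pal_solver: Solver,
                  selector: Selector) -> str:
    """Answer with CoT and PAL, then let a selector choose between them."""
    cot_answer = cot_solver(question)
    pal_answer = pal_solver(question)
    # Assumption: when both methods agree, no selection call is needed.
    if cot_answer == pal_answer:
        return cot_answer
    choice = selector(question, cot_answer, pal_answer)
    return cot_answer if choice == "cot" else pal_answer


def self_consistent_answer(question: str, cot_solver: Solver, pal_solver: Solver,
                           selector: Selector, k: int = 5) -> str:
    """Repeat selection k times and return the most frequent answer."""
    votes = Counter(
        select_answer(question, cot_solver, pal_solver, selector)
        for _ in range(k)
    )
    return votes.most_common(1)[0][0]
```

In practice the solvers would sample from an LLM with nonzero temperature, so the k runs can disagree and the vote becomes meaningful; with deterministic stubs every run returns the same answer.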

Code

xuzhao0/model-selection-reasoning official

Tasks

Arithmetic Reasoning

GSM8K

Language Modelling

Large Language Model

Math Word Problem Solving

Model Selection

Datasets

GSM8K

SVAMP

ASDiv

Results from the Paper

Ranked #1 on Math Word Problem Solving on SVAMP

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Arithmetic Reasoning	GSM8K	GPT-4 (Model Selection, SC K=5)	Accuracy	96.5	#4
Arithmetic Reasoning	GSM8K	GPT-4 (Model Selection, SC K=15)	Accuracy	96.8	#3
Math Word Problem Solving	SVAMP	GPT-4 (Model Selection)	Execution Accuracy	93.7	#1

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • GPT-4 • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
