STaR: Bootstrapping Reasoning With Reasoning

28 Mar 2022  ·  Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman

Generating step-by-step "chain-of-thought" rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently requires either constructing massive rationale datasets or sacrificing accuracy by using only few-shot inference. We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales, to bootstrap the ability to perform successively more complex reasoning. This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30$\times$ larger state-of-the-art language model on CommonsenseQA. Thus, STaR lets a model improve itself by learning from its own generated reasoning.
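The loop described in the abstract is compact enough to sketch in code. Below is a minimal Python sketch, assuming hypothetical helpers `generate` (sample a rationale and answer from a model, given a few-shot prompt and optionally a hint) and `finetune` (fine-tune the base model on collected examples); these names and signatures are illustrative placeholders, not the authors' implementation.

```python
from typing import Callable, List, Tuple

# (question, rationale, answer) triples collected for fine-tuning
Example = Tuple[str, str, str]

def star(
    base_model,
    questions: List[str],
    answers: List[str],       # ground-truth final answers, no rationales
    fewshot_prompt: str,      # a handful of hand-written rationale examples
    generate: Callable,       # hypothetical: (model, prompt, q, hint=None) -> (rationale, pred)
    finetune: Callable,       # hypothetical: (base_model, examples) -> model
    n_iterations: int = 10,
):
    model = base_model
    for _ in range(n_iterations):
        collected: List[Example] = []
        for q, a in zip(questions, answers):
            # Step 1: sample a rationale and answer via few-shot prompting.
            rationale, pred = generate(model, fewshot_prompt, q)
            if pred == a:
                collected.append((q, rationale, pred))
                continue
            # Step 2 ("rationalization"): on a wrong answer, retry with the
            # correct answer provided as a hint; keep the rationale only if
            # the model now arrives at that answer. The hint itself is not
            # included in the fine-tuning example.
            rationale, pred = generate(model, fewshot_prompt, q, hint=a)
            if pred == a:
                collected.append((q, rationale, pred))
        # Step 3: fine-tune from the original pre-trained model (not the
        # previous iteration's weights) on all rationales that yielded
        # correct answers, then repeat with the improved model.
        model = finetune(base_model, collected)
    return model
```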

Results on CommonsenseQA (Common Sense Reasoning, metric: Accuracy):

| Model | Accuracy | Global Rank |
|---|---|---|
| STaR (on GPT-J) | 72.3 | #17 |
| STaR without Rationalization (on GPT-J) | 68.8 | #19 |
| GPT-J Direct Finetuned | 60.0 | #28 |
| Few-shot CoT LaMDA 137B | 55.6 | #32 |
| Few-shot CoT GPT-J | 36.6 | #34 |
| Few-shot Direct GPT-J | 20.9 | #37 |
