TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Common Sense Reasoning	CommonsenseQA	Unicorn 11B (fine-tuned)	Accuracy	79.3	# 5
Sentence Completion	HellaSwag	Unicorn 11B (fine-tuned)	Accuracy	93.9	# 6
Question Answering	PIQA	Unicorn 11B (fine-tuned)	Accuracy	90.1	# 1
Question Answering	SIQA	Unicorn 11B (fine-tuned)	Accuracy	83.2	# 1
Common Sense Reasoning	WinoGrande	Unicorn 11B (fine-tuned)	Accuracy	91.3	# 2

UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark

24 Mar 2021 · Nicholas Lourie, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi

Commonsense AI has long been seen as a near impossible goal -- until recently. Now, research interest has sharply increased with an influx of new benchmarks and models. We propose two new ways to evaluate commonsense models, emphasizing their generality on new tasks and building on diverse, recently introduced benchmarks. First, we propose a new multitask benchmark, RAINBOW, to promote research on commonsense models that generalize well over multiple tasks and datasets. Second, we propose a novel evaluation, the cost equivalent curve, that sheds new insight on how the choice of source datasets, pretrained language models, and transfer learning methods impacts performance and data efficiency. We perform extensive experiments -- over 200 experiments encompassing 4800 models -- and report multiple valuable and sometimes surprising findings, e.g., that transfer almost always leads to better or equivalent performance if following a particular recipe, that QA-based commonsense datasets transfer well with each other, while commonsense knowledge graphs do not, and that perhaps counter-intuitively, larger models benefit more from transfer than smaller ones. Last but not least, we introduce a new universal commonsense reasoning model, UNICORN, that establishes new state-of-the-art performance across 8 popular commonsense benchmarks, aNLI (87.3%), CosmosQA (91.8%), HellaSWAG (93.9%), PIQA (90.1%), SocialIQa (83.2%), WinoGrande (86.6%), CycIC (94.0%) and CommonsenseQA (79.3%).

Code

allenai/rainbow (official)

Tasks

Common Sense Reasoning

Knowledge Graphs

Question Answering

Sentence Completion

Transfer Learning

Datasets

Introduced in the Paper:

Rainbow

Used in the Paper:

GLUE

HellaSwag

PIQA

CommonsenseQA

WinoGrande

ATOMIC

SWAG

SIQA

Results from the Paper

Ranked #1 on Question Answering on SIQA
