GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

WS 2018 · Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems.

Code

Repository	Stars
ofa-sys/ofa	2,323
alibaba/EasyNLP	1,946
jsalt18-sentence-repl/jiant	1,605
nyu-mll/GLUE-baselines	723
benzakenelad/BitFit	124

See all 11 implementations

Tasks

Natural Language Inference

Natural Language Understanding

QQP

Transfer Learning

Datasets

Introduced in the Paper:

GLUE

QNLI

Used in the Paper:

SST

MultiNLI

SST-2

SNLI

MRPC

CoLA

WSC

SentEval

Quora Question Pairs

Results from the Paper

Ranked #46 on Natural Language Inference on MultiNLI

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Natural Language Inference	MultiNLI	Multi-task BiLSTM + Attn	Matched	72.2	#46
Natural Language Inference	MultiNLI	Multi-task BiLSTM + Attn	Mismatched	72.1	#36
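
The abstract refers to aggregate performance across tasks, but this page does not state how that aggregate is formed. As a minimal sketch, assuming an unweighted macro-average (metrics averaged within each task, then across tasks), the two MultiNLI rows above could be combined as follows. The function name `aggregate_score` and the dictionary layout are hypothetical and chosen only for illustration.

```python
from statistics import mean


def aggregate_score(task_scores):
    """Assumed aggregation: average the metrics within each task,
    then take the unweighted mean of those per-task averages."""
    per_task = {
        task: mean(metrics.values())
        for task, metrics in task_scores.items()
    }
    return per_task, mean(per_task.values())


# Values taken from the results table on this page (Multi-task BiLSTM + Attn).
scores = {"MultiNLI": {"Matched": 72.2, "Mismatched": 72.1}}

per_task, overall = aggregate_score(scores)
print(f"MultiNLI average: {per_task['MultiNLI']:.2f}")
print(f"Aggregate over {len(per_task)} task(s): {overall:.2f}")
```

With only one task listed here, the aggregate equals the MultiNLI average; adding further tasks to `scores` would give each task equal weight regardless of how many metrics it reports.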

