TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Question Answering	DaNetQA	Baseline TF-IDF1.1	Accuracy	0.621	# 16
Question Answering	DaNetQA	Human Benchmark	Accuracy	0.915	# 2
Natural Language Inference	LiDiRus	Baseline TF-IDF1.1	MCC	0.06	# 17
Natural Language Inference	LiDiRus	Human Benchmark	MCC	0.626	# 1
Reading Comprehension	MuSeRC	Baseline TF-IDF1.1	Average F1	0.587	# 20
Reading Comprehension	MuSeRC	Baseline TF-IDF1.1	EM	0.242	# 17
Reading Comprehension	MuSeRC	Human Benchmark	Average F1	0.806	# 5
Reading Comprehension	MuSeRC	Human Benchmark	EM	0.42	# 8
Common Sense Reasoning	PARus	Baseline TF-IDF1.1	Accuracy	0.486	# 19
Common Sense Reasoning	PARus	Human Benchmark	Accuracy	0.982	# 1
Natural Language Inference	RCB	Baseline TF-IDF1.1	Average F1	0.301	# 21
Natural Language Inference	RCB	Baseline TF-IDF1.1	Accuracy	0.441	# 19
Natural Language Inference	RCB	Human Benchmark	Average F1	0.68	# 1
Natural Language Inference	RCB	Human Benchmark	Accuracy	0.702	# 1
Common Sense Reasoning	RuCoS	Baseline TF-IDF1.1	Average F1	0.26	# 15
Common Sense Reasoning	RuCoS	Baseline TF-IDF1.1	EM	0.252	# 16
Common Sense Reasoning	RuCoS	Human Benchmark	Average F1	0.93	# 1
Common Sense Reasoning	RuCoS	Human Benchmark	EM	0.89	# 2
Word Sense Disambiguation	RUSSE	Baseline TF-IDF1.1	Accuracy	0.57	# 19
Word Sense Disambiguation	RUSSE	Human Benchmark	Accuracy	0.805	# 1
Common Sense Reasoning	RWSD	Human Benchmark	Accuracy	0.84	# 22
Common Sense Reasoning	RWSD	Baseline TF-IDF1.1	Accuracy	0.662	# 6
Natural Language Inference	TERRa	Human Benchmark	Accuracy	0.92	# 1
Natural Language Inference	TERRa	Baseline TF-IDF1.1	Accuracy	0.471	# 22
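
One way to read the table is to compare each human score with the TF-IDF baseline on the same dataset and metric. The minimal Python sketch below does that with the values copied from the rows above. The dictionary layout and the function names `human_gaps` and `largest_gap` are illustrative assumptions and are not part of the RussianSuperGLUE evaluation framework.

```python
# Scores copied from the results table: (dataset, metric) -> (baseline, human).
SCORES = {
    ("DaNetQA", "Accuracy"): (0.621, 0.915),
    ("LiDiRus", "MCC"): (0.06, 0.626),
    ("MuSeRC", "Average F1"): (0.587, 0.806),
    ("MuSeRC", "EM"): (0.242, 0.42),
    ("PARus", "Accuracy"): (0.486, 0.982),
    ("RCB", "Average F1"): (0.301, 0.68),
    ("RCB", "Accuracy"): (0.441, 0.702),
    ("RuCoS", "Average F1"): (0.26, 0.93),
    ("RuCoS", "EM"): (0.252, 0.89),
    ("RUSSE", "Accuracy"): (0.57, 0.805),
    ("RWSD", "Accuracy"): (0.662, 0.84),
    ("TERRa", "Accuracy"): (0.471, 0.92),
}


def human_gaps(scores):
    """Return {(dataset, metric): human - baseline}, rounded to 3 decimals."""
    return {key: round(human - base, 3) for key, (base, human) in scores.items()}


def largest_gap(scores):
    """Return the (dataset, metric) pair where humans lead the baseline the most."""
    gaps = human_gaps(scores)
    return max(gaps, key=gaps.get)


if __name__ == "__main__":
    # Print the pairs from the widest human-baseline gap to the narrowest.
    ranked = sorted(human_gaps(SCORES).items(), key=lambda kv: -kv[1])
    for (dataset, metric), gap in ranked:
        print(f"{dataset:8s} {metric:10s} gap = {gap:.3f}")
```

The metrics are on different scales (MCC ranges from -1 to 1, the others from 0 to 1), so the gaps are only roughly comparable across rows.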

RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark

EMNLP 2020 · Tatiana Shavrina, Alena Fenogenova, Anton Emelyanov, Denis Shevelev, Ekaterina Artemova, Valentin Malykh, Vladislav Mikhailov, Maria Tikhonova, Andrey Chertok, Andrey Evlampiev

In this paper, we introduce an advanced Russian general language understanding evaluation benchmark, RussianGLUE. Recent advances in universal language models and transformers require a methodology for their broad diagnostics and for testing general intellectual skills: natural language inference, commonsense reasoning, and the ability to perform simple logical operations regardless of text subject or lexicon. For the first time, a benchmark of nine tasks, collected and organized analogously to the SuperGLUE methodology, was developed from scratch for the Russian language. We provide baselines, human-level evaluation, an open-source framework for evaluating models (https://github.com/RussianNLP/RussianSuperGLUE), and an overall leaderboard of transformer models for the Russian language. In addition, we present the first results of comparing multilingual models on the adapted diagnostic test set and offer first steps toward further expanding or assessing state-of-the-art models independently of language.

Code

RussianNLP/RussianSuperGLUE (official, 103 stars)

RussianNLP/MOROCCO

Tasks

Common Sense Reasoning

Lexical Entailment

Logical Reasoning Question Answering

Natural Language Inference

Natural Language Understanding

Question Answering

Reading Comprehension

Word Sense Disambiguation

Datasets

Introduced in the Paper:

TERRa

RWSD

DaNetQA

PARus

MuSeRC

RCB

LiDiRus

RuCoS

Used in the Paper:

GLUE

BoolQ

SuperGLUE

WSC

decaNLP

RUSSE

Results from the Paper

Ranked #1 on Word Sense Disambiguation on RUSSE
