Semantics-aware BERT for Language Understanding

5 Sep 2019  ·  Zhuosheng Zhang, Yuwei Wu, Hai Zhao, Zuchao Li, Shuailiang Zhang, Xi Zhou, Xiang Zhou

The latest work on language representations carefully integrates contextualized features into language model training, which has enabled a series of successes, especially on machine reading comprehension and natural language inference tasks. However, existing language representation models, including ELMo, GPT and BERT, exploit only plain context-sensitive features such as character or word embeddings. They rarely consider incorporating structured semantic information, which can provide rich semantics for language representation. To promote natural language understanding, we propose to incorporate explicit contextual semantics from pre-trained semantic role labeling, and introduce an improved language representation model, Semantics-aware BERT (SemBERT), which is capable of explicitly absorbing contextual semantics over a BERT backbone. SemBERT keeps the convenient usability of its BERT precursor, requiring only light fine-tuning and no substantial task-specific modifications. Compared with BERT, SemBERT is as simple in concept but more powerful. It obtains new state-of-the-art results or substantially improves over prior results on ten reading comprehension and language inference tasks.
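To make the core idea concrete, below is a minimal PyTorch sketch (not the authors' released code) of how semantic role labels from an external SRL tagger might be embedded and fused with token representations from a BERT backbone before a task head. The module and parameter names (SemanticsAwareFusion, num_srl_tags, tag_dim, fuse) are illustrative assumptions, and the paper's actual handling of multiple predicate-argument structures per sentence is simplified away.

import torch
import torch.nn as nn

class SemanticsAwareFusion(nn.Module):
    """Sketch: concatenate BERT token features with SRL tag embeddings, then project."""
    def __init__(self, hidden_size=768, num_srl_tags=32, tag_dim=16):
        super().__init__()
        # Embedding table for the semantic role label assigned to each token.
        self.tag_embed = nn.Embedding(num_srl_tags, tag_dim)
        # Linear layer that fuses contextual and semantic-tag features.
        self.fuse = nn.Linear(hidden_size + tag_dim, hidden_size)

    def forward(self, bert_hidden, srl_tag_ids):
        # bert_hidden: (batch, seq_len, hidden_size) output of a BERT-style encoder
        # srl_tag_ids: (batch, seq_len) integer SRL tags from a pre-trained labeler
        tags = self.tag_embed(srl_tag_ids)
        joint = torch.cat([bert_hidden, tags], dim=-1)
        return torch.tanh(self.fuse(joint))

# Toy usage with random tensors standing in for real BERT outputs and SRL tags.
fusion = SemanticsAwareFusion()
hidden = torch.randn(2, 10, 768)           # pretend BERT token representations
tags = torch.randint(0, 32, (2, 10))       # pretend SRL tag ids per token
out = fusion(hidden, tags)                 # (2, 10, 768) semantics-enriched features

The semantics-enriched features would then feed the usual task-specific output layer (e.g., a span predictor for SQuAD or a classifier for SNLI), which is why only light fine-tuning is needed on top of the BERT precursor.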


Results from the Paper


Task                        Dataset       Model                    Metric Name       Metric Value  Global Rank
Natural Language Inference  SNLI          SemBERT                  % Test Accuracy   91.9          #6
Natural Language Inference  SNLI          SemBERT                  % Train Accuracy  94.4          #16
Natural Language Inference  SNLI          SemBERT                  Parameters        339M          #4
Question Answering          SQuAD2.0      SemBERT (ensemble)       EM                86.166        #104
Question Answering          SQuAD2.0      SemBERT (ensemble)       F1                88.886        #110
Question Answering          SQuAD2.0      SemBERT (single model)   EM                84.800        #131
Question Answering          SQuAD2.0      SemBERT (single model)   F1                87.864        #131
Question Answering          SQuAD2.0 dev  SemBERT large            F1                83.6          #8
Question Answering          SQuAD2.0 dev  SemBERT large            EM                80.9          #7
