TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding

16 Mar 2020 · Zhiheng Huang, Peng Xu, Davis Liang, Ajay Mishra, Bing Xiang

Bidirectional Encoder Representations from Transformers (BERT) has recently achieved state-of-the-art performance on a broad range of NLP tasks including sentence classification, machine translation, and question answering. The BERT model architecture is derived primarily from the transformer. Prior to the transformer era, bidirectional Long Short-Term Memory (BLSTM) was the dominant modeling architecture for neural machine translation and question answering. In this paper, we investigate how these two modeling techniques can be combined to create a more powerful model architecture. We propose a new architecture, denoted Transformer with BLSTM (TRANS-BLSTM), which integrates a BLSTM layer into each transformer block, yielding a joint modeling framework for transformer and BLSTM. We show that TRANS-BLSTM models consistently improve accuracy over BERT baselines in GLUE and SQuAD 1.1 experiments. Our TRANS-BLSTM model obtains an F1 score of 94.01% on the SQuAD 1.1 development dataset, which is comparable to the state-of-the-art result.
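
The abstract does not say exactly where the BLSTM layer is attached inside a transformer block. As a rough illustration only, the sketch below assumes one possible arrangement: a self-attention sublayer followed by a BLSTM sublayer, each with a residual connection and layer normalization. All names here (`TransBlstmBlock`, `LSTMLayer`, `self_attention`) are hypothetical, the attention uses a single head with identity projections, biases are omitted, and the weights are random, so it shows the data flow rather than the authors' implementation.

```python
import math
import random


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_norm(x, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (no learned gain)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def self_attention(seq):
    """Single-head scaled dot-product attention with identity projections."""
    dim = len(seq[0])
    out = []
    for query in seq:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in seq]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * value[j] for w, value in zip(weights, seq))
                    for j in range(dim)])
    return out


class LSTMLayer:
    """Unidirectional LSTM over a list of vectors (biases omitted)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One matrix per gate: input, forget, cell candidate, output.
        self.gates = [rand_matrix(hidden_dim, input_dim + hidden_dim, rng)
                      for _ in range(4)]

    def run(self, seq):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x in seq:
            xh = x + h  # concatenate the input with the previous hidden state
            i, f, g, o = (matvec(w, xh) for w in self.gates)
            c = [sigmoid(fv) * cv + sigmoid(iv) * math.tanh(gv)
                 for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [sigmoid(ov) * math.tanh(cv) for ov, cv in zip(o, c)]
            outputs.append(h)
        return outputs


class TransBlstmBlock:
    """Assumed layout: self-attention sublayer, then a BLSTM sublayer."""

    def __init__(self, dim, rng):
        if dim % 2:
            raise ValueError("dim must be even so both directions concatenate to dim")
        self.forward_lstm = LSTMLayer(dim, dim // 2, rng)
        self.backward_lstm = LSTMLayer(dim, dim // 2, rng)

    def __call__(self, seq):
        attended = self_attention(seq)
        hidden = [layer_norm([a + b for a, b in zip(x, y)])
                  for x, y in zip(seq, attended)]
        forward_out = self.forward_lstm.run(hidden)
        # Run the second LSTM right to left, then restore the original order.
        backward_out = self.backward_lstm.run(hidden[::-1])[::-1]
        blstm = [fo + bo for fo, bo in zip(forward_out, backward_out)]
        return [layer_norm([a + b for a, b in zip(x, y)])
                for x, y in zip(hidden, blstm)]
```

Calling `TransBlstmBlock(4, random.Random(0))` on a list of three 4-dimensional vectors returns three 4-dimensional vectors, so blocks of this shape can be stacked the same way ordinary transformer blocks are.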

Code

No code implementations yet.

Tasks

Machine Translation

Natural Language Inference

Paraphrase Identification

Question Answering

Sentence

Sentence Classification

Text Classification

Translation

Datasets

GLUE

SQuAD

QNLI

Quora Question Pairs

Results from the Paper

Ranked #1 on Text Classification on GLUE MRPC

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Text Classification	GLUE MRPC	TRANS-BLSTM	Accuracy	90.45	# 1
Text Classification	GLUE RTE	TRANS-BLSTM	Accuracy	79.78	# 1
Text Classification	GLUE SST2	TRANS-BLSTM	Accuracy	94.38	# 1
Natural Language Inference	QNLI	TRANS-BLSTM	Accuracy	94.08	# 18
Paraphrase Identification	Quora Question Pairs	TRANS-BLSTM	Accuracy	88.28	# 16

Methods

Absolute Position Encodings • Adam • Attention Dropout • BERT • BPE • Dense Connections • Dropout • GELU • Label Smoothing • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Position-Wise Feed-Forward Layer • ReLU • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer • Weight Decay • WordPiece
