Incorporating BERT into Neural Machine Translation

The recently proposed BERT has shown great power on a variety of natural language understanding tasks, such as text classification, reading comprehension, etc. However, how to effectively apply BERT to neural machine translation (NMT) has not been sufficiently explored. While BERT is more commonly used for fine-tuning than as a source of contextual embeddings in downstream language understanding tasks, our preliminary exploration in NMT shows that using BERT as contextual embedding works better than using it for fine-tuning. This motivates us to think about how to better leverage BERT for NMT along this direction. We propose a new algorithm named BERT-fused model, in which we first use BERT to extract representations for an input sequence, and then the representations are fused with each layer of the encoder and decoder of the NMT model through attention mechanisms. We conduct experiments on supervised (including sentence-level and document-level translation), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets. Our code is available at \url{https://github.com/bert-nmt/bert-nmt}.
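
To make the fusion mechanism described above concrete, here is a minimal sketch of one BERT-fused encoder layer in PyTorch. It is an illustration under assumptions, not the authors' implementation (see the linked repository for that): the class name `BertFusedEncoderLayer`, the simple averaging of the two attention outputs, and all dimensions are illustrative.

```python
# Sketch of the fusion idea: each encoder layer attends both to its own input
# (self-attention) and to precomputed BERT representations (cross-attention),
# then combines the two before the feed-forward sublayer.
# Names and hyperparameters here are assumptions for illustration only.
import torch
import torch.nn as nn

class BertFusedEncoderLayer(nn.Module):
    """One encoder layer that attends to both its own states and frozen BERT states."""
    def __init__(self, d_model=512, d_bert=768, nhead=8, dim_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        # Cross-attention over BERT outputs: queries come from the NMT encoder,
        # keys/values from BERT; kdim/vdim handle the dimension mismatch.
        self.bert_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout,
                                               kdim=d_bert, vdim=d_bert)
        self.ffn = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(),
                                 nn.Linear(dim_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, bert_out):
        # x: (src_len, batch, d_model); bert_out: (bert_len, batch, d_bert),
        # computed once by a frozen BERT and shared across all layers.
        h_self, _ = self.self_attn(x, x, x)
        h_bert, _ = self.bert_attn(x, bert_out, bert_out)
        # Fuse the two attention outputs (here: a plain average), then residual + FFN.
        x = self.norm1(x + self.dropout(0.5 * (h_self + h_bert)))
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x

# Example usage with random tensors (shapes only, no real BERT here):
# x = torch.randn(20, 4, 512)          # (src_len, batch, d_model)
# bert_out = torch.randn(24, 4, 768)   # (bert_len, batch, d_bert)
# y = BertFusedEncoderLayer()(x, bert_out)
```

The decoder side fuses analogously, with each decoder layer attending to the BERT representations in addition to its self-attention and encoder-decoder attention.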

ICLR 2020
| Task | Dataset | Model | Metric | Value | Global Rank |
| --- | --- | --- | --- | --- | --- |
| Unsupervised Machine Translation | WMT2014 English-French | BERT-fused NMT | BLEU | 38.27 | #1 |
| Machine Translation | WMT2014 English-French | BERT-fused NMT | BLEU score | 43.78 | #7 |
| Machine Translation | WMT2014 English-German | BERT-fused NMT | BLEU score | 30.75 | #8 |
| Unsupervised Machine Translation | WMT2016 English-Romanian | BERT-fused NMT | BLEU | 36.02 | #1 |