Very Deep Transformers for Neural Machine Translation

18 Aug 2020 · Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao

We explore the application of very deep Transformer models for Neural Machine Translation (NMT). Using a simple yet effective initialization technique that stabilizes training, we show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers.
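The initialization technique the abstract refers to is ADMIN, named in the results below as "ADMIN init". The core idea is to rescale the shortcut branch of each post-LayerNorm residual sublayer so that no single sublayer's output dominates the residual stream in very deep stacks. Here is a minimal PyTorch sketch of that idea, assuming simplified profiling; the names `AdminResidual` and `profile_omega` are illustrative and this is not the authors' released code:

```python
import torch
import torch.nn as nn


class AdminResidual(nn.Module):
    """ADMIN-style residual sublayer: LayerNorm(x * omega + f(x)).

    A vanilla post-LN Transformer sublayer computes LayerNorm(x + f(x)).
    ADMIN adds a per-dimension scale `omega` on the shortcut branch,
    initialized from a variance profile, to stabilize training of very
    deep (e.g. 60-encoder-layer) stacks.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.omega = nn.Parameter(torch.ones(d_model))  # profiled, then trained
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, f_x: torch.Tensor) -> torch.Tensor:
        # x: residual stream input; f_x: attention or feed-forward output
        return self.norm(x * self.omega + f_x)


@torch.no_grad()
def profile_omega(layers, sublayer_outputs):
    """One-off profiling pass (run once with omega = 1 everywhere):
    set each omega_i to roughly sqrt(accumulated output variance of the
    sublayers before it). A simplified sketch of the paper's recipe."""
    acc = 1.0  # variance carried in from the embeddings (assumption)
    for layer, f_out in zip(layers, sublayer_outputs):
        layer.omega.data.fill_(acc ** 0.5)
        acc += f_out.float().var().item()
```

After the profiling pass, training proceeds as usual and omega is learned jointly with the rest of the model; the profiled starting point is what keeps the otherwise unstable deep post-LN configuration from diverging.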

Results from the Paper


Ranked #1 on Machine Translation on WMT2014 English-French (using extra training data).
TASK                 DATASET                 MODEL                        BLEU score        SacreBLEU         USES EXTRA TRAINING DATA
Machine Translation  WMT2014 English-French  Transformer (ADMIN init)     43.8 (rank #4)    41.8 (rank #3)    no
Machine Translation  WMT2014 English-French  Transformer+BT (ADMIN init)  46.4 (rank #1)    44.4 (rank #1)    yes
Machine Translation  WMT2014 English-German  Transformer (ADMIN init)     30.1 (rank #5)    29.5 (rank #3)    no
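The table reports both tokenized BLEU and SacreBLEU; SacreBLEU standardizes tokenization and reference handling so that scores are comparable across papers. A minimal sketch of computing it with the sacrebleu package (the example sentences are placeholders):

```python
import sacrebleu

# Detokenized system outputs and references (placeholder sentences);
# sacrebleu applies its own standardized tokenization internally.
hypotheses = ["The cat sat on the mat."]
references = [["The cat sat on the mat."]]  # one inner list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"SacreBLEU: {bleu.score:.1f}")
```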

Methods used in the Paper

Transformer · ADMIN initialization · back-translation (BT)