The Evolved Transformer

30 Jan 2019 · David R. So, Chen Liang, Quoc V. Le

Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer...

TASK                 DATASET                 MODEL                     METRIC NAME  METRIC VALUE  GLOBAL RANK
Language Modelling   One Billion Word        Evolved Transformer Big   PPL          28.6          #8
Machine Translation  WMT2014 English-Czech   Evolved Transformer Big   BLEU score   28.2          #1
Machine Translation  WMT2014 English-Czech   Evolved Transformer Base  BLEU score   27.6          #2
Machine Translation  WMT2014 English-French  Evolved Transformer Big   BLEU score   41.3          #14
Machine Translation  WMT2014 English-French  Evolved Transformer Base  BLEU score   40.6          #16
Machine Translation  WMT2014 English-German  Evolved Transformer Big   BLEU score   29.3          #12
Machine Translation  WMT2014 English-German  Evolved Transformer Base  BLEU score   28.4          #21
Machine Translation  WMT2014 English-German  Evolved Transformer       BLEU score   29.8          #8
Machine Translation  WMT2014 English-German  Evolved Transformer       SacreBLEU    29.2          #4

Methods used in the Paper