Universal Transformers

ICLR 2019. Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser

Recurrent neural networks (RNNs) sequentially process data by updating their state with each new data point, and have long been the de facto choice for sequence modeling tasks. However, their inherently sequential computation makes them slow to train.
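The Universal Transformer addresses this by replacing depth-wise stacks of distinct layers with a single shared transition function applied recurrently over depth, so computation is parallel across sequence positions but recurrent over refinement steps. The following is a minimal sketch of that recurrence-in-depth idea only; the actual model also uses self-attention, layer normalization, coordinate embeddings, and adaptive computation time, none of which are shown here, and the function names and dimensions are illustrative.

```python
import numpy as np

def transition(state, W):
    # Stand-in for the paper's per-step block (in the real model:
    # self-attention followed by a transition function). Here it is
    # just a shared nonlinear map so the sketch stays self-contained.
    return np.tanh(state @ W)

def universal_transformer_encode(state, W, n_steps=4):
    # Key idea: apply the SAME weights W at every depth step,
    # refining all positions in parallel, with a residual connection.
    for _ in range(n_steps):
        state = state + transition(state, W)
    return state

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))        # 5 sequence positions, model dim 8
W = 0.1 * rng.standard_normal((8, 8))  # weights shared across depth steps
y = universal_transformer_encode(x, W)
print(y.shape)
```

Because the weights are tied across steps, the number of refinement steps can in principle vary at inference time, which is what the paper's adaptive computation time mechanism exploits on a per-position basis.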

Benchmark result
Task: Machine Translation
Dataset: WMT2014 English-German
Model: Universal Transformer (base)
Metric: BLEU score 28.9
Global rank: #17
