MUSE: Multi-Scale Attention Model for Sequence to Sequence Learning

ICLR 2020 · Anonymous

Transformers have achieved state-of-the-art results on a variety of natural language processing tasks. Despite this strong performance, Transformers remain weak at modeling long sentences, where the global attention map is too dispersed to capture valuable information…
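The model name reported in the results below, Parallel Multi-scale Attention, suggests branches that read the same sequence at different scales in parallel. The following is a minimal PyTorch sketch under that reading, not the paper's implementation: it assumes a global self-attention branch, a local depthwise-convolution branch, and a point-wise feed-forward branch whose outputs are summed, and the class name, model dimension, and kernel width are illustrative choices.

```python
# Minimal sketch (not the authors' code) of a parallel multi-scale block:
# a global attention branch and a local convolution branch are computed in
# parallel over the same normalized input and fused by summation.
import torch
import torch.nn as nn


class ParallelMultiScaleBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, kernel_size=3):
        super().__init__()
        # Global branch: standard multi-head self-attention.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Local branch: depthwise convolution over the sequence dimension
        # (an assumption about how the "multi-scale" part is realized).
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        # Point-wise branch: position-wise feed-forward network.
        self.ffn = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        h = self.norm(x)
        attn_out, _ = self.attn(h, h, h)                         # global context
        conv_out = self.conv(h.transpose(1, 2)).transpose(1, 2)  # local context
        ffn_out = self.ffn(h)                                    # token-wise features
        return x + attn_out + conv_out + ffn_out                 # parallel fusion


if __name__ == "__main__":
    block = ParallelMultiScaleBlock()
    out = block(torch.randn(2, 10, 512))
    print(out.shape)  # torch.Size([2, 10, 512])
```

Computing the branches in parallel over the same normalized input keeps local and global context in a shared representation space before the residual sum, which is one plausible way to counter the dispersed global attention noted in the abstract.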

TASK                  DATASET                    MODEL                                    METRIC NAME   METRIC VALUE   GLOBAL RANK
Machine Translation   IWSLT2014 German-English   MUSE (Parallel Multi-Scale Attention)    BLEU score    36.3           #1
Machine Translation   WMT2014 English-French     MUSE (Parallel Multi-Scale Attention)    BLEU score    43.5           #2
Machine Translation   WMT2014 English-German     MUSE (Parallel Multi-Scale Attention)    BLEU score    29.9           #7
