Synthesizer: Rethinking Self-Attention in Transformer Models

2 May 2020 · Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot-product self-attention is known to be central and indispensable to state-of-the-art Transformer models. But is it really required?
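To make the question concrete, the sketch below contrasts standard dot-product attention with the two Synthesizer variants the paper proposes: a Random variant, whose attention matrix is a learned parameter independent of the input, and a Dense variant, where each token predicts its own attention row from its representation alone. This is a minimal NumPy illustration; the weight names and shapes are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(X, Wq, Wk, Wv):
    # Standard Transformer attention: weights come from token-token dot products.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return A @ V

def random_synthesizer(X, R, Wv):
    # Synthesizer (Random): the attention logits R are a learned parameter,
    # fixed across inputs -- no query-key interaction at all.
    A = softmax(R)
    return A @ (X @ Wv)

def dense_synthesizer(X, W1, W2, Wv):
    # Synthesizer (Dense): each token maps its own representation to a full
    # attention row via a small feed-forward net, with no pairwise products.
    A = softmax(np.maximum(X @ W1, 0) @ W2)  # shape (seq_len, seq_len)
    return A @ (X @ Wv)

# Toy dimensions (hypothetical): sequence length 4, model dim 8.
rng = np.random.default_rng(0)
L, d = 4, 8
X = rng.standard_normal((L, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
W1 = rng.standard_normal((d, d))
W2 = rng.standard_normal((d, L))  # hidden dim -> attention-row length
R = rng.standard_normal((L, L))   # learned input-independent logits

print(dot_product_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
print(random_synthesizer(X, R, Wv).shape)          # (4, 8)
print(dense_synthesizer(X, W1, W2, Wv).shape)      # (4, 8)
```

All three produce outputs of the same shape, which is what lets the paper swap synthetic attention in for dot-product attention (and mix them, as in the Random + Vanilla models benchmarked below).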

TASK                         DATASET                  MODEL                           METRIC    VALUE   GLOBAL RANK
Document Summarization       CNN / Daily Mail         Synthesizer (R+V)               ROUGE-1   38.57   #15
                                                                                      ROUGE-2   16.24   #13
                                                                                      ROUGE-L   35.95   #15
Linguistic Acceptability     CoLA Dev                 Synthesizer (R+V)               Accuracy  53.3    #2
Semantic Textual Similarity  MRPC Dev                 Synthesizer (R+V)               Accuracy  91.2    #1
Dialogue Generation          Persona-Chat             Synthesizer (R+V)               BLEU-1    14.7    #1
                                                                                      ROUGE-L   14.79   #1
                                                                                      METEOR    6.39    #1
                                                                                      CIDEr     19.09   #1
Machine Translation          WMT2014 English-French   Synthesizer (Random + Vanilla)  BLEU      41.85   #9
Machine Translation          WMT2014 English-German   Synthesizer (Random + Vanilla)  BLEU      28.47   #20
