Learning to Encode Position for Transformer with Continuous Dynamical Model

We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models. Unlike RNNs and LSTMs, which carry an inductive bias by consuming input tokens sequentially, non-recurrent models are less sensitive to position...
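To make the idea concrete, here is a minimal PyTorch-style sketch, not the authors' implementation: the paper (FLOATER) treats position encodings p(t) as the state of a learned continuous dynamical system dp/dt = h(t, p(t); θ) and reads the encoding off at each integer position. The class name, hyperparameters, and the fixed-step Euler integrator below are illustrative stand-ins for the Neural ODE solver used in the paper.

```python
import torch
import torch.nn as nn


class ContinuousPositionEncoder(nn.Module):
    """Sketch of a FLOATER-style position encoder.

    Position encodings p(t) evolve along the position axis according to
    a learned ODE dp/dt = h(t, p(t)). Here the ODE is integrated with a
    simple fixed-step Euler solver; the paper uses a Neural ODE solver.
    All names and hyperparameters are hypothetical.
    """

    def __init__(self, d_model: int, hidden: int = 128, steps_per_pos: int = 4):
        super().__init__()
        self.steps_per_pos = steps_per_pos
        # Learned dynamics h(t, p): takes the current encoding plus a
        # scalar time channel and returns dp/dt.
        self.dynamics = nn.Sequential(
            nn.Linear(d_model + 1, hidden),
            nn.Tanh(),
            nn.Linear(hidden, d_model),
        )
        # The initial encoding p(0) is also learned.
        self.p0 = nn.Parameter(torch.zeros(d_model))

    def forward(self, seq_len: int) -> torch.Tensor:
        """Return a (seq_len, d_model) tensor of position encodings."""
        encodings = []
        p = self.p0
        t = torch.zeros(1)
        dt = 1.0 / self.steps_per_pos
        for _ in range(seq_len):
            encodings.append(p)
            # Integrate the dynamics from position i to position i + 1.
            for _ in range(self.steps_per_pos):
                p = p + dt * self.dynamics(torch.cat([p, t]))
                t = t + dt
        return torch.stack(encodings)


# Usage: encodings extrapolate to any sequence length, since the ODE
# can simply be integrated further than the lengths seen in training.
pe = ContinuousPositionEncoder(d_model=512)
enc = pe(100)  # shape (100, 512)
```

Because every parameter lives in the dynamics network rather than in a per-position embedding table, the encoder is not tied to a maximum input length, which is the flexibility the abstract contrasts against fixed sinusoidal encodings and learned position embeddings.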

Published at ICML 2020.

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Linguistic Acceptability | CoLA | FLOATER-large | Accuracy | 69% | #3 |
| Semantic Textual Similarity | MRPC | FLOATER-large | Accuracy | 91.4% | #3 |
| Sentiment Analysis | SST-2 (binary classification) | FLOATER-large | Accuracy | 96.7% | #5 |
| Machine Translation | WMT 2014 EN-DE | FLOATER-large | BLEU score | 29.2 | #2 |
| Machine Translation | WMT 2014 EN-FR | FLOATER-large | BLEU score | 42.7 | #1 |

Methods used in the Paper