TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Linguistic Acceptability	CoLA	FLOATER-large	Accuracy	69%	# 15
Semantic Textual Similarity	MRPC	FLOATER-large	Accuracy	91.4%	# 5
Sentiment Analysis	SST-2 Binary classification	FLOATER-large	Accuracy	96.7%	# 11
Machine Translation	WMT2014 English-French	FLOATER-large	BLEU score	42.7	# 16
Machine Translation	WMT2014 English-French	FLOATER-large	Hardware Burden	None	# 1
Machine Translation	WMT2014 English-French	FLOATER-large	Operations per network pass	None	# 1
Machine Translation	WMT2014 English-German	FLOATER-large	BLEU score	29.2	# 29
Machine Translation	WMT2014 English-German	FLOATER-large	Hardware Burden	None	# 1
Machine Translation	WMT2014 English-German	FLOATER-large	Operations per network pass	None	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/learning-to-encode-position-for-transformer/semantic-textual-similarity-on-mrpc)](https://paperswithcode.com/sota/semantic-textual-similarity-on-mrpc?p=learning-to-encode-position-for-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/learning-to-encode-position-for-transformer/sentiment-analysis-on-sst-2-binary)](https://paperswithcode.com/sota/sentiment-analysis-on-sst-2-binary?p=learning-to-encode-position-for-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/learning-to-encode-position-for-transformer/linguistic-acceptability-on-cola)](https://paperswithcode.com/sota/linguistic-acceptability-on-cola?p=learning-to-encode-position-for-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/learning-to-encode-position-for-transformer/machine-translation-on-wmt2014-english-french)](https://paperswithcode.com/sota/machine-translation-on-wmt2014-english-french?p=learning-to-encode-position-for-transformer)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/learning-to-encode-position-for-transformer/machine-translation-on-wmt2014-english-german)](https://paperswithcode.com/sota/machine-translation-on-wmt2014-english-german?p=learning-to-encode-position-for-transformer)`

Learning to Encode Position for Transformer with Continuous Dynamical Model

ICML 2020 · Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh

We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models. Unlike RNNs and LSTMs, which carry an inductive bias from processing the input tokens sequentially, non-recurrent models are less sensitive to position. The main reason is that position information among input units is not inherently encoded, i.e., the models are permutation equivariant; this explains why existing models are accompanied by either a sinusoidal encoding or a position embedding layer at the input. However, both solutions have clear limitations: the sinusoidal encoding is not flexible enough, as it is manually designed and does not contain any learnable parameters, whereas the position embedding restricts the maximum length of input sequences. It is thus desirable to design a new position layer that contains learnable parameters to adjust to different datasets and different architectures. At the same time, we would also like the encodings to extrapolate to inputs of variable length. In our proposed solution, we borrow from the recent Neural ODE approach, which may be viewed as a versatile continuous version of a ResNet. This model is capable of modeling many kinds of dynamical systems. We model the evolution of the encoded results along the position index by such a dynamical system, thereby overcoming the above limitations of existing methods. We evaluate our new position layers on a variety of neural machine translation and language understanding tasks; the experimental results show consistent improvements over the baselines.
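
The contrast drawn in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the official code is in xuanqing94/FLOATER). It assumes a hypothetical vector field `dynamics` given by a tiny MLP, a plain forward-Euler integrator in place of a real ODE solver, and illustrative values for `step`, `substeps`, and the layer sizes. `sinusoidal_encoding` is the fixed, parameter-free baseline; `ode_position_encoding` generates each position's encoding by evolving the previous one under learnable dynamics, so no maximum length is built in.

```python
import numpy as np

def sinusoidal_encoding(num_positions, dim):
    """Fixed sinusoidal encoding (dim must be even); no learnable parameters."""
    positions = np.arange(num_positions)[:, None]
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    enc = np.zeros((num_positions, dim))
    enc[:, 0::2] = np.sin(positions * freqs)
    enc[:, 1::2] = np.cos(positions * freqs)
    return enc

def init_params(dim, hidden, seed=0):
    """Random weights standing in for trained parameters (assumption)."""
    rng = np.random.default_rng(seed)
    return (rng.normal(scale=0.1, size=(dim + 1, hidden)), np.zeros(hidden),
            rng.normal(scale=0.1, size=(hidden, dim)), np.zeros(dim))

def dynamics(t, p, params):
    """Hypothetical learnable vector field h(t, p): a one-hidden-layer MLP."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(np.concatenate([p, [t]]) @ w1 + b1)
    return hidden @ w2 + b2

def ode_position_encoding(num_positions, p0, params, step=0.1, substeps=4):
    """Forward-Euler integration of dp/dt = h(t, p).

    The encoding of position i is the state read at time t = i * step.
    """
    encodings = [p0.copy()]
    p, t = p0.copy(), 0.0
    dt = step / substeps
    for _ in range(num_positions - 1):
        for _ in range(substeps):
            p = p + dt * dynamics(t, p, params)
            t += dt
        encodings.append(p.copy())
    return np.stack(encodings)

params = init_params(dim=8, hidden=16)
short = ode_position_encoding(10, np.zeros(8), params)
longer = ode_position_encoding(50, np.zeros(8), params)
```

Because each encoding is produced from the previous one, `longer[:10]` coincides with `short`: extending to longer inputs only means integrating further, with the same parameters. A learned position embedding table, by contrast, has one row per position and cannot index past its size.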

Code

xuanqing94/FLOATER

Tasks

Inductive Bias

Linguistic Acceptability

Machine Translation

Position

Semantic Textual Similarity

Sentiment Analysis

Datasets

GLUE

SST

SQuAD

SST-2

MRPC

CoLA

RACE

WMT 2014

Results from the Paper

Ranked #5 on Semantic Textual Similarity on MRPC

Methods

1x1 Convolution • Absolute Position Encodings • Adam • BPE • Convolution • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • ReLU • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
