TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Machine Translation	IWSLT2014 German-English	Rfa-Gate-arccos	BLEU score	34.4	# 27
Language Modelling	WikiText-103	Rfa-Gate-Gaussian-Stateful (Big)	Validation perplexity	22	# 23
Language Modelling	WikiText-103	Rfa-Gate-Gaussian-Stateful (Big)	Test perplexity	23.5	# 52
Language Modelling	WikiText-103	Rfa-Gate-Gaussian-Stateful (Small)	Validation perplexity	29.4	# 28
Language Modelling	WikiText-103	Rfa-Gate-Gaussian-Stateful (Small)	Test perplexity	30.5	# 70
Machine Translation	WMT2014 English-French	Rfa-Gate-arccos	BLEU score	39.2	# 37
Machine Translation	WMT2014 English-French	Rfa-Gate-arccos	Hardware Burden	None	# 1
Machine Translation	WMT2014 English-French	Rfa-Gate-arccos	Operations per network pass	None	# 1
Machine Translation	WMT2014 English-German	Rfa-Gate-arccos	BLEU score	28.2	# 48
Machine Translation	WMT2014 English-German	Rfa-Gate-arccos	Hardware Burden	None	# 1
Machine Translation	WMT2014 English-German	Rfa-Gate-arccos	Operations per network pass	None	# 1

Random Feature Attention

ICLR 2021 · Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong

Transformers are state-of-the-art models for a variety of sequence modeling tasks. At their core is an attention function which models pairwise interactions between the inputs at every timestep. While attention is powerful, it does not scale efficiently to long sequences due to its quadratic time and space complexity in the sequence length. We propose RFA, a linear time and space attention that uses random feature methods to approximate the softmax function, and explore its application in transformers. RFA can be used as a drop-in replacement for conventional softmax attention and offers a straightforward way of learning with recency bias through an optional gating mechanism. Experiments on language modeling and machine translation demonstrate that RFA achieves similar or better performance compared to strong transformer baselines. In the machine translation experiment, RFA decodes twice as fast as a vanilla transformer. Compared to existing efficient transformer variants, RFA is competitive in terms of both accuracy and efficiency on three long text classification datasets. Our analysis shows that RFA's efficiency gains are especially notable on long sequences, suggesting that RFA will be particularly useful in tasks that require working with large inputs, fast decoding speed, or low memory footprints.
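To make the core idea in the abstract concrete, here is a minimal pure-Python sketch of attention computed through random features. It is an illustration under stated assumptions, not the authors' implementation: the feature map is the standard sin/cos random Fourier feature map for the Gaussian kernel with unit-variance projections, queries and keys are scaled to unit length so that exp(q·k) is proportional to that kernel, and all function names (`random_feature_map`, `rfa_attention`, and so on) are hypothetical. The optional gating mechanism is not shown.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(x):
    """Scale x to unit length (assumed here so exp(q.k) matches a Gaussian kernel)."""
    norm = math.sqrt(dot(x, x))
    return [xi / norm for xi in x]


def random_feature_map(x, ws):
    """phi(x) = sqrt(1/D) * [sin(w_1.x), ..., sin(w_D.x), cos(w_1.x), ..., cos(w_D.x)]."""
    proj = [dot(w, x) for w in ws]
    scale = math.sqrt(1.0 / len(ws))
    return [scale * math.sin(p) for p in proj] + [scale * math.cos(p) for p in proj]


def softmax_attention(q, keys, values):
    """Reference softmax attention for one query: touches every key explicitly."""
    scores = [math.exp(dot(q, k)) for k in keys]
    total = sum(scores)
    dim = len(values[0])
    return [sum(s * v[j] for s, v in zip(scores, values)) / total for j in range(dim)]


def rfa_attention(q, keys, values, ws):
    """Random-feature approximation: keys and values are folded into fixed-size sums."""
    feat_dim = 2 * len(ws)
    dim = len(values[0])
    S = [[0.0] * dim for _ in range(feat_dim)]  # running sum of phi(k_i) v_i^T
    z = [0.0] * feat_dim                        # running sum of phi(k_i)
    for k, v in zip(keys, values):
        phi_k = random_feature_map(k, ws)
        for r in range(feat_dim):
            z[r] += phi_k[r]
            for j in range(dim):
                S[r][j] += phi_k[r] * v[j]
    phi_q = random_feature_map(q, ws)
    denom = dot(phi_q, z)
    return [sum(phi_q[r] * S[r][j] for r in range(feat_dim)) / denom for j in range(dim)]


# Illustrative sizes only: d = model dimension, D = number of random projections.
rng = random.Random(0)
d, D, n = 4, 2000, 6
ws = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]
q = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
keys = [normalize([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(n)]
values = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]

exact = softmax_attention(q, keys, values)
approx = rfa_attention(q, keys, values, ws)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
print("max abs difference:", max_err)
```

The sums `S` and `z` have a size that depends only on the feature and value dimensions, not on the number of keys. Once they are built, each additional query costs the same regardless of sequence length, which is where the linear time and space behavior comes from in this sketch. The approximation error shrinks as `D` grows.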

Code

No code implementations yet.

Tasks

Language Modelling

Machine Translation

Text Classification

Translation

Datasets

IMDb Movie Reviews

WikiText-2

WikiText-103

WMT 2014

Results from the Paper

Ranked #27 on Machine Translation on IWSLT2014 German-English

Methods

Softmax
