Reformer: The Efficient Transformer

Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from $O(L^2)$ to $O(L \log L)$, where $L$ is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of $N$ times, where $N$ is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.
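As a rough illustration of the first technique, below is a minimal NumPy sketch of locality-sensitive-hashing attention: queries and keys share one projection, tokens are hashed into buckets via random rotations, and attention is computed only within each bucket. This is a toy, single-round, single-head version that omits the sorting/chunking, multi-round hashing, and causal masking used in the paper; the names `lsh_hash` and `lsh_attention` are illustrative, not from the paper's code.

```python
import numpy as np

def lsh_hash(x, n_buckets, rng):
    # Angular LSH: project onto random directions and take the argmax over
    # the concatenation [xR ; -xR], which assigns each row to one of n_buckets.
    r = rng.standard_normal((x.shape[-1], n_buckets // 2))
    proj = x @ r
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

def lsh_attention(qk, v, n_buckets=8, seed=0):
    # Toy LSH attention: `qk` is the shared query/key matrix of shape (L, d),
    # `v` is (L, d_v). Attention is restricted to tokens in the same bucket,
    # which is what reduces the quadratic cost of full attention.
    rng = np.random.default_rng(seed)
    L, d = qk.shape
    unit = qk / np.linalg.norm(qk, axis=-1, keepdims=True)
    buckets = lsh_hash(unit, n_buckets, rng)
    out = np.zeros_like(v, dtype=float)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        scores = qk[idx] @ qk[idx].T / np.sqrt(d)
        np.fill_diagonal(scores, -1e9)  # a position does not attend to itself
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[idx] = w @ v[idx]
    return out
```

The second technique, reversible residual layers, can be sketched in a few lines. Assuming `f` and `g` stand for the attention and feed-forward sublayers and are deterministic during the pass, each block's inputs can be recomputed from its outputs in the backward pass, so per-layer activations never need to be stored.

```python
def reversible_forward(x1, x2, f, g):
    # RevNet-style coupling: y1 = x1 + f(x2), y2 = x2 + g(y1).
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def reversible_inverse(y1, y2, f, g):
    # Exact reconstruction of the inputs from the outputs, which is what
    # allows activations to be stored only once instead of N times.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2
```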

ICLR 2020

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|------|---------|-------|-------------|--------------|-------------|
| Offline RL | D4RL | Reformer | Average Reward | 64.4 | #6 |
| D4RL | D4RL | Reformer | Average Reward | 63.9 | #8 |
| Question Answering | Natural Questions (long) | Locality-Sensitive Hashing | F1 | 75.5 | #3 |

Results from Other Papers


| Task | Dataset | Model | Metric Name | Metric Value | Rank |
|------|---------|-------|-------------|--------------|------|
| Image Generation | ImageNet 64x64 | Reformer (12 layers) | Bits per dim | 3.710 | #19 |
| Image Generation | ImageNet 64x64 | Reformer (6 layers) | Bits per dim | 3.740 | #22 |
| Question Answering | Quasar-T | Locality-Sensitive Hashing | EM | 53.2 | #2 |
| Open-Domain Question Answering | SearchQA | Locality-Sensitive Hashing | EM | 66.0 | #2 |
| Language Modelling | WikiText-103 | Reformer 125M | Test perplexity | 26.0 | #62 |

Methods