Reformer

Introduced by Kitaev et al. in Reformer: The Efficient Transformer

Reformer is a Transformer-based architecture designed to improve efficiency. Standard dot-product attention is replaced with an attention mechanism that uses locality-sensitive hashing, reducing its complexity from $O(L^2)$ to $O(L \log L)$, where $L$ is the length of the sequence. Furthermore, the Reformer uses reversible residual layers instead of standard residuals, which allows activations to be stored only once during training rather than $N$ times, where $N$ is the number of layers.
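
The two ideas above can be made concrete with short sketches. First, the LSH attention: Reformer hashes each (shared) query/key vector with random projections so that similar vectors land in the same bucket, then computes attention only within buckets. A minimal NumPy sketch, assuming illustrative names (`lsh_buckets`, `lsh_attention`) and a single hash round with a plain loop over buckets; the paper's implementation instead sorts by bucket and attends within fixed-size chunks, which is where the $O(L \log L)$ cost comes from:

```python
import numpy as np

def lsh_buckets(x, n_buckets, seed=0):
    """Angular LSH: project each row of x with a random matrix R and
    take the argmax over [xR; -xR] as its bucket id. Vectors with high
    cosine similarity fall into the same bucket with high probability."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((x.shape[-1], n_buckets // 2))
    h = x @ R                                        # (L, n_buckets // 2)
    return np.argmax(np.concatenate([h, -h], axis=-1), axis=-1)

def lsh_attention(qk, v, n_buckets):
    """Attend only among positions that share a bucket (qk is the shared
    query/key matrix, as in Reformer). Illustrative single-round version."""
    buckets = lsh_buckets(qk, n_buckets)
    out = np.zeros_like(v)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        # Softmax attention restricted to the positions in this bucket.
        scores = qk[idx] @ qk[idx].T / np.sqrt(qk.shape[-1])
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[idx] = w @ v[idx]
    return out

x = np.random.default_rng(1).standard_normal((128, 64))
y = lsh_attention(x, x, n_buckets=8)                 # (128, 64)
```

Second, the reversible residual layers. Each block computes $y_1 = x_1 + F(x_2)$ and $y_2 = x_2 + G(y_1)$ (in Reformer, $F$ is the attention sublayer and $G$ the feed-forward sublayer), so the backward pass can recompute a layer's inputs from its outputs instead of storing per-layer activations. A sketch of the forward map and its inverse, with illustrative function names:

```python
def reversible_forward(x1, x2, F, G):
    """Reversible residual block: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def reversible_inverse(y1, y2, F, G):
    """Recover the inputs from the outputs, so activations need not be
    cached for every layer during training."""
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2
```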

Source: Reformer: The Efficient Transformer
