GMAT: Global Memory Augmentation for Transformers

5 Jun 2020Ankit GuptaJonathan Berant

Transformer-based models have become ubiquitous in natural language processing thanks to their large capacity, innate parallelism and high performance. The contextualizing component of a Transformer block is the $\textit{pairwise dot-product}$ attention that has a large $\Omega(L^2)$ memory requirement for length $L$ sequences, limiting its ability to process long documents... (read more)

PDF Abstract

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper