# Fast Transformers with Clustered Attention

9 Jul 2020 · Apoorv Vyas, Angelos Katharopoulos, François Fleuret

Transformers have been proven a successful model for a variety of tasks in sequence modeling. However, computing the attention matrix, which is their key component, has quadratic complexity with respect to the sequence length, thus making them prohibitively expensive for large sequences...
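To make the quadratic cost concrete, here is a minimal sketch of standard softmax attention in plain Python. It illustrates the baseline the abstract describes, not the clustered attention the paper proposes. The names `full_attention`, `softmax`, and `random_matrix`, and the sizes `N` and `D`, are hypothetical choices for this illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def full_attention(queries, keys, values):
    """Standard softmax attention; returns (outputs, attention matrix)."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    # One score per (query, key) pair: this N x N matrix is the quadratic cost.
    attn = [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]
    dv = len(values[0])
    # Each output row is the attention-weighted average of the value rows.
    outputs = [
        [sum(a * v[j] for a, v in zip(row, values)) for j in range(dv)]
        for row in attn
    ]
    return outputs, attn


def random_matrix(rows, cols, rng):
    """Hypothetical helper: rows x cols matrix of standard normal samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
N, D = 8, 4  # assumed sequence length and feature dimension
Q, K, V = (random_matrix(N, D, rng) for _ in range(3))
out, A = full_attention(Q, K, V)
print(len(A) * len(A[0]))  # number of attention entries, N * N
```

Doubling `N` quadruples the number of entries in `A`, which is why the full attention matrix becomes expensive for long sequences.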

