# Linformer: Self-Attention with Linear Complexity

8 Jun 2020 · Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma

Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, training and deploying these models can be prohibitively costly for long sequences, as the standard self-attention mechanism of the Transformer uses $O(n^2)$ time and space with respect to sequence length...
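To make the $O(n^2)$ cost concrete, here is a minimal pure-Python sketch of standard scaled dot-product self-attention. It is an illustration only, not code from the paper: the function names `self_attention` and `random_matrix`, the head dimension `d = 4`, and the random inputs are all assumptions chosen for the example. The point it shows is that the attention weight matrix has one entry per (query, key) pair, so its size grows with the square of the sequence length $n$.

```python
import math
import random


def self_attention(Q, K, V):
    """Standard scaled dot-product attention on lists of row vectors.

    Q, K, V are n x d lists of lists (hypothetical toy inputs).
    Returns the n x d_v output and the n x n attention weight matrix.
    """
    n, d = len(Q), len(Q[0])
    scale = 1.0 / math.sqrt(d)

    # Score matrix: one entry per (query, key) pair, hence n * n entries.
    scores = [
        [scale * sum(q * k for q, k in zip(Q[i], K[j])) for j in range(n)]
        for i in range(n)
    ]

    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    weights = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])

    # Each output row is a weighted average of the rows of V.
    out = [
        [sum(weights[i][j] * V[j][c] for j in range(n)) for c in range(len(V[0]))]
        for i in range(n)
    ]
    return out, weights


def random_matrix(n, d, rng):
    """Hypothetical helper: an n x d matrix of standard normal samples."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4  # assumed head dimension for the demo
    for n in (8, 16, 32):
        Q, K, V = (random_matrix(n, d, rng) for _ in range(3))
        _, weights = self_attention(Q, K, V)
        entries = sum(len(row) for row in weights)
        print(f"n={n:3d}  attention entries={entries}")
```

Each time `n` doubles, the number of stored attention weights quadruples, and computing them takes a proportional amount of work. This is the quadratic growth in time and space that the abstract describes as prohibitive for long sequences.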

