1 code implementation • 5 Feb 2023 • Amir Zandieh, Insu Han, Majid Daliri, Amin Karbasi
The dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., the Transformer) for sequence modeling. However, naïve exact computation of this model incurs quadratic time and memory complexities in the sequence length, hindering the training of long-sequence models.
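To make the quadratic cost concrete, here is a minimal NumPy sketch of exact dot-product attention (function and variable names are illustrative, not taken from the paper): the full n×n score matrix is materialized, which is exactly the bottleneck for long sequences.

```python
import numpy as np

def naive_attention(Q, K, V):
    """Exact dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Materializes the full n x n score matrix, hence O(n^2)
    time and memory in the sequence length n.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n, n): quadratic memory
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                            # (n, d) output

# Illustrative sizes: sequence length n = 512, head dimension d = 64
n, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(Q, K, V)
```

Doubling n quadruples the size of the intermediate score matrix, which is why exact attention becomes impractical for long sequences.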