TensorCoder: Dimension-Wise Attention via Tensor Representation for Natural Language Modeling

The Transformer has been widely used in many Natural Language Processing (NLP) tasks, and the scaled dot-product attention between tokens is a core module of the Transformer. This attention is a token-wise design whose complexity is quadratic in the sequence length, limiting its application potential for long-sequence tasks...
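For reference, the token-wise scaled dot-product attention the abstract refers to can be sketched as below (a minimal NumPy sketch, not the TensorCoder method itself); the `(n, n)` score matrix is what makes the cost quadratic in sequence length `n`:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (n, d) arrays of per-token query/key/value vectors."""
    d = Q.shape[-1]
    # Token-wise scores form an (n, n) matrix: every token attends to
    # every other token, so time and memory grow as O(n^2).
    scores = Q @ K.T / np.sqrt(d)
    # Numerically stable softmax over each row of scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 8, 4  # toy sizes chosen for illustration
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

Dimension-wise designs such as TensorCoder aim to avoid materializing this full `(n, n)` token-to-token matrix.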
