TensorCoder: Dimension-Wise Attention via Tensor Representation for Natural Language Modeling

28 Jul 2020 · Shuai Zhang, Peng Zhang, Xindian Ma, Junqiu Wei, Ningning Wang, Qun Liu

Transformer has been widely used in many Natural Language Processing (NLP) tasks, and the scaled dot-product attention between tokens is a core module of Transformer. This attention is a token-wise design whose complexity is quadratic in the sequence length, limiting its application potential for long-sequence tasks...
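The token-wise cost the abstract refers to comes from forming an N x N score matrix between every pair of tokens. A minimal NumPy sketch of standard scaled dot-product attention (the baseline the paper contrasts with, not its dimension-wise variant) makes this concrete; the shapes and example sizes below are illustrative assumptions, not taken from the paper.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Q, K, V: arrays of shape (N, d), where N is the sequence length
        # and d is the head dimensionality. The score matrix Q @ K.T has
        # shape (N, N), so time and memory grow quadratically in N.
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                    # (N, N): the quadratic term
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
        return weights @ V                               # (N, d)

    # Example: a sequence of 512 tokens with 64-dimensional heads (hypothetical sizes)
    N, d = 512, 64
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
    out = scaled_dot_product_attention(Q, K, V)
    print(out.shape)  # (512, 64)

Doubling N quadruples the size of the (N, N) score matrix, which is the bottleneck the paper's dimension-wise attention is designed to avoid.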
