Multi-Head Attention: Collaborate Instead of Concatenate

29 Jun 2020 · Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi

Attention layers are widely used in natural language processing (NLP) and are beginning to influence computer vision architectures. However, they suffer from over-parameterization...

Methods used in the Paper