Attention Modules

All-Attention Layer

Introduced by Sukhbaatar et al. in Augmenting Self-attention with Persistent Memory

An All-Attention Layer is an attention module for transformers that merges the self-attention and feedforward sublayers into a single unified attention layer. In contrast to the two-step mechanism of the standard Transformer layer, it builds its representation directly from the context and a persistent memory block, without a separate feedforward transformation. The persistent memory block stores, in the form of learned key-value vectors, information that does not depend on the context. In terms of parameters, these persistent key-value vectors take the place of the feedforward sublayer.

Source: Augmenting Self-attention with Persistent Memory
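Below is a minimal PyTorch sketch of the idea: learned, input-independent key/value vectors are concatenated with the context keys/values so that a single softmax attends over both, and no separate feedforward sublayer is used. Class and parameter names (AllAttentionLayer, n_persist) and the hyperparameter values are illustrative assumptions, not from the paper, and the sketch omits the causal masking and position embeddings the paper uses for language modelling.

```python
import math
import torch
import torch.nn as nn


class AllAttentionLayer(nn.Module):
    """Sketch of an all-attention layer: self-attention whose keys and values
    are augmented with persistent (input-independent) learned vectors."""

    def __init__(self, d_model=512, n_heads=8, n_persist=1024):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Standard query/key/value/output projections for the context tokens.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Persistent memory: learned key/value vectors shared by all positions.
        # In terms of parameters, these replace the feedforward sublayer.
        self.persist_k = nn.Parameter(0.02 * torch.randn(n_heads, n_persist, self.d_head))
        self.persist_v = nn.Parameter(0.02 * torch.randn(n_heads, n_persist, self.d_head))
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        h, d = self.n_heads, self.d_head
        q = self.q_proj(x).view(b, t, h, d).transpose(1, 2)      # (b, h, t, d)
        k = self.k_proj(x).view(b, t, h, d).transpose(1, 2)      # (b, h, t, d)
        v = self.v_proj(x).view(b, t, h, d).transpose(1, 2)      # (b, h, t, d)
        # Concatenate persistent keys/values with the context keys/values so a
        # single softmax attends jointly over context and persistent memory.
        pk = self.persist_k.unsqueeze(0).expand(b, -1, -1, -1)   # (b, h, n_persist, d)
        pv = self.persist_v.unsqueeze(0).expand(b, -1, -1, -1)
        k = torch.cat([k, pk], dim=2)
        v = torch.cat([v, pv], dim=2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(d), dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, h * d)
        return self.norm(x + self.out_proj(out))                 # residual + layer norm


# Usage: one layer applied to a batch of token embeddings.
layer = AllAttentionLayer(d_model=512, n_heads=8, n_persist=1024)
y = layer(torch.randn(2, 10, 512))   # -> shape (2, 10, 512)
```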

Tasks


Task Papers Share
Language Modelling 1 50.00%
Translation 1 50.00%

Components


No components found.

Categories

Attention Modules