Attention Modules

Feedback Memory

Introduced by Fan et al. in Addressing Some Limitations of Transformers with Feedback Memory

Feedback Memory is a type of attention module used in the Feedback Transformer architecture. It allows a transformer to use the most abstract representations from the past directly as inputs for the current timestep. This means that the model does not form its representation in parallel, but sequentially, token by token. More precisely, the context inputs to the attention modules are replaced with memory vectors computed over the past, i.e.:

$$ \mathbf{z}^{l}_{t} = \text{Attn}\left(\mathbf{x}^{l}_{t}, \left[\mathbf{m}_{t-\tau}, \dots, \mathbf{m}_{t-1}\right]\right) $$

where a memory vector $\mathbf{m}_{t}$ is computed by summing the representations of each layer at the $t$-th time step:

$$ \mathbf{m}_{t} = \sum^{L}_{l=0}\text{Softmax}\left(w^{l}\right)\mathbf{x}_{t}^{l} $$

where $w^{l}$ are learnable scalar parameters, and $l = 0$ corresponds to the token embeddings. Weighting the layers with a softmax gives the model flexibility: it can average the layers or select a single one. This modification of the self-attention input changes the Transformer's computation from parallel to sequential, as summarized in the Figure. It gives the model the ability to form the representation $\mathbf{x}^{l}_{t+1}$ based on past representations from any layer $l'$, whereas in a standard Transformer this is possible only for $l' < l$. The change can be viewed as exposing all previous computations to all future computations, yielding richer representations of the input. Such capacity should allow much shallower models to capture the same level of abstraction as a deeper architecture.
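The two equations above can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: the `attend` function below is plain dot-product attention without learned projections, and `layers` stands in for full Transformer blocks (here we pass simple elementwise functions). The names `feedback_step`, `reps`, and `coef` are this sketch's own.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, memory):
    """Dot-product attention of one query vector over the memory
    vectors m_{t-tau}, ..., m_{t-1} (a simplified stand-in for Attn)."""
    mem = np.stack(memory)                       # (tau, d)
    weights = softmax(mem @ query / np.sqrt(len(query)))
    return weights @ mem                         # (d,)

def feedback_step(x0, memory, layers, w):
    """One timestep of the feedback computation (toy sketch).
    `layers` is a list of L per-layer transforms; `w` holds the
    L+1 learnable scalars that weight each layer in the memory sum."""
    reps = [x0]                                  # l = 0: token embedding
    x = x0
    for layer in layers:
        # z^l_t = Attn(x^l_t, [m_{t-tau}, ..., m_{t-1}])
        ctx = attend(x, memory) if memory else np.zeros_like(x)
        x = layer(x + ctx)
        reps.append(x)
    # m_t = sum_l softmax(w)_l * x^l_t
    coef = softmax(w)
    m_t = sum(c * r for c, r in zip(coef, reps))
    memory.append(m_t)
    return x, memory
```

Because each timestep's memory vector mixes every layer's output, the next timestep's lowest layer already sees the previous timestep's most abstract representation, which is the sequential dependence described above:

```python
d, L = 4, 2
layers = [np.tanh] * L        # stand-ins for Transformer blocks
w = np.zeros(L + 1)           # uniform layer weighting after softmax
memory = []
for t in range(3):            # process three tokens sequentially
    x, memory = feedback_step(np.ones(d), memory, layers, w)
```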

Source: Addressing Some Limitations of Transformers with Feedback Memory
