Random Synthesized Attention

Introduced by Tay et al. in Synthesizer: Rethinking Self-Attention in Transformer Models

Random Synthesized Attention is a form of synthesized attention in which the attention weights are not conditioned on any input tokens; instead, the attention weight matrix is initialized to random values. It was introduced with the Synthesizer architecture. It contrasts with Dense Synthesized Attention, which conditions on each token independently, both of which differ from the pairwise token interactions of the vanilla Transformer model.

Let $R$ be a randomly initialized matrix. Random Synthesized Attention is defined as:

$$Y = \text{Softmax}\left(R\right)G\left(X\right) $$

where $R \in \mathbb{R}^{l \times l}$ and $G(X)$ is a parameterized function of the input $X$, analogous to the value projection in standard attention. Notably, each head adds $l^2$ parameters to the overall network. The basic idea of the Random Synthesizer is to not rely on pairwise token interactions or any information from individual tokens, but rather to learn a task-specific alignment that works well globally across many samples. This is a direct generalization of the recently proposed fixed self-attention patterns of Raganato et al. (2020).
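
For concreteness, here is a minimal sketch of a single Random Synthesizer head, assuming PyTorch. The class and argument names (RandomSynthesizerHead, d_model, max_len) are illustrative, not from the authors' code; the value projection stands in for $G(X)$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomSynthesizerHead(nn.Module):
    def __init__(self, d_model: int, max_len: int, trainable: bool = True):
        super().__init__()
        # R: an l x l matrix of attention logits, randomly initialized and
        # independent of the input tokens (optionally kept fixed).
        self.R = nn.Parameter(torch.randn(max_len, max_len), requires_grad=trainable)
        # G(X): a learned projection of the input, analogous to the value projection V.
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, l, d_model), with sequence length l <= max_len
        l = x.size(1)
        attn = F.softmax(self.R[:l, :l], dim=-1)  # Softmax(R), shared across the batch
        return attn @ self.value(x)               # Y = Softmax(R) G(X)

# Usage:
# head = RandomSynthesizerHead(d_model=64, max_len=128)
# y = head(torch.randn(2, 32, 64))  # y: (2, 32, 64)
```

Because the softmaxed attention pattern is the same for every example in the batch, the random variant trades input-dependent alignment for a single globally learned (or fixed) alignment.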

Source: Synthesizer: Rethinking Self-Attention in Transformer Models
