Compressive Transformer

Introduced by Rae et al. in Compressive Transformers for Long-Range Sequence Modelling

The Compressive Transformer is an extension to the Transformer which maps past hidden activations (memories) to a smaller set of compressed representations (compressed memories). The Compressive Transformer uses the same attention mechanism over its set of memories and compressed memories, learning to query both its short-term granular memory and longer-term coarse memory. It builds on the ideas of Transformer-XL which maintains a memory of past activations at each layer to preserve a longer history of context. The Transformer-XL discards past activations when they become sufficiently old (controlled by the size of the memory). The key principle of the Compressive Transformer is to compress these old memories, instead of discarding them, and store them in an additional compressed memory.
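To make the dual-memory attention concrete, here is a minimal single-head sketch (in PyTorch) of a layer attending over the concatenation of compressed memory, memory, and the current segment. The function name and tensor shapes are illustrative, and multiple heads and relative positional encodings are omitted for brevity; this is not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def attend_with_memories(h, mem, comp_mem, w_q, w_k, w_v):
    """Single-head attention over the current segment plus both memory types.

    h:        (seq, d)    current hidden activations
    mem:      (n_mem, d)  short-term granular memory
    comp_mem: (n_cmem, d) long-term compressed memory
    w_q, w_k, w_v: (d, d) projection matrices
    """
    # Keys and values are drawn from compressed memory, memory, and the
    # current segment; queries come only from the current segment.
    context = torch.cat([comp_mem, mem, h], dim=0)   # (n_cmem + n_mem + seq, d)
    q = h @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.t() / k.shape[-1] ** 0.5
    attn = F.softmax(scores, dim=-1)
    return attn @ v                                   # (seq, d)
```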

At each time step $t$, the oldest compressed memories are discarded (FIFO) and the oldest $n$ states in ordinary memory are compressed and shifted into the freed slots of compressed memory. During training, the compression function is optimized separately from the main language model, with a local auxiliary loss (such as attention reconstruction) rather than by backpropagating the language-modelling loss through the compression step.
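As a rough sketch of this per-layer update, assuming a PyTorch setting and using mean-pooling as the compression function (one of the options considered in the paper): the function name, the assumption that the segment length divides evenly by the compression rate, and the `detach` used to stop gradients through stored memories are illustrative simplifications.

```python
import torch

def update_memories(mem, comp_mem, new_states, compression_rate=3):
    """One per-layer memory update after processing a segment (sketch).

    mem:        (n_mem, d)  FIFO memory of past activations
    comp_mem:   (n_cmem, d) FIFO compressed memory
    new_states: (seq, d)    activations of the segment just processed;
                            seq is assumed divisible by compression_rate.
    """
    n_mem, n_cmem = mem.shape[0], comp_mem.shape[0]

    # Push the new activations into memory; the oldest `seq` states overflow.
    mem = torch.cat([mem, new_states.detach()], dim=0)
    overflow, mem = mem[:-n_mem], mem[-n_mem:]

    # Compress the overflowing states, here by mean-pooling groups of
    # `compression_rate` states into a single compressed representation.
    seq, d = overflow.shape
    compressed = overflow.view(seq // compression_rate, compression_rate, d).mean(dim=1)

    # Push the compressed states into compressed memory, evicting its
    # oldest entries (FIFO).
    comp_mem = torch.cat([comp_mem, compressed], dim=0)[-n_cmem:]
    return mem, comp_mem
```

A learned compression function (such as a strided 1D convolution) would replace the mean-pooling step and be trained with the auxiliary loss described above.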

Source: Compressive Transformers for Long-Range Sequence Modelling

Tasks


Task Papers Share
Data Visualization 1 20.00%
Dimensionality Reduction 1 20.00%
Embeddings Evaluation 1 20.00%
Sentence 1 20.00%
Language Modelling 1 20.00%

Categories

Transformers