Inference Extrapolation

Attention with Linear Biases

Introduced by Press et al. in Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation

ALiBi, or Attention with Linear Biases, is a position method that allows Transformer language models to consume, at inference time, sequences longer than the ones they were trained on.

ALiBi does this without using actual position embeddings. Instead, when computing the attention score between a query and a key, ALiBi adds a penalty that depends on how far apart their positions are: when the key is close to the query the penalty is very small, and when it is far away the penalty is large.
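Concretely, the penalty is a head-specific slope multiplied by the query–key distance, added directly to the pre-softmax attention scores; no position information is added to the token embeddings themselves. The snippet below is a minimal PyTorch sketch of this idea (the `alibi_slopes` and `alibi_bias` helpers and the tensor shapes are illustrative assumptions, not the authors' released code); it uses the geometric slope sequence described in the paper for a power-of-two number of heads.

```python
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Head-specific slopes: a geometric sequence starting at 2^(-8/num_heads)
    # with the same ratio, as described in the paper (assumes num_heads is a
    # power of two; the paper also gives a scheme for other head counts).
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # distances[i, j] = j - i, so past keys (j < i) get a negative value
    # proportional to how far behind the query position they are.
    positions = torch.arange(seq_len)
    distances = positions[None, :] - positions[:, None]       # (seq_len, seq_len)
    slopes = alibi_slopes(num_heads)                           # (num_heads,)
    # One bias matrix per head; it is simply added to the query-key scores.
    return slopes[:, None, None] * distances[None, :, :]      # (heads, seq_len, seq_len)

# Hypothetical usage inside causal self-attention
# (q, k: (batch, heads, seq_len, head_dim); causal_mask holds -inf above the diagonal):
# scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
# scores = scores + alibi_bias(num_heads, seq_len) + causal_mask
# attn = scores.softmax(dim=-1)
```

Because the bias depends only on the relative distance between positions, the same formula can be evaluated for any sequence length at inference time, which is what allows the model to be applied to sequences longer than those it was trained on.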

This method was motivated by the simple reasoning that words that are nearby matter much more than words that are far away.

This method is as fast as the sinusoidal or absolute position embedding methods (the fastest position methods available). It outperforms those methods, as well as rotary embeddings, when evaluating on sequences longer than the ones the model was trained on (this setting is known as extrapolation).

Source: Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
