Multi-Query Attention

Introduced by Shazeer in Fast Transformer Decoding: One Write-Head is All You Need

Multi-head attention consists of multiple attention layers (heads) in parallel with different linear transformations on the queries, keys, values and outputs. Multi-query attention is identical except that the different heads share a single set of keys and values.

Source: Fast Transformer Decoding: One Write-Head is All You Need
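Below is a minimal sketch of the idea, assuming PyTorch; the class and parameter names (MultiQueryAttention, d_model, n_heads) are illustrative, not from the paper. Queries and the output projection remain per-head as in standard multi-head attention, while a single key/value projection of head dimension is shared across all heads and applied to every query head via broadcasting.

```python
import torch
import torch.nn.functional as F
from torch import nn


class MultiQueryAttention(nn.Module):
    """Illustrative multi-query attention (no masking or KV caching)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Per-head query projections, as in standard multi-head attention.
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        # A single key and a single value projection shared by all heads
        # (the defining change of multi-query attention).
        self.w_k = nn.Linear(d_model, self.d_head, bias=False)
        self.w_v = nn.Linear(d_model, self.d_head, bias=False)
        self.w_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        # Queries: one set per head -> (b, n_heads, t, d_head)
        q = self.w_q(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # Keys/values: a single shared head -> (b, 1, t, d_head)
        k = self.w_k(x).unsqueeze(1)
        v = self.w_v(x).unsqueeze(1)
        # Broadcasting over the head dimension applies the shared K/V
        # to every query head.
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # (b, n_heads, t, t)
        attn = F.softmax(scores, dim=-1)
        out = attn @ v                                          # (b, n_heads, t, d_head)
        out = out.transpose(1, 2).reshape(b, t, -1)             # (b, t, d_model)
        return self.w_o(out)


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)                  # (batch, seq_len, d_model)
    mqa = MultiQueryAttention(d_model=64, n_heads=8)
    print(mqa(x).shape)                          # torch.Size([2, 16, 64])
```

Because keys and values are shared, the key/value tensors (and any decode-time key/value cache) are smaller by a factor of the number of heads, which is the memory-bandwidth saving that motivates the technique for fast incremental decoding.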
