Routing Attention

Introduced by Roy et al. in Efficient Content-Based Sparse Attention with Routing Transformers

Routing Attention is an attention pattern proposed as part of the Routing Transformer architecture. Each attention module considers a clustering of the space: the current timestep only attends to context belonging to the same cluster. In other words, the current timestep's query is routed to a limited number of context positions through its cluster assignment. This can be contrasted with strided attention patterns and those proposed with the Sparse Transformer, which are fixed in advance rather than content-based.

In the paper's illustrative figure, the rows represent the outputs while the columns represent the inputs. The different colors represent cluster memberships for the output tokens.

Source: Efficient Content-Based Sparse Attention with Routing Transformers
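The routing idea above can be sketched in a few lines of NumPy: cluster queries and keys against a shared set of centroids, then let each query attend only to the keys assigned to its own cluster. This is a minimal illustrative sketch, not the paper's implementation; in the actual Routing Transformer the centroids are learned online (via a form of mini-batch k-means) during training, and the attention is batched and causally masked. The function name and the fixed `centroids` argument are assumptions for illustration.

```python
import numpy as np

def routing_attention(Q, K, V, centroids):
    """Content-based sparse attention sketch: each query attends only to
    keys assigned to the same cluster (nearest centroid).

    Q, K, V: (n, d) arrays of queries, keys, values.
    centroids: (k, d) cluster centers, assumed learned elsewhere.
    """
    # Assign each query and key to its nearest centroid.
    q_clusters = np.argmin(((Q[:, None, :] - centroids) ** 2).sum(-1), axis=1)
    k_clusters = np.argmin(((K[:, None, :] - centroids) ** 2).sum(-1), axis=1)

    out = np.zeros_like(V)
    scale = np.sqrt(Q.shape[1])
    for i in range(Q.shape[0]):
        # Restrict the context to keys in the same cluster as query i.
        mask = k_clusters == q_clusters[i]
        if not mask.any():
            continue
        scores = Q[i] @ K[mask].T / scale
        weights = np.exp(scores - scores.max())  # stable softmax
        weights /= weights.sum()
        out[i] = weights @ V[mask]
    return out
```

With a single centroid every position falls in the same cluster, so the sketch reduces to ordinary dense softmax attention; with more centroids, each query's effective context shrinks to its cluster, which is where the efficiency gain comes from.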
