Routing Transformer

Introduced by Roy et al. in Efficient Content-Based Sparse Attention with Routing Transformers

The Routing Transformer is a Transformer that endows self-attention with a sparse routing module based on online k-means. Each attention module considers a clustering of the query and key space: the current time step attends only to context positions that belong to the same cluster. In other words, the query at the current time step is routed to a limited number of context positions through its cluster assignment.
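
The routing idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`assign_clusters`, `routed_attention`, `update_centroids`), the use of squared Euclidean distance for cluster assignment, the dense boolean mask, and the simplified moving-average centroid update are all assumptions made for illustration. A practical implementation would gather the members of each cluster instead of masking a full attention matrix, since the dense mask gives no speedup.

```python
import numpy as np

def assign_clusters(x, centroids):
    """Return the index of the nearest centroid for every row of x."""
    # Squared Euclidean distance is an illustrative choice of similarity.
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)

def routed_attention(q, k, v, centroids):
    """Causal attention restricted to keys in the same cluster as the query."""
    n, d = q.shape
    q_cluster = assign_clusters(q, centroids)
    k_cluster = assign_clusters(k, centroids)

    # A query at step i may attend to key j only if j <= i (causal)
    # and both fall in the same cluster (routing).
    same_cluster = q_cluster[:, None] == k_cluster[None, :]
    causal = np.tril(np.ones((n, n), dtype=bool))
    mask = same_cluster & causal

    scores = np.where(mask, q @ k.T / np.sqrt(d), -np.inf)
    # Rows with no permitted key would produce NaNs in the softmax,
    # so reset them to a finite value; they end up with all-zero weights.
    scores[~mask.any(axis=1)] = 0.0

    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = np.where(mask, weights, 0.0)
    denom = weights.sum(axis=1, keepdims=True)
    weights = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)
    return weights @ v, weights

def update_centroids(centroids, x, decay=0.999):
    """Simplified online k-means step: move each centroid toward its members."""
    assignment = assign_clusters(x, centroids)
    updated = centroids.copy()
    for c in range(len(centroids)):
        members = x[assignment == c]
        if len(members) > 0:
            updated[c] = decay * centroids[c] + (1.0 - decay) * members.mean(axis=0)
    return updated

# Usage with random data: 8 time steps, 4 dimensions, 2 clusters.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 4)) for _ in range(3))
centroids = rng.normal(size=(2, 4))
out, weights = routed_attention(q, k, v, centroids)
centroids = update_centroids(centroids, np.concatenate([q, k]))
```

In this sketch, each row of `weights` is nonzero only at earlier or current positions whose keys share the query's cluster, which is the sparsity pattern the description refers to.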

Source: Efficient Content-Based Sparse Attention with Routing Transformers
