Transformers

The BP-Transformer (BPT) is a Transformer variant motivated by the need for a better balance between the capability and the computational complexity of self-attention. The architecture partitions the input sequence into multi-scale spans via binary partitioning (BP) and incorporates an inductive bias of attending to context from fine-grained to coarse-grained as the relative distance increases: the farther away the context is, the coarser its representation. BPT can be regarded as a graph neural network whose nodes are the multi-scale spans. A token node attends to smaller-scale spans for close context and to larger-scale spans for distant context, and the node representations are updated with Graph Self-Attention. A small sketch of the span construction follows below.

Source: BP-Transformer: Modelling Long-Range Context via Binary Partitioning
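A minimal sketch in plain Python of the binary-partitioning idea described above: the sequence is split recursively into spans, and a token attends to token-level spans for nearby context and to exponentially larger spans as the distance grows. The span alignment and the per-scale fan-out k used here are simplifying assumptions for illustration, not the paper's exact connectivity.

    def binary_partition(lo, hi, spans=None):
        # Collect all (lo, hi) spans of the binary partition tree over [lo, hi).
        if spans is None:
            spans = []
        spans.append((lo, hi))
        if hi - lo > 1:
            mid = (lo + hi) // 2
            binary_partition(lo, mid, spans)
            binary_partition(mid, hi, spans)
        return spans

    def attended_spans(pos, n, k=1):
        # Spans a token at `pos` attends to: k token-level spans on each side,
        # then k spans of size 2, 4, ... further out (fine to coarse).
        out = [(pos, pos + 1)]           # the token itself
        size = 1
        left, right = pos, pos + 1       # boundaries of context already covered
        while left > 0 or right < n:
            for _ in range(k):
                if left > 0:             # next span of `size` to the left
                    out.append((max(left - size, 0), left))
                    left = max(left - size, 0)
                if right < n:            # next span of `size` to the right
                    out.append((right, min(right + size, n)))
                    right = min(right + size, n)
            size *= 2                    # coarser spans as distance grows
        return out

    if __name__ == "__main__":
        print(binary_partition(0, 8))            # all nodes of the partition tree
        print(attended_spans(pos=5, n=16, k=1))  # fine-to-coarse neighbourhood of one token

In a full model, each span node would carry a vector representation and the edges returned by attended_spans would define the sparse attention pattern used by Graph Self-Attention; the number of attended nodes grows only logarithmically with sequence length.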

Tasks


Task Papers Share
Language Modelling 1 20.00%
Machine Translation 1 20.00%
Sentiment Analysis 1 20.00%
Text Classification 1 20.00%
Translation 1 20.00%