Attention Modules

Mixed Attention Block

Introduced by Jiang et al. in ConvBERT: Improving BERT with Span-based Dynamic Convolution

The Mixed Attention Block is an attention module used in the ConvBERT architecture. It mixes self-attention with span-based dynamic convolution: the two branches share the same Query but use different Keys to generate the attention map and the convolution kernel, respectively. The number of attention heads is reduced by directly projecting the input into a smaller embedding space, forming a bottleneck structure for both the self-attention and the span-based dynamic convolution, where $d$ is the embedding size of the input and $\gamma$ is the reduction ratio.

Source: ConvBERT: Improving BERT with Span-based Dynamic Convolution
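Below is a minimal PyTorch-style sketch of this structure, not the official ConvBERT implementation. The class name `MixedAttentionBlock`, the parameter names, and the way the span kernel is generated from the element-wise product of the shared query and a separate key are illustrative assumptions based on the description above; the bottleneck is modeled by projecting to $d/\gamma$ dimensions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixedAttentionBlock(nn.Module):
    """Sketch of a mixed attention block: self-attention plus span-based
    dynamic convolution sharing the same query, with the embedding size
    reduced by a factor gamma (bottleneck). Names are illustrative."""

    def __init__(self, d_model, num_heads, kernel_size=9, gamma=2):
        super().__init__()
        d_inner = d_model // gamma            # bottleneck width d / gamma
        self.num_heads = num_heads
        self.head_dim = d_inner // num_heads
        self.kernel_size = kernel_size

        # Shared query; separate keys for attention and for the conv kernel.
        self.query = nn.Linear(d_model, d_inner)
        self.key_attn = nn.Linear(d_model, d_inner)
        self.value = nn.Linear(d_model, d_inner)
        self.key_conv = nn.Linear(d_model, d_inner)
        self.value_conv = nn.Linear(d_model, d_inner)

        # Generates a position-wise convolution kernel from Q * K_conv.
        self.kernel_proj = nn.Linear(self.head_dim, kernel_size)
        self.out = nn.Linear(2 * d_inner, d_model)

    def forward(self, x):
        B, T, _ = x.shape
        H, Dh, K = self.num_heads, self.head_dim, self.kernel_size

        def split(t):  # (B, T, d_inner) -> (B, H, T, Dh)
            return t.view(B, T, H, Dh).transpose(1, 2)

        q = split(self.query(x))

        # --- self-attention branch ---
        k = split(self.key_attn(x))
        v = split(self.value(x))
        attn = torch.softmax(q @ k.transpose(-2, -1) / Dh ** 0.5, dim=-1)
        attn_out = attn @ v                                   # (B, H, T, Dh)

        # --- span-based dynamic convolution branch ---
        kc = split(self.key_conv(x))
        vc = split(self.value_conv(x))
        # Kernel depends on the shared query and the convolution key.
        kernel = torch.softmax(self.kernel_proj(q * kc), dim=-1)   # (B, H, T, K)

        # Unfold values into local spans of length K around each position.
        vc_pad = F.pad(vc, (0, 0, K // 2, K // 2))            # pad time dim
        spans = vc_pad.unfold(2, K, 1)                        # (B, H, T, Dh, K)
        conv_out = torch.einsum('bhtk,bhtdk->bhtd', kernel, spans)

        # Concatenate the two branches and project back to d_model.
        mixed = torch.cat([attn_out, conv_out], dim=-1)       # (B, H, T, 2*Dh)
        mixed = mixed.transpose(1, 2).reshape(B, T, -1)       # (B, T, 2*d_inner)
        return self.out(mixed)


# Example: a batch of 2 sequences of length 128 with embedding size 768.
x = torch.randn(2, 128, 768)
block = MixedAttentionBlock(d_model=768, num_heads=6, gamma=2)
print(block(x).shape)  # torch.Size([2, 128, 768])
```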
