Attention Mechanisms

To aggregate global spatial information, an SE block applies global pooling to the feature map. However, this discards pixel-wise spatial information, which is important in dense prediction tasks. Roy et al. therefore proposed the spatial and channel SE block (scSE). As in BAM, a spatial SE branch complements the channel SE branch, producing spatial attention weights that focus the network on important regions.

Given the input feature map $X$, two parallel modules, spatial SE and channel SE, are applied to the feature map to encode spatial and channel information respectively. The channel SE module is an ordinary SE block, while the spatial SE module adopts a $1\times 1$ convolution for spatial squeezing. The outputs of the two modules are then fused. The overall process can be written as
\begin{align}
s_c &= \sigma (W_{2}\, \delta (W_{1}\,\text{GAP}(X))) \\
X_\text{chn} &= s_c \cdot X \\
s_s &= \sigma(\text{Conv}^{1\times 1}(X)) \\
X_\text{spa} &= s_s \cdot X \\
Y &= f(X_\text{spa}, X_\text{chn})
\end{align}

where $f$ denotes the fusion function, which can be maximum, addition, multiplication or concatenation.
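The process above can be sketched in NumPy for a single feature map of shape $(C, H, W)$. This is a minimal illustration, not the authors' implementation: the weight names (`W1`, `b1`, `w_conv`, etc.) are hypothetical, $\delta$ is taken to be ReLU as in the original SE block, and the fusion $f$ is chosen here as element-wise maximum.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def scse_block(X, W1, b1, W2, b2, w_conv, b_conv):
    """Apply an scSE block to a feature map X of shape (C, H, W).

    Channel SE (cSE): squeeze via global average pooling, then excite
    with two FC layers (W1: C -> C/r, W2: C/r -> C) and a sigmoid gate.
    Spatial SE (sSE): a 1x1 convolution (weight vector w_conv of length C)
    collapses channels into a per-pixel gate.
    Fusion f: element-wise maximum (one of the options in the paper).
    """
    C, H, W = X.shape
    # --- channel SE: s_c = sigmoid(W2 . relu(W1 . GAP(X))) ---
    z = X.reshape(C, -1).mean(axis=1)                      # GAP(X): shape (C,)
    s_c = sigmoid(W2 @ np.maximum(W1 @ z + b1, 0.0) + b2)  # gate in (0, 1), shape (C,)
    X_chn = s_c[:, None, None] * X                         # channel-wise recalibration
    # --- spatial SE: s_s = sigmoid(Conv1x1(X)) ---
    s_s = sigmoid(np.tensordot(w_conv, X, axes=([0], [0])) + b_conv)  # shape (H, W)
    X_spa = s_s[None, :, :] * X                            # pixel-wise recalibration
    # --- fusion: Y = max(X_spa, X_chn) ---
    return np.maximum(X_spa, X_chn)
```

Because both gates lie in $(0, 1)$, each branch can only attenuate activations; the max fusion keeps, at every position and channel, whichever branch preserves more of the signal.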

The proposed scSE block combines channel and spatial attention, enhancing features while capturing pixel-wise spatial information. Segmentation tasks benefit greatly as a result: integrating scSE blocks into fully convolutional networks (F-CNNs) yields consistent improvements in semantic segmentation at negligible extra cost.

Source: Recalibrating Fully Convolutional Networks with Spatial and Channel 'Squeeze & Excitation' Blocks
