The attention gate focuses on target regions while suppressing feature activations in irrelevant regions. Given an input feature map $X \in \mathbb{R}^{C\times H\times W}$ and a gating signal $G\in \mathbb{R}^{C'\times H\times W}$, which is collected at a coarser scale and carries contextual information, the attention gate uses additive attention to compute the gating coefficients. Both $X$ and $G$ are first linearly mapped to an $\mathbb{R}^{F\times H\times W}$ space, and the result is then squeezed along the channel dimension to produce a spatial attention weight map $S \in \mathbb{R}^{1\times H\times W}$. The overall process can be written as \begin{align} S &= \sigma\left(\varphi\left(\delta\left(\phi_x(X)+\phi_g(G)\right)\right)\right) \\ Y &= S \odot X \end{align} where $\varphi$, $\phi_x$ and $\phi_g$ are linear transformations implemented as $1\times 1$ convolutions, $\delta$ is the ReLU activation, $\sigma$ is the sigmoid function, and $\odot$ denotes element-wise multiplication with $S$ broadcast across the channels of $X$.
The attention gate guides the model's attention to important regions while suppressing feature activation in unrelated areas. Thanks to its lightweight design, it substantially enhances the model's representational power without a significant increase in computational cost or parameter count. It is generic and modular, making it easy to integrate into a variety of CNN architectures.
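The equations above can be sketched in plain NumPy. This is a minimal illustration, not the reference implementation: the function and weight names are invented for clarity, biases are omitted for brevity, and the $1\times 1$ convolutions are written as per-pixel linear maps over the channel axis.

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution is a per-pixel linear map over channels.
    x: (C, H, W), w: (F, C) -> output: (F, H, W)."""
    return np.tensordot(w, x, axes=([1], [0]))

def attention_gate(x, g, w_x, w_g, w_psi):
    """Additive attention gate (biases omitted for brevity):
    S = sigma(psi(relu(phi_x(X) + phi_g(G)))),  Y = S * X."""
    relu = lambda z: np.maximum(z, 0.0)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    # Map X and G to a common F-channel space, add, then squeeze to 1 channel.
    s = sigmoid(conv1x1(relu(conv1x1(x, w_x) + conv1x1(g, w_g)), w_psi))
    # S has shape (1, H, W) and broadcasts over the channel axis of X.
    return s * x, s

# Illustrative shapes: C input channels, C' gate channels, F intermediate.
rng = np.random.default_rng(0)
C, Cg, F, H, W = 8, 4, 6, 5, 5
x = rng.standard_normal((C, H, W))
g = rng.standard_normal((Cg, H, W))
w_x = rng.standard_normal((F, C))
w_g = rng.standard_normal((F, Cg))
w_psi = rng.standard_normal((1, F))
y, s = attention_gate(x, g, w_x, w_g, w_psi)
print(y.shape, s.shape)  # (8, 5, 5) (1, 5, 5)
```

Note that the gated output $Y$ keeps the shape of $X$, so the module can be dropped into a skip connection (as in Attention U-Net) without changing any downstream layer.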
Source: Attention U-Net: Learning Where to Look for the Pancreas
| Task | Papers | Share |
|---|---|---|
| Semantic Segmentation | 9 | 12.33% |
| Image Segmentation | 7 | 9.59% |
| Medical Image Segmentation | 6 | 8.22% |
| Tumor Segmentation | 4 | 5.48% |
| Brain Tumor Segmentation | 3 | 4.11% |
| Question Answering | 2 | 2.74% |
| Visual Question Answering | 2 | 2.74% |
| Visual Question Answering (VQA) | 2 | 2.74% |
| Video Understanding | 2 | 2.74% |