The Bottleneck Transformer (BoTNet) is a backbone architecture that incorporates self-attention for multiple computer vision tasks, including image classification, object detection and instance segmentation. By replacing the spatial convolutions with global self-attention in the final three bottleneck blocks of a ResNet, and making no other changes, the approach improves significantly upon baselines on instance segmentation and object detection while also reducing parameters, with minimal overhead in latency.
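The core change can be sketched in a few lines: inside a ResNet bottleneck (1x1 reduce, 3x3 spatial op, 1x1 expand, residual add), the 3x3 convolution is swapped for all-to-all multi-head self-attention over the feature map. Below is a minimal NumPy sketch of that idea, not the paper's implementation; the relative position encodings, batch normalization, and strided variants that BoTNet uses are omitted, and all weight shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa_2d(x, wq, wk, wv, heads=4):
    """Global multi-head self-attention over a 2D feature map.

    x: (H, W, C) feature map; wq/wk/wv: (C, C) projection matrices.
    The H*W positions attend to each other, replacing the local
    receptive field of a 3x3 convolution. (Position encodings omitted.)
    """
    H, W, C = x.shape
    d = C // heads
    seq = x.reshape(H * W, C)                               # flatten spatial dims
    q = (seq @ wq).reshape(H * W, heads, d).transpose(1, 0, 2)
    k = (seq @ wk).reshape(H * W, heads, d).transpose(1, 0, 2)
    v = (seq @ wv).reshape(H * W, heads, d).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))   # (heads, HW, HW)
    out = (attn @ v).transpose(1, 0, 2).reshape(H * W, C)
    return out.reshape(H, W, C)

def bot_block(x, w_in, wq, wk, wv, w_out):
    """Bottleneck block with self-attention in place of the 3x3 conv:
    1x1 reduce -> global MHSA -> 1x1 expand, plus the residual add."""
    h = np.maximum(x @ w_in, 0)   # 1x1 conv acts per-pixel, i.e. a matmul + ReLU
    h = mhsa_2d(h, wq, wk, wv)    # global self-attention over all positions
    h = h @ w_out                 # 1x1 conv to expand channels back
    return x + h                  # residual connection
```

Because the attention is quadratic in the number of positions, applying it only in the final (lowest-resolution) stage of the ResNet is what keeps the latency overhead small.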
Source: Bottleneck Transformers for Visual Recognition
Task | Papers | Share
---|---|---
Instance Segmentation | 2 | 16.67%
Classification | 1 | 8.33%
Emotion Recognition | 1 | 8.33%
Self-Supervised Learning | 1 | 8.33%
Semantic Segmentation | 1 | 8.33%
Autonomous Driving | 1 | 8.33%
Scene Understanding | 1 | 8.33%
Anatomy | 1 | 8.33%
General Classification | 1 | 8.33%
Component | Type
---|---
Bottleneck Transformer Block | Image Model Blocks
Convolution | Convolutions
Max Pooling | Pooling Operations