Gated Positional Self-Attention (GPSA) is a self-attention module for vision transformers, introduced in the ConViT architecture, whose positional attention can be initialized to act as a convolutional layer, giving the ViT a soft inductive bias toward locality. Each head mixes content-based attention with a purely positional attention map through a learnable gating parameter, so the network can decide during training how convolutional to remain.
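The mechanism is compact enough to sketch: each head computes an ordinary content attention map, softmax(QKᵀ/√d), and a positional map derived from relative patch offsets, then blends the two with a per-head sigmoid gate; initializing the positional projection so each head peaks at one fixed offset of a small grid reproduces a convolution at initialization. Below is a minimal PyTorch sketch of this idea, simplified from the paper's formulation. Names such as `pos_proj`, `gating_param`, and `local_init` follow the spirit of the official ConViT code but are not its exact API; dropout, the class token, and other details are omitted, and a square patch grid with a square number of heads is assumed.

```python
import torch
import torch.nn as nn


class GPSA(nn.Module):
    """Minimal sketch of Gated Positional Self-Attention (ConViT-style)."""

    def __init__(self, dim, num_heads=9, locality_strength=1.0):
        super().__init__()
        assert dim % num_heads == 0
        self.dim, self.num_heads = dim, num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.locality_strength = locality_strength

        self.qk = nn.Linear(dim, dim * 2, bias=False)  # content queries/keys
        self.v = nn.Linear(dim, dim, bias=False)       # values
        self.proj = nn.Linear(dim, dim)
        # maps relative-position features (dx, dy, dx^2 + dy^2) to per-head scores
        self.pos_proj = nn.Linear(3, num_heads)
        # per-head gate: sigmoid(gate) = 1 means purely positional (convolutional)
        self.gating_param = nn.Parameter(torch.ones(num_heads))

    def rel_indices(self, num_patches):
        # (1, N, N, 3) tensor of relative offsets between all patch pairs;
        # assumes num_patches is a perfect square (a square patch grid)
        size = int(num_patches ** 0.5)
        ind = torch.arange(size).view(1, -1) - torch.arange(size).view(-1, 1)
        indx = ind.repeat(size, size)
        indy = ind.repeat_interleave(size, dim=0).repeat_interleave(size, dim=1)
        return torch.stack([indx, indy, indx ** 2 + indy ** 2], dim=-1).float()[None]

    def local_init(self):
        # initialize each head's positional attention to peak at one offset of a
        # sqrt(num_heads) x sqrt(num_heads) grid, like a conv kernel's taps
        self.v.weight.data.copy_(torch.eye(self.dim))
        k = int(self.num_heads ** 0.5)
        center = k // 2  # assumes odd k, e.g. num_heads = 9 -> a 3x3 "kernel"
        for h1 in range(k):
            for h2 in range(k):
                h = h1 + k * h2
                self.pos_proj.weight.data[h] = torch.tensor(
                    [2.0 * (h2 - center), 2.0 * (h1 - center), -1.0]
                )
        self.pos_proj.weight.data *= self.locality_strength

    def forward(self, x):
        B, N, C = x.shape
        qk = self.qk(x).reshape(B, N, 2, self.num_heads, C // self.num_heads)
        q, k = qk.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)

        content = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        pos = self.pos_proj(self.rel_indices(N).to(x.device))  # (1, N, N, heads)
        pos = pos.permute(0, 3, 1, 2).softmax(dim=-1)          # (1, heads, N, N)

        # convex combination of content and positional attention, per head
        gate = torch.sigmoid(self.gating_param).view(1, -1, 1, 1)
        attn = (1.0 - gate) * content + gate * pos
        attn = attn / attn.sum(dim=-1, keepdim=True)  # renormalize rows

        v = self.v(x).reshape(B, N, self.num_heads, C // self.num_heads)
        out = (attn @ v.transpose(1, 2)).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


# Usage with illustrative numbers: a 14x14 patch grid, embedding dim 192
gpsa = GPSA(dim=192, num_heads=9)
gpsa.local_init()                 # start out as a 3x3 "convolution"
x = torch.randn(2, 196, 192)
print(gpsa(x).shape)              # torch.Size([2, 196, 192])
```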
Source: ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
| Task | Papers | Share |
|---|---|---|
| Image Classification | 2 | 50.00% |
| Language Modelling | 1 | 25.00% |
| Fine-Grained Image Classification | 1 | 25.00% |
| Component | Type |
|---|---|
| Dropout | Regularization |
| Layer Normalization | Normalization |
| Scaled Dot-Product Attention | Attention Mechanisms |
| Softmax | Output Functions |