Vision Transformers

LeViT is a hybrid neural network for fast-inference image classification. It is a stack of transformer blocks, with pooling steps that reduce the resolution of the activation maps, as in classical convolutional architectures. This replaces the uniform structure of a Transformer with a pyramid with pooling, similar to the LeNet architecture.

Source: LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
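
To make the pyramid idea above concrete, the following is a minimal sketch in plain Python that only tracks how many tokens each stage would process. It is not the LeViT implementation: the function names, the starting 14x14 grid, the three stages, and the pooling stride of 2 are all illustrative assumptions, and no attention is actually computed.

```python
import math


def pyramid_token_counts(side, num_stages, stride=2):
    """Token count (side * side) per stage for a pyramid layout.

    Hypothetical helper: between consecutive stages the activation map
    is pooled, shrinking each spatial side by `stride` (rounded up).
    """
    counts = []
    for _ in range(num_stages):
        counts.append(side * side)
        side = math.ceil(side / stride)  # pooling step between stages
    return counts


def uniform_token_counts(side, num_stages):
    """Token count per stage for a uniform Transformer (no pooling)."""
    return [side * side] * num_stages


# Assumed example values: a 14x14 activation map and three stages.
pyramid = pyramid_token_counts(14, 3)
uniform = uniform_token_counts(14, 3)

print(pyramid)  # [196, 49, 16]
print(uniform)  # [196, 196, 196]
```

Under these assumptions the later stages attend over far fewer tokens than a uniform stack would, which is the intuition behind using pooling to speed up inference.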


Tasks


Task                     Papers   Share
Anomaly Detection        1        33.33%
General Classification   1        33.33%
Image Classification     1        33.33%
