Efficient ViTs

26 papers with code • 3 benchmarks • 0 datasets

Increasing the efficiency of ViTs without modifying the architecture (e.g., key and query sparsification, token pruning and merging).
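
A minimal sketch of the common token-pruning idea shared by many of the papers below, assuming a DeiT-style model with a CLS token. The function name and the use of CLS attention as the importance score are illustrative, not tied to any specific paper:

```python
import torch

def prune_tokens_by_cls_attention(tokens, cls_attn, keep_ratio=0.5):
    """Keep the top-k patch tokens ranked by their attention from the CLS token.

    tokens:   (B, N, D)  -- CLS token at index 0, patch tokens at 1..N-1
    cls_attn: (B, N-1)   -- attention weights from CLS to each patch token
    """
    B, N, D = tokens.shape
    k = max(1, int((N - 1) * keep_ratio))          # number of patch tokens to keep
    idx = cls_attn.topk(k, dim=1).indices + 1      # (B, k), shifted past the CLS token
    batch_idx = torch.arange(B).unsqueeze(1)       # (B, 1) for advanced indexing
    kept = tokens[batch_idx, idx]                  # (B, k, D) selected patch tokens
    return torch.cat([tokens[:, :1], kept], dim=1) # re-attach CLS token

# toy usage
x = torch.randn(2, 197, 192)                       # e.g. DeiT-T: 196 patches + CLS
attn = torch.rand(2, 196).softmax(-1)
print(prune_tokens_by_cls_attention(x, attn, 0.5).shape)  # torch.Size([2, 99, 192])
```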

Most implemented papers

Global Vision Transformer Pruning with Hessian-Aware Saliency

NVlabs/NViT CVPR 2023

This work challenges the common ViT design philosophy of using a uniform dimension across all stacked blocks in a model stage: through the first systematic attempt at global structural pruning, it redistributes parameters both across transformer blocks and between different structures within each block.
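
A hedged sketch of the "rank structures globally across layers" idea. NViT's actual Hessian-aware saliency is more involved; this only uses a simple first-order Taylor proxy, and the function names are made up:

```python
import torch
import torch.nn as nn

def global_structure_scores(model):
    """Score structures (here, rows of each Linear layer) by the Taylor proxy
    (sum(grad * weight))^2 and rank them globally across layers."""
    scores = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and module.weight.grad is not None:
            g, w = module.weight.grad, module.weight
            row_score = (g * w).sum(dim=1).pow(2)         # one score per output neuron
            scores += [(name, i, s.item()) for i, s in enumerate(row_score)]
    return sorted(scores, key=lambda t: t[2])             # least salient first

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
loss = model(torch.randn(32, 8)).pow(2).mean()
loss.backward()
print(global_structure_scores(model)[:5])                 # 5 least important neurons, any layer
```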

Adaptive Token Sampling For Efficient Vision Transformers

adaptivetokensampling/ATS 30 Nov 2021

Since ATS is parameter-free, it can be added to off-the-shelf pre-trained vision transformers as a plug-and-play module, reducing their GFLOPs without any additional training.
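
A hedged sketch of attention-based token sampling in the spirit of ATS: patch tokens are scored by CLS attention weighted by value-vector norms, then indices are drawn by inverse-transform sampling over the score CDF, so duplicates collapse and the kept token count adapts per image. Function and variable names are illustrative, not the official implementation:

```python
import torch

def adaptive_token_sampling(tokens, cls_attn, value_norm, n_samples=64):
    scores = cls_attn * value_norm                       # (B, N-1) significance scores
    probs = scores / scores.sum(dim=1, keepdim=True)
    cdf = probs.cumsum(dim=1)                            # (B, N-1)
    u = torch.rand(probs.size(0), n_samples, device=tokens.device)
    idx = torch.searchsorted(cdf, u).clamp(max=probs.size(1) - 1)
    kept = []
    for b in range(tokens.size(0)):
        uniq = idx[b].unique() + 1                       # duplicates collapse; +1 skips CLS
        kept.append(torch.cat([tokens[b, :1], tokens[b, uniq]], dim=0))
    return kept                                           # list: variable token count per image

x = torch.randn(2, 197, 192)
attn = torch.rand(2, 196).softmax(-1)
vnorm = torch.rand(2, 196)
print([t.shape for t in adaptive_token_sampling(x, attn, vnorm)])
```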

A-ViT: Adaptive Tokens for Efficient Vision Transformer

NVlabs/A-ViT CVPR 2022

A-ViT achieves this by automatically reducing the number of vision transformer tokens processed in the network as inference proceeds.
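
A hedged sketch of ACT-style per-token halting in the spirit of A-ViT: each block emits a halting probability per token, and once a token's cumulative score crosses a threshold it is frozen out of later blocks. The layer choice and halting head are illustrative, and the masking here only demonstrates the mechanism; the real speedup comes from actually discarding halted tokens:

```python
import torch
import torch.nn as nn

class HaltingViT(nn.Module):
    def __init__(self, dim=192, depth=4, eps=0.01):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True) for _ in range(depth)]
        )
        self.eps = eps

    def forward(self, x):                                   # x: (B, N, D)
        B, N, _ = x.shape
        cum_halt = torch.zeros(B, N, device=x.device)
        active = torch.ones(B, N, dtype=torch.bool, device=x.device)
        for blk in self.blocks:
            x = torch.where(active.unsqueeze(-1), blk(x), x)  # halted tokens pass through
            halt_p = torch.sigmoid(x[..., 0])                  # halting score from channel 0
            cum_halt = cum_halt + halt_p * active
            active = active & (cum_halt < 1.0 - self.eps)      # freeze tokens that have halted
        return x, active                                       # `active` marks never-halted tokens

out, still_active = HaltingViT()(torch.randn(2, 197, 192))
print(out.shape, still_active.float().mean().item())
```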

SPViT: Enabling Faster Vision Transformers via Soft Token Pruning

peiyanflying/spvit 27 Dec 2021

Moreover, our framework guarantees that the identified model meets the resource specifications of mobile devices and FPGAs, and even achieves real-time execution of DeiT-T on mobile platforms.

Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations

youweiliang/evit 16 Feb 2022

Second, while maintaining the same computational cost, our method enables ViTs to take more image tokens from higher-resolution images as input, improving recognition accuracy.
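
A hedged sketch of EViT-style token reorganization: the most CLS-attentive patch tokens are kept, and the inattentive ones are fused into a single extra token via an attention-weighted average. Names and the exact weighting are illustrative:

```python
import torch

def reorganize_tokens(tokens, cls_attn, keep_ratio=0.7):
    B, N, D = tokens.shape
    k = int((N - 1) * keep_ratio)
    order = cls_attn.argsort(dim=1, descending=True)           # (B, N-1)
    top_idx, rest_idx = order[:, :k] + 1, order[:, k:] + 1     # shift past CLS
    b = torch.arange(B).unsqueeze(1)
    kept = tokens[b, top_idx]                                  # (B, k, D) attentive tokens
    rest = tokens[b, rest_idx]                                 # (B, N-1-k, D) inattentive tokens
    w = cls_attn[b, rest_idx - 1]                              # fusion weights
    fused = (w.unsqueeze(-1) * rest).sum(1, keepdim=True) / w.sum(1, keepdim=True).unsqueeze(-1)
    return torch.cat([tokens[:, :1], kept, fused], dim=1)      # CLS + kept + 1 fused token

x = torch.randn(2, 197, 192)
attn = torch.rand(2, 196).softmax(-1)
print(reorganize_tokens(x, attn, 0.7).shape)                   # torch.Size([2, 139, 192])
```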

Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention

cydia2018/as-vit 28 Sep 2022

The learnable thresholds are optimized through budget-aware training to balance accuracy and complexity, yielding pruning configurations tailored to different input instances.
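
A hedged sketch of learnable-threshold pruning with a budget-aware loss: a soft keep-mask sigmoid((score - threshold) / T) stays differentiable during training, and a budget term pushes the expected keep ratio toward a target. The exact scoring and loss in the paper differ; names here are illustrative:

```python
import torch
import torch.nn as nn

class ThresholdPruner(nn.Module):
    def __init__(self, init_threshold=0.0, temperature=0.1):
        super().__init__()
        self.threshold = nn.Parameter(torch.tensor(init_threshold))
        self.temperature = temperature

    def forward(self, token_scores):                      # (B, N) importance scores
        return torch.sigmoid((token_scores - self.threshold) / self.temperature)

pruner = ThresholdPruner()
scores = torch.randn(4, 196)                              # e.g. attention-derived scores
keep_mask = pruner(scores)                                # soft mask in (0, 1)

target_keep = 0.5                                         # compute budget: keep ~50% of tokens
budget_loss = (keep_mask.mean() - target_keep).pow(2)     # would be added to the task loss
budget_loss.backward()
print(pruner.threshold.grad)
```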

Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference

gatech-eic/castling-vit CVPR 2023

Vision Transformers (ViTs) have shown impressive performance but still incur a high computation cost compared to convolutional neural networks (CNNs); one reason is that ViT attention measures global similarities and thus has quadratic complexity in the number of input tokens.
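
A short sketch of why softmax attention is quadratic in the token count and how kernelized "linear" attention reorders the matrix products to avoid it. This uses a generic feature map, not the linear-angular kernel proposed by Castling-ViT:

```python
import torch

def softmax_attention(q, k, v):
    # Forms an N x N attention matrix -> O(N^2 * d) in the number of tokens N.
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v

def linear_attention(q, k, v, feature_map=torch.nn.functional.elu):
    # Kernelized attention: phi(q) @ (phi(k)^T @ v) reorders the matmuls so the
    # cost is O(N * d^2) -- linear in N. (Generic kernel, not the angular one.)
    q, k = feature_map(q) + 1, feature_map(k) + 1
    kv = k.transpose(-2, -1) @ v                              # (B, d, d)
    z = q @ k.sum(dim=1, keepdim=True).transpose(-2, -1)      # (B, N, 1) normalizer
    return (q @ kv) / z

q = k = v = torch.randn(1, 196, 64)
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)
```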

Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers

BWLONG/BeyondAttentiveTokens CVPR 2023

In this paper, we emphasize the importance of diverse global semantics and propose an efficient token decoupling and merging method that jointly considers token importance and diversity for token pruning.
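
A hedged sketch of jointly weighing token importance and diversity: tokens are picked greedily, trading importance against similarity to already-selected tokens (an MMR-style criterion), so pruning preserves diverse semantics instead of only the most attentive tokens. This illustrates the idea only, not the paper's decoupling-and-merging method:

```python
import torch

def select_tokens_importance_diversity(tokens, importance, k=32, lam=0.5):
    B, N, D = tokens.shape
    feats = torch.nn.functional.normalize(tokens, dim=-1)
    selected = []
    for b in range(B):
        chosen = [importance[b].argmax().item()]               # start from the top token
        for _ in range(k - 1):
            sim = feats[b] @ feats[b, chosen].T                 # (N, len(chosen))
            score = lam * importance[b] - (1 - lam) * sim.max(dim=1).values
            score[chosen] = float("-inf")                       # never re-pick a token
            chosen.append(score.argmax().item())
        selected.append(tokens[b, sorted(chosen)])
    return torch.stack(selected)                                # (B, k, D)

x = torch.randn(2, 196, 192)
imp = torch.rand(2, 196)
print(select_tokens_importance_diversity(x, imp, k=32).shape)   # torch.Size([2, 32, 192])
```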

Making Vision Transformers Efficient from A Token Sparsification View

changsn/STViT-R CVPR 2023

In this work, we propose a novel Semantic Token ViT (STViT) for efficient global and local vision transformers, which can also be revised to serve as a backbone for downstream tasks.

Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers

megvii-research/tps-cvpr2023 CVPR 2023

Experiments on various transformers demonstrate the effectiveness of our method, and analysis experiments show its higher robustness to errors in the token pruning policy.