Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Efficient ViTs	ImageNet-1K (with DeiT-S)	PPT	Top-1 Accuracy (%)	79.8	#4
Efficient ViTs	ImageNet-1K (with DeiT-S)	PPT	GFLOPs	2.9	#19
Efficient ViTs	ImageNet-1K (with DeiT-T)	PPT	Top-1 Accuracy (%)	72.1	#8
Efficient ViTs	ImageNet-1K (with DeiT-T)	PPT	GFLOPs	0.8	#8
Efficient ViTs	ImageNet-1K (with LV-ViT-S)	PPT	Top-1 Accuracy (%)	83.1	#6
Efficient ViTs	ImageNet-1K (with LV-ViT-S)	PPT	GFLOPs	4.6	#8
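As a rough check of the FLOPs reduction claimed for DeiT-S in the abstract, the relative saving can be computed from the DeiT-S GFLOPs row. This sketch assumes the unpruned DeiT-S baseline costs 4.6 GFLOPs, a commonly reported figure that does not appear on this page; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline: float, reduced: float) -> float:
    """Fraction of compute removed, e.g. 0.37 for a 37% cut."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - reduced) / baseline


# Assumption: DeiT-S baseline cost, not listed on this page.
DEIT_S_GFLOPS = 4.6
# PPT on DeiT-S, taken from the results table.
PPT_DEIT_S_GFLOPS = 2.9

reduction = relative_reduction(DEIT_S_GFLOPS, PPT_DEIT_S_GFLOPS)
print(f"{reduction:.1%}")
```

With these rounded inputs the result is about 37.0%, close to the "over 37%" stated in the abstract; a small gap is expected because both GFLOPs values are rounded to one decimal place.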

PPT: Token Pruning and Pooling for Efficient Vision Transformers

3 Oct 2023 · Xinjian Wu, Fanhu Zeng, Xiudong Wang, Xinghao Chen

Vision Transformers (ViTs) have emerged as powerful models in computer vision, delivering superior performance across a range of vision tasks. However, their high computational complexity is a significant barrier to practical, real-world deployment. Because not all tokens contribute equally to the final prediction, and fewer tokens mean less computation, reducing redundant tokens has become a prevailing paradigm for accelerating vision transformers. We argue, however, that it is suboptimal to reduce only inattentive redundancy (by token pruning) or only duplicative redundancy (by token merging). To this end, we propose a novel acceleration framework, token Pruning & Pooling Transformers (PPT), which adaptively tackles these two types of redundancy in different layers. By heuristically integrating token pruning and token pooling in ViTs without additional trainable parameters, PPT effectively reduces model complexity while maintaining predictive accuracy. For example, on ImageNet, PPT reduces FLOPs by over 37% and improves throughput by over 45% for DeiT-S without any accuracy drop. The code is available at https://github.com/xjwu1024/PPT and https://github.com/mindspore-lab/models/
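The two kinds of redundancy named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation (see the linked repositories for that); it rests on several assumptions. Each token carries a precomputed importance score (for example, class-token attention). Pooling is approximated by a simplified ToMe-style bipartite matching with plain, unweighted averaging. The per-layer choice between pruning and pooling uses a hypothetical variance test with a hand-set `threshold`. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two token embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def prune_tokens(tokens: List[Vector], scores: Sequence[float], r: int) -> List[Vector]:
    """Drop the r lowest-scoring tokens (inattentive redundancy), keeping order."""
    r = max(0, min(r, len(tokens)))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[: len(tokens) - r])
    return [tokens[i] for i in kept]


def pool_tokens(tokens: List[Vector], r: int) -> List[Vector]:
    """Average r duplicative tokens into their closest partner.

    Simplified ToMe-style matching (an assumption, not PPT's exact rule):
    even-indexed tokens (set A) look for the most similar odd-indexed token
    (set B); the r best-matched A tokens are averaged into their B partner.
    """
    a_idx = list(range(0, len(tokens), 2))
    b_idx = list(range(1, len(tokens), 2))
    r = min(r, len(a_idx))
    if r <= 0 or not b_idx:
        return [list(t) for t in tokens]
    best = {}  # A index -> (similarity, best B index)
    for i in a_idx:
        j = max(b_idx, key=lambda b: cosine(tokens[i], tokens[b]))
        best[i] = (cosine(tokens[i], tokens[j]), j)
    merged = sorted(a_idx, key=lambda i: best[i][0], reverse=True)[:r]
    sums = {j: list(tokens[j]) for j in b_idx}
    counts = {j: 1 for j in b_idx}
    for i in merged:
        j = best[i][1]
        sums[j] = [s + x for s, x in zip(sums[j], tokens[i])]
        counts[j] += 1
    merged_set = set(merged)
    out: List[Vector] = []
    for k in range(len(tokens)):
        if k in merged_set:
            continue  # absorbed into its B partner
        if k in sums:
            out.append([s / counts[k] for s in sums[k]])
        else:
            out.append(tokens[k])
    return out


def variance(xs: Sequence[float]) -> float:
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def reduce_tokens(
    tokens: List[Vector], scores: Sequence[float], r: int, threshold: float
) -> Tuple[str, List[Vector]]:
    """Hypothetical per-layer policy: prune when importance scores are
    spread out (some tokens clearly matter less), pool when they are flat."""
    if variance(scores) > threshold:
        return "prune", prune_tokens(tokens, scores, r)
    return "pool", pool_tokens(tokens, r)
```

For example, with `tokens = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]`, peaked scores `[0.7, 0.1, 0.15, 0.05]` trigger pruning and drop the last token, while flat scores `[0.25] * 4` trigger pooling and fold the first token into its identical neighbour. Neither step adds trainable parameters, which matches the property the abstract emphasizes.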

Code

mindspore-lab/models (official)

xjwu1024/PPT (official)

Tasks

Efficient ViTs

Datasets

ImageNet

Results from the Paper

Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-S)

Methods

Pruning