TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Efficient ViTs	ImageNet-1K (with DeiT-S)	DiffRate	Top 1 Accuracy	79.8	# 4
Efficient ViTs	ImageNet-1K (with DeiT-S)	DiffRate	GFLOPs	2.9	# 19
Efficient ViTs	ImageNet-1K (With LV-ViT-S)	DiffRate	Top 1 Accuracy	82.6	# 13
Efficient ViTs	ImageNet-1K (With LV-ViT-S)	DiffRate	GFLOPs	3.9	# 14

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/diffrate-differentiable-compression-rate-for/efficient-vits-on-imagenet-1k-with-deit-s)](https://paperswithcode.com/sota/efficient-vits-on-imagenet-1k-with-deit-s?p=diffrate-differentiable-compression-rate-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/diffrate-differentiable-compression-rate-for/efficient-vits-on-imagenet-1k-with-lv-vit-s)](https://paperswithcode.com/sota/efficient-vits-on-imagenet-1k-with-lv-vit-s?p=diffrate-differentiable-compression-rate-for)`

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers

ICCV 2023 · Mengzhao Chen, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Fei Chao, Rongrong Ji, Yu Qiao, Ping Luo

Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning (dropping) or merging tokens. It is an important but challenging task. Although recent approaches have achieved great success, they require a carefully handcrafted compression rate (i.e., the number of tokens to remove), which is tedious to tune and leads to sub-optimal performance. To tackle this problem, we propose Differentiable Compression Rate (DiffRate), a novel token compression method with several appealing properties that prior work lacks. First, DiffRate propagates the loss function's gradient onto the compression ratio, which previous work treated as a non-differentiable hyperparameter. As a result, different layers can automatically learn different compression rates without extra overhead. Second, token pruning and merging can be performed simultaneously in DiffRate, whereas previous works treated them separately. Third, extensive experiments demonstrate that DiffRate achieves state-of-the-art performance. For example, by applying the learned layer-wise compression rates to an off-the-shelf ViT-H (MAE) model, we achieve a 40% FLOPs reduction and a 1.5x throughput improvement with a minor accuracy drop of 0.16% on ImageNet without fine-tuning, even outperforming previous methods that use fine-tuning. Code and models are available at https://github.com/OpenGVLab/DiffRate.
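To make the terms in the abstract concrete, the following is a minimal sketch of token pruning and merging with a fixed, hand-set number of tokens to keep, i.e. the kind of handcrafted setting the abstract contrasts DiffRate against. It is not DiffRate's implementation and does not learn the rate. The function name `compress_tokens`, the `scores` input (a stand-in for any token-importance measure), and the rule of merging each dropped token into its most similar kept token by a running mean are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compress_tokens(tokens, scores, keep, merge=True):
    """Keep the `keep` highest-scoring tokens; optionally merge the rest.

    tokens: list of feature vectors (lists of floats)
    scores: one importance score per token (hypothetical input)
    keep:   number of tokens to retain (a hand-set rate, not a learned one)
    """
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_idx = sorted(order[:keep])  # preserve the original token order
    dropped_idx = order[keep:]
    kept = [list(tokens[i]) for i in kept_idx]
    if not merge or not kept:
        return kept  # pure pruning: dropped tokens are discarded

    # Merging: fold each dropped token into its most similar kept token,
    # maintaining a running mean of everything merged into that slot.
    counts = [1] * len(kept)
    for i in dropped_idx:
        j = max(range(len(kept)), key=lambda k: cosine(tokens[i], kept[k]))
        counts[j] += 1
        kept[j] = [m + (x - m) / counts[j] for m, x in zip(kept[j], tokens[i])]
    return kept


tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
scores = [0.9, 0.8, 0.2, 0.1]
pruned = compress_tokens(tokens, scores, keep=2, merge=False)
merged = compress_tokens(tokens, scores, keep=2, merge=True)
```

In this sketch `keep` must be chosen by hand for every layer. According to the abstract, DiffRate instead makes the compression rate differentiable so that each layer learns its own value.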

Code

opengvlab/diffrate official

Tasks

Efficient ViTs

Datasets

ImageNet

Results from the Paper

Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-S)

Methods

Pruning • SPEED
