TinyViT: Fast Pretraining Distillation for Small Vision Transformers

21 Jul 2022  ·  Kan Wu, Jinnian Zhang, Houwen Peng, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan

Vision transformers (ViTs) have recently drawn great attention in computer vision due to their remarkable model capability. However, most prevailing ViT models suffer from a huge number of parameters, restricting their applicability on devices with limited resources. To alleviate this issue, we propose TinyViT, a new family of tiny and efficient vision transformers pretrained on large-scale datasets with our proposed fast distillation framework. The central idea is to transfer knowledge from large pretrained models to small ones, while enabling the small models to reap the dividends of massive pretraining data. More specifically, we apply distillation during pretraining for knowledge transfer. The logits of large teacher models are sparsified and stored on disk in advance to save memory cost and computational overhead. The tiny student transformers are automatically scaled down from a large pretrained model under computation and parameter constraints. Comprehensive experiments demonstrate the efficacy of TinyViT. It achieves a top-1 accuracy of 84.8% on ImageNet-1k with only 21M parameters, comparable to Swin-B pretrained on ImageNet-21k while using 4.2 times fewer parameters. Moreover, with increased image resolution, TinyViT reaches 86.5% accuracy, slightly better than Swin-L while using only 11% of the parameters. Last but not least, we demonstrate the good transferability of TinyViT on various downstream tasks. Code and models are available at https://github.com/microsoft/Cream/tree/main/TinyViT.
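The fast-distillation idea described above (run the large teacher once, keep only its top-k logits per image on disk, then train the student against the cached soft labels) can be illustrated with a short PyTorch sketch. This is a minimal sketch of the idea, not the released TinyViT pipeline; the function names, the value of top_k, the cache file layout, and the use of a plain KL loss are assumptions made here for illustration.

```python
import torch
import torch.nn.functional as F


# Offline phase: run the large teacher once over the pretraining set and cache
# only the top-k logit values and their class indices per image. Keeping k
# entries instead of the full logit vector is what keeps the on-disk cache small.
@torch.no_grad()
def save_sparse_teacher_logits(teacher, loader, top_k=10, path="teacher_logits.pt"):
    teacher.eval()
    cache = []
    for images, _ in loader:
        logits = teacher(images)                      # (batch, num_classes)
        values, indices = logits.topk(top_k, dim=-1)  # sparsify: keep top-k only
        cache.append((values.cpu(), indices.cpu()))
    torch.save(cache, path)  # illustrative cache format


# Training phase: rebuild a teacher distribution from the cached sparse logits
# and distill the student with a soft-label KL loss.
def distillation_loss(student_logits, teacher_values, teacher_indices, temperature=1.0):
    # Dropped classes get -inf, so their probability after softmax is zero.
    dense = torch.full_like(student_logits, float("-inf"))
    dense.scatter_(-1, teacher_indices, teacher_values)
    teacher_prob = F.softmax(dense / temperature, dim=-1)
    student_logprob = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(student_logprob, teacher_prob, reduction="batchmean") * temperature ** 2
```

The saving in storage is straightforward arithmetic: for an ImageNet-21k teacher with roughly 21k classes, keeping only a handful of value/index pairs per image instead of the full logit vector shrinks the cache by two to three orders of magnitude, which is what makes pretraining-time distillation practical without keeping the teacher in memory.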


Results from the Paper


Task: Image Classification · Dataset: ImageNet

Model                                      Top-1 Accuracy   Params   GFLOPs
TinyViT-5M-distill (21k)                   80.7%            5.4M     1.3
TinyViT-11M-distill (21k)                  83.2%            11M      2.0
TinyViT-21M-distill (21k)                  84.8%            21M      4.3
TinyViT-21M-384-distill (384 res, 21k)     86.2%            21M      13.8
TinyViT-21M-512-distill (512 res, 21k)     86.5%            21M      27.0
TinyViT-5M                                 79.1%            5.4M     1.3
TinyViT-11M                                81.5%            11M      2.0
TinyViT-21M                                83.1%            21M      4.3
