Going deeper with Image Transformers

Transformers have recently been adapted for large-scale image classification, achieving high scores that shake up the long supremacy of convolutional neural networks. However, the optimization of image transformers has received little study so far. In this work, we build and optimize deeper transformer networks for image classification. In particular, we investigate the interplay of architecture and optimization in such dedicated transformers. We make two changes to the transformer architecture that significantly improve the accuracy of deep transformers. This yields models whose performance does not saturate early with added depth: for instance, we obtain 86.5% top-1 accuracy on ImageNet when training with no external data, attaining the current SOTA with fewer FLOPs and parameters. Moreover, our best model establishes a new state of the art on ImageNet with Reassessed Labels and on ImageNet-V2 (matched frequency) in the setting with no additional training data. We share our code and models.
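The two architecture changes are not named in this abstract; in the paper they are LayerScale (a learnable per-channel scaling of each residual branch, initialized near zero so every block starts close to the identity) and class-attention layers. As a minimal sketch of the LayerScale residual update only (function and variable names here are illustrative, not from the released code):

```python
import numpy as np

def layer_scale_residual(x, block_out, lam):
    """LayerScale residual update: y = x + diag(lam) @ block_out.

    lam is a per-channel vector initialized to a small constant
    (on the order of 1e-1 down to 1e-6, smaller for deeper models),
    so each residual block initially perturbs the identity only
    slightly, which stabilizes the training of deep transformers.
    """
    return x + lam * block_out  # broadcasting scales each channel by lam

# Toy example: 2 tokens with 4 channels each.
x = np.ones((2, 4))                 # input to the block
block_out = np.full((2, 4), 10.0)   # output of an attention or FFN block
lam = np.full(4, 1e-4)              # small init keeps y close to x
y = layer_scale_residual(x, block_out, lam)
# y = 1 + 1e-4 * 10 = 1.001 in every entry
```

Compared with scaling the whole residual branch by a single scalar, the per-channel vector lets the optimizer open up individual channels at different rates during training.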

ICCV 2021

Results from the Paper


Ranked #5 on Image Classification on CIFAR-10 (using extra training data)

| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Image Classification | CIFAR-10 | CaiT-M-36 U 224 | Percentage correct | 99.4 | #5 |
| Image Classification | CIFAR-100 | CaiT-M-36 U 224 | Percentage correct | 93.1 | #11 |
| Image Classification | Flowers-102 | CaiT-M-36 U 224 | Accuracy | 99.1 | #13 |
| Image Classification | ImageNet | CaiT-XXS-36 | Top-1 Accuracy | 82.2% | #511 |
| | | | Number of params | 17.3M | #523 |
| | | | GFLOPs | 14.3 | #335 |
| Image Classification | ImageNet | CaiT-XS-24 | Top-1 Accuracy | 84.1% | #326 |
| | | | Number of params | 26.6M | #614 |
| | | | GFLOPs | 19.3 | #364 |
| Image Classification | ImageNet | CaiT-XS-36 | Top-1 Accuracy | 84.8% | #271 |
| | | | Number of params | 38.6M | #666 |
| | | | GFLOPs | 28.8 | #389 |
| Image Classification | ImageNet | CaiT-S-24 | Top-1 Accuracy | 85.1% | #246 |
| | | | Number of params | 46.9M | #711 |
| | | | GFLOPs | 32.2 | #398 |
| Image Classification | ImageNet | CaiT-S-36 | Top-1 Accuracy | 85.4% | #222 |
| | | | Number of params | 68.2M | #787 |
| | | | GFLOPs | 48 | #421 |
| Image Classification | ImageNet | CaiT-M-24 | Top-1 Accuracy | 85.8% | #188 |
| | | | Number of params | 185.9M | #888 |
| | | | GFLOPs | 116.1 | #458 |
| Image Classification | ImageNet | CaiT-M-36 | Top-1 Accuracy | 86.1% | #171 |
| | | | Number of params | 270.9M | #909 |
| | | | GFLOPs | 173.3 | #464 |
| Image Classification | ImageNet | CaiT-M-36-448 | Top-1 Accuracy | 86.3% | #154 |
| | | | Number of params | 271M | #910 |
| | | | GFLOPs | 247.8 | #472 |
| Image Classification | ImageNet | CaiT-S-48 | Top-1 Accuracy | 85.3% | #232 |
| | | | Number of params | 89.5M | #846 |
| | | | GFLOPs | 63.8 | #435 |
| Image Classification | ImageNet | CaiT-XXS-24 | Top-1 Accuracy | 80.9% | #619 |
| | | | Number of params | 12M | #497 |
| | | | GFLOPs | 9.6 | #292 |
| Image Classification | ImageNet | CaiT-M-48-448 | Top-1 Accuracy | 86.5% | #136 |
| | | | Number of params | 438M | #931 |
| | | | GFLOPs | 377.3 | #480 |
| Image Classification | ImageNet ReaL | CaiT-M-36-448 | Accuracy | 90.2% | #19 |
| Image Classification | ImageNet V2 | CaiT-M-36-448 | Top-1 Accuracy | 76.7 | #16 |
| Image Classification | iNaturalist 2018 | CaiT-M-36 U 224 | Top-1 Accuracy | 78% | #18 |
| Image Classification | iNaturalist 2019 | CaiT-M-36 U 224 | Top-1 Accuracy | 81.8 | #7 |
| Image Classification | Stanford Cars | CaiT-M-36 U 224 | Accuracy | 94.2 | #5 |

Methods