Scaling Vision with Sparse Mixture of Experts

NeurIPS 2021 · Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, Neil Houlsby

Sparsely-gated Mixture of Experts networks (MoEs) have demonstrated excellent scalability in Natural Language Processing. In Computer Vision, however, almost all performant networks are "dense", that is, every input is processed by every parameter. We present a Vision MoE (V-MoE), a sparse version of the Vision Transformer, that is scalable and competitive with the largest dense networks. When applied to image recognition, V-MoE matches the performance of state-of-the-art networks, while requiring as little as half of the compute at inference time. Further, we propose an extension to the routing algorithm that can prioritize subsets of each input across the entire batch, leading to adaptive per-image compute. This allows V-MoE to trade off performance and compute smoothly at test time. Finally, we demonstrate the potential of V-MoE to scale vision models, and train a 15B parameter model that attains 90.35% on ImageNet.
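The two routing ideas in the abstract (sparse gating, and prioritizing inputs across the batch) can be illustrated with a small toy. This is a minimal sketch under stated assumptions, not the authors' implementation: the gate is a softmax followed by top-k selection, priority is taken to be each token's largest gate weight, and every name here (`top_k_gate`, `prioritized_assign`, `router_weights`, `capacity`) is hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(token, router_weights, k):
    """Return {expert_index: gate_weight} for the k most probable experts."""
    # One router row per expert; the logit is a dot product with the token.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return {i: probs[i] for i in ranked[:k]}

def prioritized_assign(all_gates, capacity):
    """Fill fixed-size expert buffers, most confident tokens first.

    Assumption: a token's priority is its largest gate weight. Tokens that
    arrive after an expert's buffer is full are skipped by that expert.
    """
    order = sorted(range(len(all_gates)),
                   key=lambda t: max(all_gates[t].values()), reverse=True)
    load = {}                          # tokens accepted so far, per expert
    kept = [{} for _ in all_gates]     # surviving {expert: weight} per token
    for t in order:
        for expert, weight in all_gates[t].items():
            if load.get(expert, 0) < capacity:
                load[expert] = load.get(expert, 0) + 1
                kept[t][expert] = weight
    return kept

# Toy data: 4 experts, 3 two-dimensional tokens that all prefer expert 0.
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
tokens = [[2.0, 1.0], [4.0, 0.5], [3.0, 0.0]]
all_gates = [top_k_gate(t, router_weights, k=1) for t in tokens]
kept = prioritized_assign(all_gates, capacity=1)
```

With a buffer of one slot per expert, only the token with the highest gate weight for expert 0 is processed and the other two are skipped. Shrinking or growing `capacity` is one way to picture the compute/performance trade-off the abstract mentions.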

Code

google-research/vmoe official

513 stars

Tasks

Few-Shot Image Classification

Image Classification

Datasets

ImageNet

JFT-300M

Results from the Paper

Ranked #1 on Few-Shot Image Classification on ImageNet - 5-shot

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Image Classification	ImageNet	V-MoE-H/14 (Every-2)	Top 1 Accuracy	88.36%	# 59
Image Classification	ImageNet	V-MoE-H/14 (Every-2)	Number of params	7200M	# 978
Image Classification	ImageNet	V-MoE-H/14 (Last-5)	Top 1 Accuracy	88.23%	# 65
Image Classification	ImageNet	V-MoE-H/14 (Last-5)	Number of params	2700M	# 972
Image Classification	ImageNet	V-MoE-L/16 (Every-2)	Top 1 Accuracy	87.41%	# 92
Image Classification	ImageNet	V-MoE-L/16 (Every-2)	Number of params	3400M	# 975
Image Classification	ImageNet	VIT-H/14	Top 1 Accuracy	88.08%	# 68
Image Classification	ImageNet	VIT-H/14	Number of params	656M	# 946
Few-Shot Image Classification	ImageNet - 10-shot	V-MoE-H/14 (Last-5)	Top 1 Accuracy	80.1	# 6
Few-Shot Image Classification	ImageNet - 10-shot	ViT-MoE-15B (Every-2)	Top 1 Accuracy	84.29	# 2
Few-Shot Image Classification	ImageNet - 10-shot	V-MoE-H/14 (Every-2)	Top 1 Accuracy	80.33	# 5
Few-Shot Image Classification	ImageNet - 10-shot	VIT-H/14	Top 1 Accuracy	79.01	# 7
Few-Shot Image Classification	ImageNet - 1-shot	V-MoE-H/14 (Last-5)	Top 1 Accuracy	62.95	# 4
Few-Shot Image Classification	ImageNet - 1-shot	ViT-MoE-15B (Every-2)	Top 1 Accuracy	68.66	# 1
Few-Shot Image Classification	ImageNet - 1-shot	VIT-H/14	Top 1 Accuracy	62.34	# 6
Few-Shot Image Classification	ImageNet - 1-shot	V-MoE-L/16 (Every-2)	Top 1 Accuracy	62.41	# 5
Few-Shot Image Classification	ImageNet - 1-shot	V-MoE-H/14 (Every-2)	Top 1 Accuracy	63.38	# 3
Few-Shot Image Classification	ImageNet - 5-shot	V-MoE-L/16 (Every-2)	Top 1 Accuracy	77.1	# 7
Few-Shot Image Classification	ImageNet - 5-shot	V-MoE-H/14 (Every-2)	Top 1 Accuracy	78.21	# 5
Few-Shot Image Classification	ImageNet - 5-shot	ViT-MoE-15B (Every-2)	Top 1 Accuracy	82.78	# 1
Few-Shot Image Classification	ImageNet - 5-shot	V-MoE-H/14 (Last-5)	Top 1 Accuracy	78.08	# 6
Few-Shot Image Classification	ImageNet - 5-shot	VIT-H/14	Top 1 Accuracy	76.95	# 8
Image Classification	JFT-300M	VIT-H/14	prec@1	56.68	# 4
Image Classification	JFT-300M	V-MoE-H/14 (Last-5)	prec@1	60.12	# 2
Image Classification	JFT-300M	V-MoE-H/14 (Every-2)	prec@1	60.62	# 1
Image Classification	JFT-300M	V-MoE-L/16 (Every-2)	prec@1	57.65	# 3
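To read the ImageNet rows side by side, the short script below compares each V-MoE entry with the dense VIT-H/14 baseline. It is only an arithmetic sketch: the numbers are copied from the table above, and the names `imagenet` and `comparison` are made up for this example.

```python
# (Top 1 Accuracy in %, Number of params in millions), copied from the
# ImageNet rows of the results table.
imagenet = {
    "V-MoE-H/14 (Every-2)": (88.36, 7200),
    "V-MoE-H/14 (Last-5)": (88.23, 2700),
    "V-MoE-L/16 (Every-2)": (87.41, 3400),
    "VIT-H/14": (88.08, 656),
}

base_acc, base_params = imagenet["VIT-H/14"]

# Accuracy difference in percentage points and parameter-count ratio,
# both relative to the dense baseline.
comparison = {
    name: (round(acc - base_acc, 2), round(params / base_params, 1))
    for name, (acc, params) in imagenet.items()
    if name != "VIT-H/14"
}

for name, (delta, ratio) in comparison.items():
    print(f"{name}: {delta:+.2f} points, {ratio:.1f}x parameters")
```

The parameter counts here are totals. In a sparse model not every parameter is applied to every input, so these ratios should not be read as compute ratios.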

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer • Vision Transformer
