TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Semantic Segmentation	ADE20K	FD-SwinV2-G	Validation mIoU	61.4	# 7
Semantic Segmentation	ADE20K val	FD-SwinV2-G	mIoU	61.4	# 4
Instance Segmentation	COCO test-dev	FD-SwinV2-G	mask AP	55.4	# 2
Object Detection	COCO test-dev	FD-SwinV2-G	box mAP	64.2	# 11
Image Classification	ImageNet	FD (CLIP ViT-L-336)	Top 1 Accuracy	89.0%	# 34
Image Classification	ImageNet	FD (CLIP ViT-L-336)	Number of params	307M	# 915

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrastive-learning-rivals-masked-image/instance-segmentation-on-coco)](https://paperswithcode.com/sota/instance-segmentation-on-coco?p=contrastive-learning-rivals-masked-image)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrastive-learning-rivals-masked-image/semantic-segmentation-on-ade20k-val)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k-val?p=contrastive-learning-rivals-masked-image)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrastive-learning-rivals-masked-image/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=contrastive-learning-rivals-masked-image)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrastive-learning-rivals-masked-image/object-detection-on-coco)](https://paperswithcode.com/sota/object-detection-on-coco?p=contrastive-learning-rivals-masked-image)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/contrastive-learning-rivals-masked-image/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=contrastive-learning-rivals-masked-image)`
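
The five badge lines above share one pattern: a shields.io endpoint image that wraps a Papers with Code badge URL, linked to the matching leaderboard page. A minimal sketch of that pattern follows. The function name `pwc_badge` and the constant names are hypothetical, and the URL layout is inferred only from the lines listed here, not from any documented API.

```python
# Assumption: URL layout inferred from the badge lines on this page.
BADGE_ENDPOINT = "https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge"
SOTA_BASE = "https://paperswithcode.com/sota"


def pwc_badge(paper_slug: str, benchmark_slug: str) -> str:
    """Build the Markdown for one badge (hypothetical helper)."""
    # Image URL: badge endpoint, then paper slug, then benchmark slug.
    image = f"{BADGE_ENDPOINT}/{paper_slug}/{benchmark_slug}"
    # Link target: the leaderboard page, with the paper passed as ?p=.
    link = f"{SOTA_BASE}/{benchmark_slug}?p={paper_slug}"
    return f"[![PWC]({image})]({link})"


if __name__ == "__main__":
    print(pwc_badge("contrastive-learning-rivals-masked-image",
                    "object-detection-on-coco"))
```

Called with the paper slug and `object-detection-on-coco`, this reproduces the object detection badge line shown above.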

Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation

27 May 2022 · Yixuan Wei, Han Hu, Zhenda Xie, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo

Masked image modeling (MIM) learns representations with remarkably good fine-tuning performance, overshadowing previously prevalent pre-training approaches such as image classification, instance contrastive learning, and image-text alignment. In this paper, we show that the inferior fine-tuning performance of these pre-training approaches can be significantly improved by a simple post-processing step in the form of feature distillation (FD). Feature distillation converts the old representations into new representations that have a few desirable properties, just like the representations produced by MIM. These properties, which we collectively refer to as optimization friendliness, are identified and analyzed by a set of attention- and optimization-related diagnostic tools. With these properties, the new representations show strong fine-tuning performance. Specifically, contrastive self-supervised learning methods are made as competitive in fine-tuning as the state-of-the-art MIM algorithms. The fine-tuning performance of CLIP models is also significantly improved, with a CLIP ViT-L model reaching 89.0% top-1 accuracy on ImageNet-1K classification. On the 3-billion-parameter SwinV2-G model, the fine-tuning accuracy is improved by +1.5 mIoU / +1.1 mAP to 61.4 mIoU / 64.2 mAP on ADE20K semantic segmentation and COCO object detection, respectively, setting new records on both benchmarks. More importantly, our work provides a way for future research to focus more effort on the generality and scalability of the learnt representations without being preoccupied with optimization friendliness, since it can be enhanced rather easily. The code will be available at https://github.com/SwinTransformer/Feature-Distillation.
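
The abstract describes feature distillation only at a high level: a new (student) model is trained so that its features match those of an already pre-trained (teacher) model. The sketch below illustrates one way such an objective could look, using only the Python standard library. It is not the authors' implementation. The normalization of teacher features (`whiten`), the smooth L1 distance, and every function name here are assumptions chosen for illustration, since the abstract does not specify the loss.

```python
import math


def whiten(features, eps=1e-6):
    """Normalize one feature vector to zero mean and unit variance.

    Assumption: normalizing teacher features is an illustrative choice,
    not something stated in the abstract.
    """
    mean = sum(features) / len(features)
    var = sum((x - mean) ** 2 for x in features) / len(features)
    return [(x - mean) / math.sqrt(var + eps) for x in features]


def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 distance between two equal-length vectors."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        # Quadratic near zero, linear for large differences.
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


def feature_distillation_loss(student_tokens, teacher_tokens):
    """Average per-token distance between student and normalized teacher features."""
    losses = [smooth_l1(s, whiten(t)) for s, t in zip(student_tokens, teacher_tokens)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    teacher = [[1.0, 2.0, 3.0], [0.5, -0.5, 2.0]]  # toy teacher features, 2 tokens
    student = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # toy untrained student output
    print(feature_distillation_loss(student, teacher))
```

In a real training loop the teacher would be frozen and only the student updated to reduce this loss. The toy lists stand in for per-token feature maps.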

Code

SwinTransformer/Feature-Distillation official

219

Tasks

Contrastive Learning

Image Classification

Instance Segmentation

Object Detection

Self-Supervised Learning

Semantic Segmentation

Datasets

ImageNet

MS COCO

ADE20K

Results from the Paper

Ranked #2 on Instance Segmentation on COCO test-dev (using extra training data)

Methods

CLIP • MIM
