Self-Supervised Image Classification

85 papers with code • 2 benchmarks • 1 dataset

This is the task of image classification using representations learnt with self-supervised learning. Self-supervised methods generally involve a pretext task that is solved to learn a good representation, together with a loss function to learn with. One example of a loss function is an autoencoder-based loss, where the goal is to reconstruct an image pixel by pixel. A more recent and popular example is a contrastive loss, which measures the similarity of sample pairs in a representation space and where the target can vary across samples rather than being a fixed reconstruction target (as in autoencoders).
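
As a rough illustration of the contrastive case, here is a minimal sketch of an NT-Xent-style (SimCLR-like) contrastive loss in PyTorch. The encoder, the augmentations and the hyperparameters (e.g. the temperature of 0.5) are assumptions for the example, not a reference implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss for a batch of paired views.

    z1, z2: (N, D) embeddings of two augmentations of the same N images.
    Each sample's positive is its counterpart in the other view; the
    remaining 2N - 2 embeddings in the batch act as negatives.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit length
    sim = z @ z.t() / temperature                        # (2N, 2N) scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                    # exclude self-similarity
    # index of the positive for each row: i <-> i + N
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# usage (hypothetical encoder and augmentation):
# z1, z2 = encoder(augment(x)), encoder(augment(x))
# loss = nt_xent_loss(z1, z2)
```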

A common evaluation protocol is to train a linear classifier on top of (frozen) representations learnt by self-supervised methods. The leaderboards for the linear evaluation protocol can be found below. In practice, it is more common to fine-tune features on a downstream task. An alternative evaluation protocol therefore uses semi-supervised learning and fine-tunes on a small percentage of the labels. The leaderboards for the fine-tuning protocol can be accessed here.
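
Below is a minimal sketch of the linear evaluation protocol, assuming a hypothetical pretrained `backbone` that maps images to feature vectors. The optimiser settings and number of epochs are illustrative rather than the values used by any particular paper.

```python
import torch
import torch.nn as nn

# `backbone` is a hypothetical pretrained self-supervised encoder that maps
# images to `feat_dim`-dimensional features; it stays frozen throughout.
def linear_eval(backbone, train_loader, feat_dim, num_classes, epochs=90, device="cuda"):
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad_(False)

    classifier = nn.Linear(feat_dim, num_classes).to(device)
    optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():             # features come from the frozen encoder
                feats = backbone(images)
            loss = criterion(classifier(feats), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return classifier
```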

You may want to read some blog posts before diving into the papers and checking the leaderboards.

There is also Yann LeCun's talk at AAAI-20, which you can watch here (from 35:00 onwards).

(Image credit: A Simple Framework for Contrastive Learning of Visual Representations)

Most implemented papers

Unsupervised Representation Learning by Predicting Image Rotations

gidariss/FeatureLearningRotNet ICLR 2018

However, in order to successfully learn such high-level semantic features, ConvNets usually require massive amounts of manually labeled data, which is both expensive and impractical to scale.
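
The pretext task itself is simple to sketch: each image is rotated by 0, 90, 180 and 270 degrees and the network is trained to predict which rotation was applied. The outline below is illustrative, with `model` standing in for any encoder with a 4-way rotation head.

```python
import torch
import torch.nn.functional as F

def rotation_batch(images):
    """Build the rotation pretext task: each image (NCHW) is rotated by
    0, 90, 180 and 270 degrees; the label is the rotation index (0-3)."""
    rotated = torch.cat([torch.rot90(images, k, dims=(2, 3)) for k in range(4)], dim=0)
    labels = torch.arange(4).repeat_interleave(images.size(0)).to(images.device)
    return rotated, labels

# `model` is a hypothetical encoder with a 4-way rotation classification head.
def rotation_pretext_loss(model, images):
    rotated, labels = rotation_batch(images)
    return F.cross_entropy(model(rotated), labels)
```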

Unsupervised Learning of Visual Features by Contrasting Cluster Assignments

facebookresearch/swav NeurIPS 2020

In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much.
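
A rough sketch of a multi-crop-style augmentation is given below: two high-resolution global crops plus several low-resolution local crops per image. The crop sizes and scale ranges here are illustrative, not the exact values from the paper.

```python
from torchvision import transforms

def multi_crop_views(image, num_local=6):
    """Multi-crop style augmentation on a PIL image: 2 global (high-resolution)
    views plus several smaller local views. Sizes and scales are illustrative."""
    global_crop = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.4, 1.0)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    local_crop = transforms.Compose([
        transforms.RandomResizedCrop(96, scale=(0.05, 0.4)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    return [global_crop(image) for _ in range(2)] + [local_crop(image) for _ in range(num_local)]
```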

ResMLP: Feedforward networks for image classification with data-efficient training

facebookresearch/deit NeurIPS 2021

We present ResMLP, an architecture built entirely upon multi-layer perceptrons for image classification.
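
A hedged sketch of a ResMLP-style residual block is shown below: affine (per-channel scale and shift) transforms, a linear layer that mixes patches, and a two-layer MLP that mixes channels. Initialisation details and the exact placement of the rescaling are simplified.

```python
import torch
import torch.nn as nn

class Affine(nn.Module):
    """Element-wise affine transform (learnable per-channel scale and shift),
    used in ResMLP in place of normalization layers."""
    def __init__(self, dim):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):                      # x: (B, num_patches, dim)
        return self.alpha * x + self.beta

class ResMLPBlock(nn.Module):
    """ResMLP-style block: a linear layer mixing patches, then an MLP mixing
    channels, each wrapped in Affine transforms and residual connections."""
    def __init__(self, dim, num_patches, expansion=4):
        super().__init__()
        self.affine1 = Affine(dim)
        self.cross_patch = nn.Linear(num_patches, num_patches)
        self.affine2 = Affine(dim)
        self.cross_channel = nn.Sequential(
            nn.Linear(dim, expansion * dim),
            nn.GELU(),
            nn.Linear(expansion * dim, dim),
        )

    def forward(self, x):                      # x: (B, num_patches, dim)
        # mix information across patches (the linear layer acts on the patch axis)
        x = x + self.cross_patch(self.affine1(x).transpose(1, 2)).transpose(1, 2)
        # mix information across channels, independently per patch
        x = x + self.cross_channel(self.affine2(x))
        return x
```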

BEiT: BERT Pre-Training of Image Transformers

microsoft/unilm ICLR 2022

We first "tokenize" the original image into visual tokens.

XCiT: Cross-Covariance Image Transformers

rwightman/pytorch-image-models NeurIPS 2021

We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries.

DINOv2: Learning Robust Visual Features without Supervision

facebookresearch/dinov2 14 Apr 2023

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision.

Contrastive Multiview Coding

HobbitLong/CMC ECCV 2020

We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics.

Big Self-Supervised Models are Strong Semi-Supervised Learners

google-research/simclr NeurIPS 2020

The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge.
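
The distillation step can be sketched as follows, assuming a fine-tuned `teacher` and a `student` classifier (both hypothetical here) and temperature-scaled soft labels on unlabeled images; the first two steps are standard contrastive pretraining and supervised fine-tuning.

```python
import torch
import torch.nn.functional as F

def distillation_loss(teacher, student, unlabeled_images, temperature=1.0):
    """Step 3 of the SimCLRv2 recipe (sketch): distill the fine-tuned teacher
    into a student using soft labels on unlabeled data. `teacher` and
    `student` are hypothetical classifiers returning logits."""
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(unlabeled_images) / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(unlabeled_images) / temperature, dim=-1)
    # cross-entropy between the teacher's soft labels and the student's prediction
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```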

An Empirical Study of Training Self-Supervised Vision Transformers

facebookresearch/moco-v3 ICCV 2021

In this work, we go back to basics and investigate the effects of several fundamental components for training self-supervised ViT.

Self-Supervised Learning of Pretext-Invariant Representations

facebookresearch/vissl CVPR 2020

The goal of self-supervised learning from images is to construct image representations that are semantically meaningful via pretext tasks that do not require semantic annotations for a large training set of images.