Self-Supervised Image Classification
85 papers with code • 2 benchmarks • 1 dataset
This is the task of image classification using representations learnt with self-supervised learning. Self-supervised methods generally involve a pretext task that is solved to learn a good representation, together with a loss function to learn with. One example is an autoencoder-based loss, where the goal is to reconstruct an image pixel by pixel. A more popular recent example is a contrastive loss, which measures the similarity of sample pairs in a representation space; unlike an autoencoder's fixed reconstruction target, the target here can vary from pair to pair.
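To make the contrastive idea concrete, below is a minimal numpy sketch of a normalized temperature-scaled cross-entropy (NT-Xent) loss of the kind used by SimCLR. The function name and signature are our own illustration, not any library's API; it takes two batches of embeddings from two augmented views of the same images and treats matching rows as positive pairs.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss sketch (SimCLR-style).

    z1, z2: (N, D) embeddings of two augmented views of the same
    N images; row i of z1 and row i of z2 form a positive pair.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)              # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalise
    sim = z @ z.T / temperature                       # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    # the positive for sample i is its other view, at index (i + n) mod 2n
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

When the two views agree (high positive similarity relative to the other samples in the batch), the loss is low; when they are unrelated, it is high.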
A common evaluation protocol is to train a linear classifier on top of (frozen) representations learnt by self-supervised methods. The leaderboards for the linear evaluation protocol can be found below. In practice, it is more common to fine-tune features on a downstream task. An alternative evaluation protocol therefore uses semi-supervised learning and fine-tunes on a small fraction of the labels. The leaderboards for the fine-tuning protocol can be accessed here.
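The linear evaluation step can be sketched in a few lines: fit a softmax classifier on frozen features and report its accuracy. This is a toy full-batch gradient-descent version for illustration only (real protocols typically use SGD with data augmentation over many epochs); all names here are ours.

```python
import numpy as np

def linear_probe(features, labels, n_classes, lr=0.1, epochs=200):
    """Fit a linear softmax classifier on frozen features
    by full-batch gradient descent. Illustrative sketch of
    the linear evaluation protocol, not a tuned recipe."""
    n, d = features.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                   # softmax cross-entropy gradient
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def probe_accuracy(features, labels, W, b):
    return (np.argmax(features @ W + b, axis=1) == labels).mean()
```

The key point of the protocol is that `features` come from a frozen self-supervised encoder; only the linear head `W, b` is trained.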
You may want to read some blog posts before reading the papers and checking the leaderboards:
- Contrastive Self-Supervised Learning - Ankesh Anand
- The Illustrated Self-Supervised Learning - Amit Chaudhary
- Self-supervised learning and computer vision - Jeremy Howard
- Self-Supervised Representation Learning - Lilian Weng
There is also Yann LeCun's talk at AAAI-20, which you can watch here (from 35:00).
(Image credit: A Simple Framework for Contrastive Learning of Visual Representations)
Libraries
Use these libraries to find Self-Supervised Image Classification models and implementations
Most implemented papers
Revisiting Self-Supervised Visual Representation Learning
Unsupervised visual representation learning remains a largely unsolved problem in computer vision research.
Self-Supervised Learning with Swin Transformers
We are witnessing a modeling shift from CNN to Transformers in computer vision.
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image.
Context Autoencoder for Self-Supervised Representation Learning
The pretraining includes two tasks: masked representation prediction, which predicts the representations of the masked patches, and masked patch reconstruction, which reconstructs the masked patches themselves.
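Masked-patch pretext tasks like these start from the same setup: split the image into patches, hide a random subset, and keep the hidden patches as reconstruction targets. A minimal numpy sketch of that setup (function name and defaults are our own, not from any paper's code):

```python
import numpy as np

def random_patch_mask(image, patch=4, mask_ratio=0.5, rng=None):
    """Split a square grayscale image (H, W) into non-overlapping
    patches, zero out a random subset, and return the masked image,
    the boolean patch mask, and the original masked patches
    (the reconstruction targets). Illustrative sketch only."""
    rng = rng or np.random.default_rng()
    h, w = image.shape
    gh, gw = h // patch, w // patch
    n_patches = gh * gw
    n_mask = int(mask_ratio * n_patches)
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.permutation(n_patches)[:n_mask]] = True
    masked = image.copy()
    targets = []
    for i in np.flatnonzero(mask):          # row-major patch index -> grid cell
        r, c = divmod(i, gw)
        sl = (slice(r * patch, (r + 1) * patch),
              slice(c * patch, (c + 1) * patch))
        targets.append(image[sl].copy())
        masked[sl] = 0.0
    return masked, mask, np.stack(targets)
```

A model is then trained to predict `targets` (pixels, features, or representations, depending on the method) from `masked`.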
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data.
Masked Feature Prediction for Self-Supervised Visual Pre-Training
We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models.
Unsupervised Feature Learning via Non-Parametric Instance Discrimination
Neural net classifiers trained on data with annotated class labels can also capture apparent visual similarity among categories without being directed to do so.
Data-Efficient Image Recognition with Contrastive Predictive Coding
Human observers can learn to recognize new categories of images from a handful of examples, yet doing so with artificial ones remains an open challenge.
Large Scale Adversarial Representation Learning
We extensively evaluate the representation learning and generation capabilities of these BigBiGAN models, demonstrating that these generation-based models achieve the state of the art in unsupervised representation learning on ImageNet, as well as in unconditional image generation.
Self-labelling via simultaneous clustering and representation learning
Combining clustering and representation learning is one of the most promising approaches for unsupervised learning of deep neural networks.