Self-Supervised Image Classification
85 papers with code • 2 benchmarks • 1 dataset
This is the task of image classification using representations learnt with self-supervised learning. Self-supervised methods generally involve a pretext task that is solved to learn a good representation, together with a loss function to learn with. One example of a loss function is an autoencoder-based loss, where the goal is to reconstruct an image pixel by pixel. A more popular recent example is a contrastive loss, which measures the similarity of sample pairs in a representation space and allows the target to vary rather than being a fixed reconstruction target (as in the case of autoencoders).
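As an illustration of a contrastive loss, here is a minimal sketch of an NT-Xent-style objective (in the spirit of SimCLR) in PyTorch; the function name, temperature value and batch layout are assumptions for the example, not any paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """z1, z2: (n, d) embeddings of two augmented views of the same n images."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2n, d), unit norm
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    # Mask self-similarity so an embedding is never its own positive.
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool, device=z.device), float('-inf'))
    # For row i, the positive is the other augmented view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Usage: z1, z2 = encoder(augment(x)), encoder(augment(x)); loss = nt_xent_loss(z1, z2)
```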
A common evaluation protocol is to train a linear classifier on top of (frozen) representations learnt by self-supervised methods. The leaderboards for the linear evaluation protocol can be found below. In practice, it is more common to fine-tune features on a downstream task. An alternative evaluation protocol therefore uses semi-supervised learning and fine-tunes on a percentage of the labels. The leaderboards for the fine-tuning protocol can be accessed here.
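The snippet below is a minimal sketch of the linear evaluation protocol (not the exact recipe behind any leaderboard entry): the backbone is frozen and only a linear classifier is trained on its features. The ResNet-50 backbone, 2048-d feature size, 1000 classes and `train_loader` are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torchvision

# Placeholder backbone; in practice you would load self-supervised weights.
encoder = torchvision.models.resnet50()
encoder.fc = nn.Identity()           # expose the 2048-d pooled features
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False          # representations stay frozen

classifier = nn.Linear(2048, 1000)   # feature dim / class count are examples
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for images, labels in train_loader:  # assumed labelled downstream loader
    with torch.no_grad():
        features = encoder(images)   # frozen features, no gradients
    loss = criterion(classifier(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```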
You may want to read some blog posts before reading the papers and checking the leaderboards:
- Contrastive Self-Supervised Learning - Ankesh Anand
- The Illustrated Self-Supervised Learning - Amit Chaudhary
- Self-supervised learning and computer vision - Jeremy Howard
- Self-Supervised Representation Learning - Lilian Weng
There is also Yann LeCun's talk at AAAI-20, which you can watch here (35:00+).
(Image credit: A Simple Framework for Contrastive Learning of Visual Representations)
Libraries
Use these libraries to find Self-Supervised Image Classification models and implementations.
Latest papers
Masked Siamese Networks for Label-Efficient Learning
We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations.
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training
Image BERT pre-training with masked image modeling (MIM) has become a popular approach to self-supervised representation learning.
Mugs: A Multi-Granular Self-Supervised Learning Framework
It provides supervision complementary to instance discrimination supervision (IDS) via an extra alignment on local neighbors, and scatters different local groups separately to increase discriminability.
CaCo: Both Positive and Negative Samples are Directly Learnable via Cooperative-adversarial Contrastive Learning
It trains an encoder by distinguishing positive samples from negative ones given query anchors.
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Discriminative self-supervised learning allows training models on any random group of internet images, potentially recovering salient information that helps differentiate between the images.
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization.
Context Autoencoder for Self-Supervised Representation Learning
The pretraining comprises two tasks: masked representation prediction (predict the representations of the masked patches) and masked patch reconstruction (reconstruct the masked patches); see the masking sketch after this list.
When Do Flat Minima Optimizers Work?
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers.
Max-Margin Contrastive Learning
Standard contrastive learning approaches usually require a large number of negatives for effective unsupervised learning and often exhibit slow convergence.
Masked Feature Prediction for Self-Supervised Visual Pre-Training
We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models.
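Several entries above (mc-BEiT, Context Autoencoder, MaskFeat) build on masked image modeling, where a random subset of image patches is hidden and the model is trained to predict something about the hidden patches. Below is a generic sketch of random patch masking; the function name, tensor layout and mask ratio are illustrative assumptions, not any single paper's setup.

```python
import torch

def random_patch_mask(patches: torch.Tensor, mask_ratio: float = 0.5):
    """patches: (batch, num_patches, dim) flattened image patches.

    Returns the visible patches and a boolean mask (True = masked); a
    masked-image-modeling objective then predicts the masked patches.
    """
    b, n, d = patches.shape
    num_masked = int(n * mask_ratio)
    # Rank patches by random noise and mask the lowest-ranked ones.
    ids = torch.rand(b, n).argsort(dim=1)
    mask = torch.zeros(b, n, dtype=torch.bool)
    mask.scatter_(1, ids[:, :num_masked], True)
    visible = patches[~mask].view(b, n - num_masked, d)
    return visible, mask

# Usage: visible, mask = random_patch_mask(patches); the encoder sees only
# `visible`, and the loss is computed on the patches where `mask` is True.
```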