Self-Supervised Image Classification
85 papers with code • 2 benchmarks • 1 dataset
This is the task of image classification using representations learnt with self-supervised learning. Self-supervised methods generally involve a pretext task that is solved to learn a good representation and a loss function to learn with. One example of a loss function is an autoencoder-based loss, where the goal is to reconstruct an image pixel-by-pixel. A more recent and popular example is a contrastive loss, which measures the similarity of sample pairs in a representation space; here the target can vary across pairs rather than being a fixed reconstruction target (as in the case of autoencoders).
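As a concrete illustration, below is a minimal sketch of an NT-Xent style contrastive loss (as popularized by SimCLR-like methods). The function name, temperature value, and batch shapes are illustrative assumptions rather than the setup of any specific paper.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, D) embeddings of two augmented views of the same N images."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)            # (2N, D) stacked views
    sim = z @ z.t() / temperature             # (2N, 2N) cosine similarities
    sim.fill_diagonal_(float("-inf"))         # a sample is never its own positive
    n = z1.size(0)
    # For row i (one view of image i), the positive is the other view of image i.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Usage: embeddings from an encoder + projection head over two augmentations.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = nt_xent_loss(z1, z2)
```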
A common evaluation protocol is to train a linear classifier on top of (frozen) representations learnt by self-supervised methods. The leaderboards for the linear evaluation protocol can be found below. In practice, it is more common to fine-tune features on a downstream task. An alternative evaluation protocol therefore uses semi-supervised learning and fine-tunes on a percentage of the labels. The leaderboards for the fine-tuning protocol can be accessed here.
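For reference, here is a minimal sketch of the linear evaluation protocol: the pretrained encoder is frozen and only a linear classifier is trained on its features. The `encoder`, `feat_dim`, data loader, and hyperparameters are placeholder assumptions, not a specific benchmark configuration.

```python
import torch
import torch.nn as nn

def linear_eval(encoder, train_loader, feat_dim=2048, num_classes=1000, epochs=90):
    encoder.eval()                            # frozen backbone
    for p in encoder.parameters():
        p.requires_grad = False

    classifier = nn.Linear(feat_dim, num_classes)
    optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in train_loader:
            with torch.no_grad():             # representations stay fixed
                feats = encoder(images)
            loss = criterion(classifier(feats), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return classifier
```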
You may want to read some blog posts before reading the papers and checking the leaderboards:
- Contrastive Self-Supervised Learning - Ankesh Anand
- The Illustrated Self-Supervised Learning - Amit Chaudhary
- Self-supervised learning and computer vision - Jeremy Howard
- Self-Supervised Representation Learning - Lilian Weng
There is also Yann LeCun's talk at AAAI-20, which you can watch here (from 35:00).
( Image credit: A Simple Framework for Contrastive Learning of Visual Representations )
Libraries
Use these libraries to find Self-Supervised Image Classification models and implementations
Latest papers
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
The motivation behind MIM-Refiner is rooted in the insight that optimal representations within MIM models generally reside in intermediate layers.
Masked Image Residual Learning for Scaling Deeper Vision Transformers
With the same level of computational complexity as ViT-Base and ViT-Large, we instantiate 4.5$\times$ and 2$\times$ deeper ViTs, dubbed ViT-S-54 and ViT-B-48.
Masking Augmentation for Supervised Learning
In this paper, we propose a novel way to involve masking augmentations dubbed Masked Sub-model (MaskSub).
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
In this work, we explore a scalable way for building a general representation model toward unlimited modalities.
Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
In this work, we study how to combine the efficiency and scalability of MIM with the ability of ID to perform downstream classification in the absence of large amounts of labeled data.
DINOv2: Learning Robust Visual Features without Supervision
The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision.
Unicom: Universal and Compact Representation Learning for Image Retrieval
To further enhance the low-dimensional feature representation, we randomly select partial feature dimensions when calculating the similarities between embeddings and class-wise prototypes.
VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
Since the introduction of deep learning, a wide scope of representation properties, such as decorrelation, whitening, disentanglement, rank, isotropy, and mutual information, have been studied to improve the quality of representation.
All4One: Symbiotic Neighbour Contrastive Learning via Self-Attention and Redundancy Reduction
Nearest neighbour based methods have proved to be one of the most successful self-supervised learning (SSL) approaches due to their high generalization capabilities.
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling
This is the first use of sparse convolution for 2D masked modeling.