Big Self-Supervised Models are Strong Semi-Supervised Learners

One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to common approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning. We find that the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels ($\le$13 labeled images per class) using ResNet-50, a $10\times$ improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with our method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels.
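The distillation step trains a student network to match the teacher's temperature-softened predictions on unlabeled examples. A minimal numpy sketch of that objective, assuming logits are already computed; the function names, temperature value, and toy batch are illustrative, not from the paper's released code:

```python
import numpy as np

def softmax(logits, tau=1.0):
    # Temperature-scaled softmax over the last axis, with the usual
    # max-subtraction trick for numerical stability.
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, tau=1.0):
    """Cross-entropy between the teacher's and student's
    temperature-softened class distributions, averaged over a
    batch of unlabeled examples (no ground-truth labels needed)."""
    p_teacher = softmax(teacher_logits, tau)
    log_p_student = np.log(softmax(student_logits, tau) + 1e-12)
    return -(p_teacher * log_p_student).sum(axis=-1).mean()

# Toy batch: 4 unlabeled examples, 10 classes.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))
student = rng.normal(size=(4, 10))
print(distillation_loss(teacher, student, tau=2.0))
```

Since cross-entropy $H(p, q)$ is minimized (down to the entropy $H(p)$) when $q = p$, driving this loss down pushes the student's predictive distribution toward the teacher's, which is how the task-specific knowledge transfers without labels.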

NeurIPS 2020

Datasets


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-50 x2) | Top 1 Accuracy | 75.6% | #66 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-50 x2) | Top 5 Accuracy | 92.7% | #10 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-50 x2) | Number of Params | 94M | #29 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-152 x3, SK) | Top 1 Accuracy | 79.8% | #29 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-152 x3, SK) | Top 5 Accuracy | 94.9% | #2 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-152 x3, SK) | Number of Params | 795M | #5 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-50) | Top 1 Accuracy | 71.7% | #92 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-50) | Top 5 Accuracy | 90.4% | #22 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-50) | Number of Params | 24M | #48 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 (ResNet-152 x3, SK) | Top 5 Accuracy | 95.0% | #3 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 (ResNet-152 x3, SK) | Top 1 Accuracy | 80.1% | #7 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 distilled (ResNet-50 x2, SK) | Top 5 Accuracy | 95.0% | #3 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 distilled (ResNet-50 x2, SK) | Top 1 Accuracy | 80.2% | #6 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 (ResNet-50 x2) | Top 5 Accuracy | 91.9% | #9 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 (ResNet-50 x2) | Top 1 Accuracy | 73.9% | #26 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 (ResNet-50) | Top 5 Accuracy | 89.2% | #27 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 (ResNet-50) | Top 1 Accuracy | 68.4% | #36 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 distilled (ResNet-50) | Top 5 Accuracy | 93.4% | #5 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 distilled (ResNet-50) | Top 1 Accuracy | 77.5% | #13 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 self-distilled (ResNet-152 x3, SK) | Top 5 Accuracy | 95.5% | #2 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 self-distilled (ResNet-152 x3, SK) | Top 1 Accuracy | 80.9% | #5 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 distilled (ResNet-50) | Top 5 Accuracy | 91.5% | #5 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 distilled (ResNet-50) | Top 1 Accuracy | 73.9% | #9 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 distilled (ResNet-50 x2, SK) | Top 5 Accuracy | 93.0% | #3 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 distilled (ResNet-50 x2, SK) | Top 1 Accuracy | 75.9% | #6 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 self-distilled (ResNet-152 x3, SK) | Top 5 Accuracy | 93.4% | #1 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 self-distilled (ResNet-152 x3, SK) | Top 1 Accuracy | 76.6% | #5 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 (ResNet-50) | Top 5 Accuracy | 82.5% | #19 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 (ResNet-50) | Top 1 Accuracy | 57.9% | #35 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 (ResNet-152 x3, SK) | Top 5 Accuracy | 92.3% | #4 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 (ResNet-152 x3, SK) | Top 1 Accuracy | 74.9% | #8 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 (ResNet-50 x2) | Top 5 Accuracy | 87.4% | #10 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 (ResNet-50 x2) | Top 1 Accuracy | 66.3% | #23 |
| Self-Supervised Image Classification | ImageNet (finetuned) | SimCLRv2 (ResNet-152 x3, SK) | Number of Params | 795M | #4 |
| Self-Supervised Image Classification | ImageNet (finetuned) | SimCLRv2 (ResNet-152 x3, SK) | Top 1 Accuracy | 83.1% | #46 |

Methods