With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations

Self-supervised learning algorithms based on instance discrimination train encoders to be invariant to pre-defined transformations of the same instance. While most methods treat different views of the same image as positives for a contrastive loss, we are interested in using positives from other instances in the dataset. Our method, Nearest-Neighbor Contrastive Learning of visual Representations (NNCLR), samples the nearest neighbors from the dataset in the latent space, and treats them as positives. This provides more semantic variations than pre-defined transformations. We find that using the nearest-neighbor as positive in contrastive losses improves performance significantly on ImageNet classification, from 71.7% to 75.6%, outperforming previous state-of-the-art methods. On semi-supervised learning benchmarks we improve performance significantly when only 1% ImageNet labels are available, from 53.8% to 56.5%. On transfer learning benchmarks our method outperforms state-of-the-art methods (including supervised learning with ImageNet) on 8 out of 12 downstream datasets. Furthermore, we demonstrate empirically that our method is less reliant on complex data augmentations. We see a relative reduction of only 2.1% ImageNet Top-1 accuracy when we train using only random crops.
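The core idea in the abstract, swapping an embedding for its nearest neighbor in latent space and using that neighbor as the positive in a contrastive loss, can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the function names, the `support_set` of stored embeddings, the InfoNCE-style form of the loss, and the `temperature` value are not given in the text above.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nearest_neighbor(query, support_set):
    """Return the (normalized) support-set embedding most similar to `query`."""
    q = l2_normalize(query)
    return max((l2_normalize(s) for s in support_set), key=lambda s: dot(q, s))


def nn_contrastive_loss(view1, view2, support_set, temperature=0.1):
    """InfoNCE-style loss with nearest-neighbor positives (illustrative only).

    Each embedding in `view1` is replaced by its nearest neighbor from
    `support_set`; that neighbor is then pulled toward the matching
    embedding in `view2` and pushed away from the other entries of `view2`.
    """
    anchors = [nearest_neighbor(z, support_set) for z in view1]
    targets = [l2_normalize(z) for z in view2]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, t) / temperature for t in targets]
        # Numerically stable log-sum-exp over all candidates in the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy usage: two images, two augmented views each, and a small support set.
support = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view1 = [[0.9, 0.1], [0.1, 0.9]]
view2 = [[1.0, 0.2], [0.2, 1.0]]
loss = nn_contrastive_loss(view1, view2, support)
```

In this toy setup the loss is small when the rows of `view2` line up with the neighbors retrieved for `view1`, and it grows if the pairing is shuffled, which is the behavior a contrastive objective with neighbor positives is meant to have.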

ICCV 2021
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Fine-Grained Image Classification | Birdsnap | NNCLR | Accuracy | 61.4% | 5 |
| Fine-Grained Image Classification | Caltech-101 | NNCLR | Top-1 Error Rate | 8.7% | 9 |
| Image Classification | CIFAR-10 | NNCLR | Percentage correct | 93.7 | 154 |
| Image Classification | CIFAR-100 | NNCLR | Percentage correct | 79 | 130 |
| Image Classification | DTD | NNCLR | Accuracy | 75.5 | 9 |
| Fine-Grained Image Classification | FGVC Aircraft | NNCLR | Accuracy | 64.1% | 49 |
| Image Classification | Flowers-102 | NNCLR | Accuracy | 95.1 | 42 |
| Image Classification | Food-101 | NNCLR | Accuracy (%) | 76.7 | 5 |
| Self-Supervised Image Classification | ImageNet | NNCLR (ResNet-50, multi-crop) | Top 1 Accuracy | 75.6% | 64 |
| Self-Supervised Image Classification | ImageNet | NNCLR (ResNet-50, multi-crop) | Top 5 Accuracy | 92.4 | 12 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | NNCLR (ResNet-50) | Top 5 Accuracy | 89.3% | 25 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | NNCLR (ResNet-50) | Top 1 Accuracy | 69.8% | 33 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | NNCLR (ResNet-50) | Top 5 Accuracy | 80.7% | 22 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | NNCLR (ResNet-50) | Top 1 Accuracy | 56.4% | 37 |
| Image Classification | Oxford-IIIT Pet Dataset | NNCLR | Accuracy | 91.8 | 3 |
| Image Classification | PASCAL VOC 2007 | NNCLR | Accuracy | 83 | 1 |
| Image Classification | Stanford Cars | NNCLR | Accuracy | 67.1 | 21 |
| Fine-Grained Image Classification | SUN397 | NNCLR | Accuracy | 62.5 | 5 |

Methods