Exploring the Limits of Deep Image Clustering using Pretrained Models

31 Mar 2023 · Nikolas Adaloglou, Felix Michels, Hamza Kalisch, Markus Kollmann

We present a general methodology that learns to classify images without labels by leveraging pretrained feature extractors. Our approach involves self-distillation training of clustering heads based on the fact that nearest neighbours in the pretrained feature space are likely to share the same label. We propose a novel objective that learns associations between image features by introducing a variant of pointwise mutual information together with instance weighting. We demonstrate that the proposed objective is able to attenuate the effect of false positive pairs while efficiently exploiting the structure in the pretrained feature space. As a result, we improve the clustering accuracy over k-means on 17 different pretrained models by 6.1% and 12.2% on ImageNet and CIFAR100, respectively. Finally, using self-supervised vision transformers, we achieve a clustering accuracy of 61.6% on ImageNet. The code is available at https://github.com/HHU-MMBS/TEMI-official-BMVC2023.
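
The abstract relies on nearest neighbours in the pretrained feature space as likely same-label pairs. As a rough illustration of that first step only, the sketch below mines the k most similar neighbours of each feature vector by cosine similarity. It is not taken from the linked repository: the function names, the toy 2-D features, and the choice of cosine similarity are assumptions made for this example, and the paper's objective (the pointwise mutual information variant and instance weighting) is not reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumes non-zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mine_nearest_neighbours(features, k):
    """For each vector, return the indices of its k most similar other vectors."""
    neighbours = []
    for i, f in enumerate(features):
        # Score every other vector against the current one.
        sims = [(cosine_similarity(f, g), j) for j, g in enumerate(features) if j != i]
        # Highest similarity first.
        sims.sort(key=lambda pair: pair[0], reverse=True)
        neighbours.append([j for _, j in sims[:k]])
    return neighbours


# Hypothetical toy features: two tight groups in 2-D.
features = [
    [1.0, 0.1],
    [0.9, 0.2],
    [0.1, 1.0],
    [0.2, 0.9],
]
pairs = mine_nearest_neighbours(features, k=1)
print(pairs)
```

In this toy case each point's nearest neighbour falls in its own group. On real data some mined pairs will cross class boundaries; these are the false positive pairs that the abstract says the proposed objective attenuates.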


Results from the Paper


 Ranked #1 on Image Clustering on CIFAR-10 (using extra training data)

Task              Dataset       Model                     Metric Name  Metric Value  Global Rank
Image Clustering  CIFAR-10      TEMI CLIP ViT-L (openai)  Accuracy     0.969         #1
                                                          NMI          0.926         #1
                                                          Train set    Train         #1
                                                          ARI          0.932         #1
                                                          Backbone     ViT-L         #1
Image Clustering  CIFAR-10      TEMI DINO ViT-B           Accuracy     0.945         #30
                                                          NMI          0.886         #2
                                                          Train set    Train         #1
                                                          ARI          0.885         #2
                                                          Backbone     ViT-B         #1
Image Clustering  CIFAR-100     TEMI CLIP ViT-L (openai)  Accuracy     0.737         #1
                                                          NMI          0.799         #1
                                                          Train set    Train         #1
                                                          ARI          0.612         #1
Image Clustering  CIFAR-100     TEMI DINO ViT-B           Accuracy     0.671         #2
                                                          NMI          0.769         #2
                                                          Train set    Train         #1
                                                          ARI          0.533         #2
Image Clustering  ImageNet      TEMI MSN (ViT-L)          NMI          82.5          #3
                                                          Accuracy     61.6          #3
                                                          ARI          48.4          #1
Image Clustering  ImageNet      TEMI DINO (ViT-B)         NMI          81.4          #6
                                                          Accuracy     58.0          #4
                                                          ARI          45.9          #2
Image Clustering  ImageNet-100  TEMI CLIP ViT-L (openai)  NMI          0.9006        #1
                                                          Accuracy     0.8343        #1
                                                          ARI          0.7581        #1
Image Clustering  ImageNet-100  TEMI DINO ViT-B           NMI          0.8565        #3
                                                          Accuracy     0.7505        #3
                                                          ARI          0.6545        #3
Image Clustering  ImageNet-100  TEMI MSN ViT-L            NMI          0.8853        #2
                                                          Accuracy     0.8286        #2
                                                          ARI          0.7408        #2
Image Clustering  ImageNet-200  TEMI MSN ViT-L            NMI          0.8665        #2
                                                          Accuracy     0.7796        #5
                                                          ARI          0.667         #2
Image Clustering  ImageNet-200  TEMI DINO ViT-B           NMI          0.852         #3
                                                          Accuracy     0.7312        #2
                                                          ARI          0.6231        #3
Image Clustering  ImageNet-200  TEMI CLIP ViT-L (openai)  NMI          0.8839        #1
                                                          Accuracy     0.7776        #1
                                                          ARI          0.6941        #1
Image Clustering  ImageNet-50   TEMI DINO ViT-B           NMI          0.8610        #3
                                                          Accuracy     0.801         #4
                                                          ARI          0.7093        #4
Image Clustering  ImageNet-50   TEMI CLIP ViT-L (openai)  NMI          0.9232        #1
                                                          Accuracy     0.8827        #1
                                                          ARI          0.8272        #1
Image Clustering  ImageNet-50   TEMI MSN ViT-L            NMI          0.8814        #2
                                                          Accuracy     0.8487        #2
                                                          ARI          0.7646        #2
Image Clustering  STL-10        TEMI DINO ViT-B           Accuracy     0.985         #1
                                                          NMI          0.965         #1
                                                          Train Split  Train         #1
                                                          ARI          0.968         #1
                                                          Backbone     ViT-B         #1
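
The page does not say how the Accuracy values are computed. Clustering accuracy is conventionally defined as the fraction of samples labelled correctly under the best one-to-one mapping between predicted clusters and ground-truth classes, and the sketch below assumes that convention. The function name and the toy labels are hypothetical. To stay dependency-free it searches all permutations, which is only feasible for a handful of clusters; at benchmark scale one would normally solve the assignment with the Hungarian algorithm, for example via `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one cluster-to-class mapping (brute force)."""
    if len(true_labels) != len(cluster_ids) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # Pad to a square count matrix so that unequal numbers of
    # clusters and classes can still be matched one-to-one.
    n = max(len(classes), len(clusters))
    class_index = {c: i for i, c in enumerate(classes)}
    cluster_index = {c: i for i, c in enumerate(clusters)}
    counts = [[0] * n for _ in range(n)]
    for t, c in zip(true_labels, cluster_ids):
        counts[cluster_index[c]][class_index[t]] += 1
    # Try every assignment of cluster rows to class columns and keep the best.
    best = max(
        sum(counts[row][col] for row, col in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best / len(true_labels)


# Hypothetical toy example: cluster ids are arbitrary, so they are
# matched to classes before counting correct samples.
true_labels = [0, 0, 1, 1, 2, 2]
cluster_ids = [2, 2, 0, 0, 1, 0]
accuracy = clustering_accuracy(true_labels, cluster_ids)
print(round(accuracy, 4))
```

Here the best mapping sends cluster 2 to class 0, cluster 0 to class 1, and cluster 1 to class 2, so five of the six samples count as correct.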
