Multi-Modal Deep Clustering: Unsupervised Partitioning of Images

5 Dec 2019  ·  Guy Shiran, Daphna Weinshall

Clustering unlabeled raw images is a daunting task that deep learning methods have recently approached with some success. Here we propose an unsupervised clustering framework that trains a deep neural network end to end and provides direct cluster assignments of images without additional processing. Multi-Modal Deep Clustering (MMDC) trains a deep network to align its image embeddings with target points sampled from a Gaussian Mixture Model distribution. Cluster assignments are then determined by associating each image embedding with a mixture component. Simultaneously, the same network is trained on an additional self-supervised task of predicting image rotations, which pushes it to learn more meaningful image representations that facilitate better clustering. Experimental results show that MMDC matches or exceeds state-of-the-art performance on six challenging benchmarks. On natural image datasets we improve on previous results by significant margins of up to 20 absolute accuracy points, yielding an accuracy of 82% on CIFAR-10, 45% on CIFAR-100 and 69% on STL-10.
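The two ingredients named in the abstract, Gaussian-mixture targets and rotation prediction, can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names, the axis-aligned placement of component means, the uniform mixture weights and the shared isotropic variance are all assumptions made for illustration. Under those assumptions, the most likely mixture component for an embedding is simply the one with the nearest mean.

```python
import random


def make_component_means(k, dim, scale=5.0):
    """Hypothetical layout: put mean j on axis j, `scale` units from the origin."""
    if dim < k:
        raise ValueError("this sketch needs dim >= k")
    return [[scale if i == j else 0.0 for i in range(dim)] for j in range(k)]


def sample_targets(means, n, sigma=1.0, rng=None):
    """Draw n target points from a uniform-weight, isotropic Gaussian mixture."""
    rng = rng or random.Random(0)
    targets, components = [], []
    for _ in range(n):
        c = rng.randrange(len(means))  # pick a mixture component uniformly
        targets.append([rng.gauss(m, sigma) for m in means[c]])
        components.append(c)
    return targets, components


def assign_cluster(embedding, means):
    """Assign an embedding to the component with the nearest mean.

    With equal weights and a shared isotropic covariance (assumed here),
    this is the component with the highest posterior probability.
    """
    dists = [sum((e - m) ** 2 for e, m in zip(embedding, mu)) for mu in means]
    return dists.index(min(dists))


def rotations_with_labels(image):
    """Return (image, label) pairs for rotations of 0, 90, 180 and 270 degrees.

    `image` is a 2-D list of pixel values; label r means r clockwise
    quarter turns, which is the target of the rotation-prediction task.
    """
    out = []
    current = image
    for label in range(4):
        out.append((current, label))
        current = [list(row) for row in zip(*current[::-1])]  # 90 degrees clockwise
    return out
```

In a full training loop, a network would be optimized so that its embeddings move toward the sampled targets while a second output head predicts the rotation label. How targets are matched to images and how the two losses are weighted is not specified in the abstract, so it is left out here.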

Results from the Paper


Task              Dataset        Model  Metric Name  Metric Value  Global Rank
Image Clustering  CIFAR-10       MMDC   Accuracy     0.820         # 16
Image Clustering  CIFAR-10       MMDC   NMI          0.703         # 17
Image Clustering  CIFAR-10       MMDC   Backbone     ResNet18      # 1
Image Clustering  CIFAR-100      MMDC   Accuracy     0.446         # 12
Image Clustering  CIFAR-100      MMDC   NMI          0.418         # 14
Image Clustering  ImageNet-10    MMDC   Accuracy     0.811         # 10
Image Clustering  ImageNet-10    MMDC   NMI          0.719         # 10
Image Clustering  STL-10         MMDC   Accuracy     0.694         # 15
Image Clustering  STL-10         MMDC   NMI          0.593         # 13
Image Clustering  STL-10         MMDC   Backbone     ResNet18      # 1
Image Clustering  Tiny-ImageNet  MMDC   Accuracy     0.119         # 6
Image Clustering  Tiny-ImageNet  MMDC   NMI          0.274         # 6
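
The Accuracy and NMI figures above are clustering metrics, so cluster ids first have to be related to ground-truth classes. The sketch below shows one common way to compute them. It assumes the usual convention of scoring accuracy under the best one-to-one mapping from clusters to classes, and it assumes arithmetic-mean normalization for NMI; the listing does not say which NMI normalization was used. The function names are illustrative, and the brute-force search over mappings is only practical for a small number of clusters (the Hungarian algorithm is the standard replacement at scale).

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Best accuracy over one-to-one maps from cluster ids to class labels.

    Brute force over permutations, so only suitable for a few clusters.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    counts = Counter(zip(cluster_ids, true_labels))
    best = 0
    for perm in permutations(classes, len(clusters)):
        # perm[i] is the class assigned to clusters[i] under this mapping
        hits = sum(counts[(c, y)] for c, y in zip(clusters, perm))
        best = max(best, hits)
    return best / len(true_labels)


def nmi(true_labels, cluster_ids):
    """Normalized mutual information: 2 * I(Y; C) / (H(Y) + H(C))."""
    n = len(true_labels)
    py = Counter(true_labels)
    pc = Counter(cluster_ids)
    joint = Counter(zip(true_labels, cluster_ids))

    def entropy(counter):
        return -sum((v / n) * math.log(v / n) for v in counter.values())

    mi = sum(
        (v / n) * math.log(v * n / (py[y] * pc[c])) for (y, c), v in joint.items()
    )
    denom = entropy(py) + entropy(pc)
    # Both partitions trivial (a single group each): treat as perfect agreement.
    return 2 * mi / denom if denom > 0 else 1.0
```

Both functions are invariant to how cluster ids are numbered, which is why they suit unsupervised methods whose cluster indices carry no inherent meaning.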
