Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization

Image clustering is one of the most important computer vision applications and has been extensively studied in the literature. However, current clustering methods mostly suffer from a lack of efficiency and scalability when dealing with large-scale and high-dimensional data. In this paper, we propose a new clustering model, called DEeP Embedded RegularIzed ClusTering (DEPICT), which efficiently maps data into a discriminative embedding subspace and precisely predicts cluster assignments. DEPICT consists of a multinomial logistic regression function stacked on top of a multi-layer convolutional autoencoder. We define a clustering objective function using relative entropy (KL divergence) minimization, regularized by a prior for the frequency of cluster assignments. An alternating strategy is then derived to optimize the objective by updating parameters and estimating cluster assignments. Furthermore, we employ the reconstruction loss functions of our autoencoder as a data-dependent regularization term to prevent the deep embedding function from overfitting. To benefit from end-to-end optimization and eliminate the need for layer-wise pretraining, we introduce a joint learning framework that minimizes the unified clustering and reconstruction loss functions together and trains all network layers simultaneously. Experimental results indicate the superiority and faster running time of DEPICT in real-world clustering tasks, where no labeled data is available for hyper-parameter tuning.
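
The objective described in the abstract (KL divergence between target and predicted assignments, regularized by a prior on cluster frequencies) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the function names are hypothetical, the prior is taken to be uniform, and the target update (re-weighting each prediction by the inverse square root of its cluster's total mass) is one plausible closed form, not something stated in the abstract. The network update step of the alternating strategy and the reconstruction loss are omitted.

```python
import math

def softmax(logits):
    """Multinomial logistic regression output for one sample."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(a, b, eps=1e-12):
    """KL(a || b) for two discrete distributions given as lists."""
    return sum(x * math.log((x + eps) / (y + eps)) for x, y in zip(a, b) if x > 0)

def regularized_clustering_loss(P, Q, prior=None):
    """Mean KL(q_i || p_i) plus KL(f || prior), where f is the mean of the rows of Q.

    P: predicted assignments, Q: target assignments (lists of rows).
    The uniform default prior is an assumption of this sketch.
    """
    n, k = len(P), len(P[0])
    prior = prior or [1.0 / k] * k
    data_term = sum(kl(q, p) for q, p in zip(Q, P)) / n
    freq = [sum(q[j] for q in Q) / n for j in range(k)]
    return data_term + kl(freq, prior)

def estimate_targets(P, eps=1e-12):
    """Assumed target update: q_ik proportional to p_ik / sqrt(sum_i p_ik)."""
    k = len(P[0])
    col_mass = [sum(p[j] for p in P) for j in range(k)]
    Q = []
    for p in P:
        w = [p[j] / math.sqrt(col_mass[j] + eps) for j in range(k)]
        total = sum(w)
        Q.append([x / total for x in w])
    return Q

# One "estimate assignments" step of the alternating scheme on toy logits.
P = [softmax(z) for z in ([2.0, 0.0], [0.0, 2.0], [1.0, 0.0])]
Q = estimate_targets(P)
loss = regularized_clustering_loss(P, Q)
```

In a full alternating loop, the targets Q would be held fixed while the network parameters that produce P are updated, and then Q would be re-estimated. The frequency term is zero when the targets are spread evenly over clusters and grows as the targets collapse onto a few clusters.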

ICCV 2017

Results from the Paper


Task              Dataset           Model         Metric Name  Metric Value  Global Rank
Image Clustering  CMU-PIE           DEPICT        NMI          0.964         # 2
Image Clustering  CMU-PIE           DEPICT        Accuracy     0.850         # 2
Image Clustering  CUB Birds         DEPICT        Accuracy     0.061         # 2
Image Clustering  CUB Birds         DEPICT        NMI          0.290         # 3
Image Clustering  CUB Birds         DEPICT-Large  Accuracy     0.061         # 2
Image Clustering  CUB Birds         DEPICT-Large  NMI          0.297         # 2
Image Clustering  FRGC              DEPICT        NMI          0.583         # 1
Image Clustering  FRGC              DEPICT        Accuracy     0.432         # 1
Image Clustering  Stanford Cars     DEPICT        Accuracy     0.063         # 2
Image Clustering  Stanford Cars     DEPICT        NMI          0.329         # 3
Image Clustering  Stanford Cars     DEPICT-Large  Accuracy     0.062         # 3
Image Clustering  Stanford Cars     DEPICT-Large  NMI          0.330         # 2
Image Clustering  Stanford Dogs     DEPICT-Large  Accuracy     0.054         # 2
Image Clustering  Stanford Dogs     DEPICT-Large  NMI          0.183         # 2
Image Clustering  Stanford Dogs     DEPICT        Accuracy     0.052         # 3
Image Clustering  Stanford Dogs     DEPICT        NMI          0.182         # 3
Image Clustering  YouTube Faces DB  DEPICT        NMI          0.802         # 3
Image Clustering  YouTube Faces DB  DEPICT        Accuracy     0.611         # 1
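
For readers unfamiliar with the NMI metric reported in these results, here is a minimal sketch of normalized mutual information between ground-truth labels and cluster assignments. The function name `nmi` is hypothetical, and the normalization by the arithmetic mean of the two entropies is an assumption: other normalizations (for example the geometric mean) are also in use, and the listing does not say which one produced the values above.

```python
import math
from collections import Counter

def nmi(labels_true, labels_pred):
    """Normalized mutual information between two labelings of the same samples.

    Normalizes by the arithmetic mean of the two entropies (an assumption).
    """
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    # Entropy of each labeling.
    h_true = -sum(c / n * math.log(c / n) for c in count_true.values())
    h_pred = -sum(c / n * math.log(c / n) for c in count_pred.values())

    # Mutual information from the joint counts.
    mi = sum(
        c / n * math.log(c * n / (count_true[a] * count_pred[b]))
        for (a, b), c in count_joint.items()
    )

    denom = 0.5 * (h_true + h_pred)
    if denom == 0.0:
        # Both labelings put every sample in a single group.
        return 1.0
    return mi / denom

# Cluster ids are arbitrary, so a relabeled but identical partition scores 1.
perfect = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Because NMI is invariant to how cluster ids are numbered, it needs no label matching. Clustering accuracy, by contrast, is usually computed after finding the best one-to-one mapping between cluster ids and class labels.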
