Incremental Constrained Clustering by Minimal Weighted Modification

Clustering is a well-known Data Mining task that groups data instances according to their similarity. It is exploratory and unsupervised, and its results depend on many parameters, so the expert often has to iterate several times before reaching a satisfactory result. Constrained clustering was introduced to better model the expert's expectations. However, it is not sufficient on its own, since it usually requires the constraints to be given before the clustering process starts. In this paper we address a more general problem: modeling the exploratory clustering process as a sequence of clustering modifications in which expert constraints are added on the fly. We present an incremental constrained clustering framework that integrates active query strategies and a Constraint Programming model to fit the expert's expectations while preserving the stability of the partition, so that the expert can understand the process and apprehend its impact. Our model supports instance-level and group-level constraints, which can be relaxed. Experiments on reference datasets and a case study on the analysis of satellite image time series show the relevance of our framework.
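To make the idea of a minimal weighted modification concrete, here is a small sketch. It is not the paper's Constraint Programming model: it substitutes brute-force enumeration, which is only feasible for a handful of instances, and it covers instance-level must-link and cannot-link constraints only. The function name, the per-instance weights, and the cost (the sum of the weights of instances whose cluster changes) are assumptions made for illustration.

```python
from itertools import product


def minimal_weighted_modification(partition, weights, must_link, cannot_link, k):
    """Return (new_partition, cost) with the cheapest change that satisfies
    all constraints, or (None, inf) if no assignment is feasible.

    Hypothetical helper: exhaustive search stands in for a CP solver.
    Cluster labels are treated as fixed identifiers in range(k).
    """
    best, best_cost = None, float("inf")
    for candidate in product(range(k), repeat=len(partition)):
        # Must-link: both instances must share a cluster.
        if any(candidate[i] != candidate[j] for i, j in must_link):
            continue
        # Cannot-link: the two instances must be in different clusters.
        if any(candidate[i] == candidate[j] for i, j in cannot_link):
            continue
        # Assumed cost: total weight of the instances that were moved.
        cost = sum(w for old, new, w in zip(partition, candidate, weights)
                   if old != new)
        if cost < best_cost:
            best, best_cost = list(candidate), cost
    return best, best_cost


current = [0, 0, 1, 1]
weights = [1.0, 3.0, 1.0, 1.0]   # instance 1 is expensive to move
new_partition, cost = minimal_weighted_modification(
    current, weights, must_link=[(1, 2)], cannot_link=[(0, 1)], k=2
)
print(new_partition, cost)
```

In this toy case the search moves the two cheap instances (0 and 2) rather than the expensive instance 1, which is the kind of stability-preserving behaviour the abstract describes.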

Task: Incremental Constrained Clustering

| Dataset | Model | AUBC-ARI (quality) | Global Rank | AUBC-ARI (similarity) | Global Rank |
|---|---|---|---|---|---|
| iris | PCK-Means+NPU | 0.876±0.018 | # 4 | 0.398±0.029 | # 6 |
| iris | PCK-Means+Random | 0.695±0.018 | # 8 | 0.271±0.008 | # 8 |
| iris | COP-KMeans+Random | 0.712±0.012 | # 7 | 0.309±0.004 | # 7 |
| iris | MPCK-Means+Random | 0.783±0.025 | # 6 | 0.472±0.024 | # 4 |
| iris | IAC+Random | 0.816±0.014 | # 5 | 0.605±0.016 | # 2 |
| iris | IAC+NPU | 0.941±0.007 | # 1 | 0.668±0.02 | # 1 |
| iris | MPCK-Means+NPU | 0.928±0.015 | # 2 | 0.584±0.027 | # 3 |
| iris | COP-KMeans+NPU | 0.88±0.016 | # 3 | 0.432±0.029 | # 5 |
| Wine | MPCK-Means+NPU | 0.893±0.016 | # 1 | 0.817±0.002 | # 2 |
| Wine | IAC+Random | 0.349±0.01 | # 8 | 0.441±0.012 | # 4 |
| Wine | IAC+NPU | 0.481±0.016 | # 3 | 0.455±0.09 | # 3 |
| Wine | COP-KMeans+Random | 0.369±0.003 | # 7 | 0.241±0.008 | # 8 |
| Wine | COP-KMeans+NPU | 0.469±0.019 | # 5 | 0.340±0.001 | # 5 |
| Wine | MPCK-Means+Random | 0.821±0.005 | # 2 | 0.845±0.006 | # 1 |
| Wine | PCK-Means+NPU | 0.472±0.017 | # 4 | 0.337±0.011 | # 6 |
| Wine | PCK-Means+Random | 0.371±0.003 | # 6 | 0.332±0.01 | # 7 |
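The table does not define AUBC-ARI. The sketch below shows one plausible reading, stated here as an assumption: the Adjusted Rand Index (ARI) is computed after each batch of constraints, and the resulting curve is summarised by its trapezoidal area normalised by the budget range. Under that reading, "quality" would compare each partition with the ground truth and "similarity" would compare each partition with the previous one. The function names and the toy numbers are illustrative and do not come from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same instances."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must cover the same instances")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Count co-clustered pairs in the contingency table and in each partition.
    joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / total_pairs
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions have one cluster
    return (joint - expected) / (max_index - expected)


def area_under_budget_curve(budgets, scores):
    """Trapezoidal area under (budget, score), divided by the budget range.

    Assumed definition of AUBC; budgets must be strictly increasing.
    """
    if len(budgets) != len(scores) or len(budgets) < 2:
        raise ValueError("need at least two (budget, score) points")
    area = sum(
        (budgets[i + 1] - budgets[i]) * (scores[i] + scores[i + 1]) / 2
        for i in range(len(budgets) - 1)
    )
    return area / (budgets[-1] - budgets[0])


# Toy example: ARI against a ground truth after 0, 10 and 20 constraints.
truth = [0, 0, 1, 1]
partitions = [[0, 1, 0, 1], [0, 0, 0, 1], [0, 0, 1, 1]]
quality = [adjusted_rand_index(truth, p) for p in partitions]
aubc_quality = area_under_budget_curve([0, 10, 20], quality)
```

A value near 1 would mean the partitions track the reference closely across the whole budget, while the normalisation keeps scores comparable between runs with different budgets.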

Methods