Ensemble clustering based on evidence extracted from the co-association matrix

1 Aug 2019  ·  Lianyu Hu, Caiming Zhong, Xiaodong Yue, Ting Luo, Qiang Fu, Haiyong Xu

The evidence accumulation model is an approach for collecting information from the base partitions in a clustering ensemble, and it can be viewed as a kernel transformation from the original data space to a co-association matrix. However, some cluster structure information may be lost in this transformation; hence, several methods in the literature try to recover the lost information and return it to the ensemble process. In this paper, we report an interesting phenomenon: removing some evidence from the co-association matrix can yield more accurate clustering results. An intuitive explanation is that some evidence in the original co-association matrix may be noise that harms the final clustering. However, such evidence is difficult to detect in practice, let alone remove from the matrix. To address this problem, we remove evidence with low occurrence frequencies at multiple levels, because negative evidence does not normally occur regularly across the base partitions. We then apply normalized cut to obtain multiple clustering results. To select the optimal ensemble result, we design an internal validity index specifically for clustering ensembles that uses only the co-association matrix. Experimental results on 16 datasets demonstrate that the proposed scheme outperforms several state-of-the-art clustering ensemble approaches.
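A minimal Python sketch of the evidence accumulation step and the multi-level removal of low-frequency evidence may help make the abstract concrete. It is not the authors' implementation. It assumes that base partitions are integer label lists over the same objects and that a pair's "occurrence frequency" is the fraction of base partitions placing both objects in the same cluster; all function names are hypothetical. The paper applies normalized cut to each filtered matrix and selects among the results with its own internal validity index. Neither is reproduced here: `components` is only an illustrative stand-in showing how dropping weak evidence can separate clusters.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def co_association(partitions: Sequence[Sequence[int]]) -> Matrix:
    """Evidence accumulation: entry (i, j) is the fraction of base
    partitions that put objects i and j in the same cluster."""
    m = len(partitions)
    n = len(partitions[0])
    ca = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all base partitions must label the same objects")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    ca[i][j] += 1.0
    return [[v / m for v in row] for row in ca]


def remove_low_frequency(ca: Matrix, threshold: float) -> Matrix:
    """Hypothetical filter: zero every off-diagonal entry whose
    co-occurrence frequency is below `threshold`; keep the diagonal."""
    n = len(ca)
    return [
        [ca[i][j] if (i == j or ca[i][j] >= threshold) else 0.0 for j in range(n)]
        for i in range(n)
    ]


def multi_level_matrices(ca: Matrix) -> Dict[float, Matrix]:
    """One filtered matrix per distinct positive off-diagonal evidence level
    (an assumption about how the 'multiple levels' are chosen)."""
    n = len(ca)
    levels = sorted({ca[i][j] for i in range(n) for j in range(n) if i != j and ca[i][j] > 0})
    return {t: remove_low_frequency(ca, t) for t in levels}


def components(adj: Matrix) -> List[int]:
    """Connected components over nonzero off-diagonal entries.
    Illustrative stand-in only; this is NOT normalized cut."""
    n = len(adj)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if v != u and adj[u][v] > 0 and labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


if __name__ == "__main__":
    # Three base partitions of four objects (toy data).
    parts = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
    ca = co_association(parts)
    for level, mat in multi_level_matrices(ca).items():
        print(round(level, 2), components(mat))
    # 0.33 [0, 0, 0, 0]
    # 0.67 [0, 0, 1, 1]
    # 1.0 [0, 0, 1, 2]
```

In this toy case, the link between objects 0 and 2 appears in only one of three base partitions. Dropping it at the 2/3 level separates {0, 1} from {2, 3}, which mirrors the intuition that rarely occurring evidence is likely to be noise.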


Results from the Paper


Task                 Dataset     Model  Metric  Value  Global rank
Clustering Ensemble  ionosphere  NegMM  Purity  0.83   #1
Clustering Ensemble  pathbased   NegMM  Purity  0.98   #1
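Purity, the metric reported in the table above, is commonly defined as purity = (1/N) Σ_k max_j |ω_k ∩ c_j|. Each predicted cluster ω_k is credited with its most frequent ground-truth class c_j, and the credited counts are divided by the number of objects N. The following Python sketch uses that standard definition. It is an assumption that the benchmark computes purity this way, and the sketch is not its evaluation code.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def purity(predicted: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Fraction of objects whose cluster's majority class equals their class."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("label sequences must be non-empty and equally long")
    by_cluster = defaultdict(Counter)  # cluster id -> counts of true classes
    for cluster, cls in zip(predicted, truth):
        by_cluster[cluster][cls] += 1
    # Sum the size of the majority class inside each predicted cluster.
    majority_total = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return majority_total / len(truth)


if __name__ == "__main__":
    print(round(purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"]), 3))  # 0.833
```

Purity never decreases as clusters are split further, and it reaches 1.0 when every object forms its own cluster. For that reason, it is usually read alongside the number of clusters or other indices.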
