Authors:
F. Jorge F. Duarte (1); João M. M. Duarte (1); M. Fátima C. Rodrigues (1) and Ana L. N. Fred (2)
Affiliations:
(1) Instituto Superior de Engenharia do Porto, Portugal; (2) Instituto Superior Técnico, Portugal
Keyword(s):
Cluster ensemble selection, Cluster ensembles, Data clustering, Unsupervised learning.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Clustering and Classification Methods; Computational Intelligence; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Soft Computing; Symbolic Systems
Abstract:
Several approaches for producing cluster ensembles, and several consensus functions, have been proposed for combining multiple data partitions into a single, more robust data partition. This range of possibilities raises a new problem: which of the existing approaches for producing the cluster ensemble's data partitions, and for combining those partitions, best fits a given data set? In this paper, we address this cluster ensemble selection problem. We propose a new measure for selecting the best consensus data partition among a variety of consensus partitions, based on a notion of average cluster consistency between each data partition in the cluster ensemble and a given consensus partition. We compared the proposed measure with other measures for cluster ensemble selection on 9 different data sets, and the experimental results show that the consensus partitions selected by our approach were usually of better quality than the consensus partitions selected by the other measures used in our experiments.
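
The selection idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definition of the proposed measure, so the `consistency` function below is an assumed stand-in (a simple purity-style overlap score), not the authors' formula. All function names and the toy data are hypothetical.

```python
from collections import Counter


def consistency(partition, consensus):
    """Assumed stand-in score, NOT the paper's definition.

    For each cluster of `partition`, find the consensus cluster that shares
    the most objects with it, then return the fraction of all objects
    covered by these best matches (a value in (0, 1]).
    """
    n = len(partition)
    # Count how many objects carry each (partition label, consensus label) pair.
    overlap = Counter(zip(partition, consensus))
    best = {}
    for (cluster, _), count in overlap.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / n


def average_consistency(ensemble, consensus):
    """Mean consistency between every ensemble partition and one consensus partition."""
    return sum(consistency(p, consensus) for p in ensemble) / len(ensemble)


def select_consensus(ensemble, candidates):
    """Return the name of the candidate consensus partition with the highest average score."""
    return max(candidates, key=lambda name: average_consistency(ensemble, candidates[name]))


# Toy data: two ensemble partitions of four objects and two candidate consensus partitions.
ensemble = [[0, 0, 1, 1], [0, 0, 0, 1]]
candidates = {"A": [0, 0, 1, 1], "B": [0, 1, 0, 1]}
selected = select_consensus(ensemble, candidates)
```

Note that this simple stand-in score is trivially maximized by a consensus partition with a single cluster, so a practical measure needs some correction for the number of clusters; how the paper's measure handles this is not stated in the abstract.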