Authors:
André Lourenço (1); Samuel Rota Bulò (2); Nicola Rebagliati (2); Ana Fred (3); Mário Figueiredo (3) and Marcello Pelillo (2)
Affiliations:
(1) Instituto Superior de Engenharia de Lisboa and Instituto Superior Técnico, Portugal
(2) Università Ca’ Foscari Venezia, Italy
(3) Instituto Superior Técnico, Portugal
Keyword(s):
Clustering Algorithm, Clustering Ensembles, Probabilistic Modeling, Evidence Accumulation Clustering.
Related Ontology Subjects/Areas/Topics:
Clustering; Ensemble Methods; Pattern Recognition; Theory and Methods
Abstract:
Ensemble clustering methods derive a consensus partition of a set of objects starting from the results of a collection of base clustering algorithms forming the ensemble. Each partition in the ensemble provides a set of pairwise observations of the co-occurrence of objects in the same cluster. The evidence accumulation clustering paradigm uses these co-occurrence statistics to derive a similarity matrix, referred to as the co-association matrix, which is fed to a pairwise similarity clustering algorithm to obtain a final consensus clustering. The advantage of this solution is that it avoids the label correspondence problem, which affects other ensemble clustering schemes. In this paper, we derive a principled approach for the extraction of a consensus clustering from the observations encoded in the co-association matrix. We introduce a probabilistic model for the co-association matrix parameterized by the unknown assignments of objects to clusters, which are in turn estimated using a maximum likelihood approach. Additionally, we propose a novel algorithm to carry out the parameter estimation with convergence guarantees towards a local solution. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
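As an illustration of the co-association matrix described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes each base partition is given as a list of integer cluster labels, one per object; the function name `co_association` and the toy ensemble are hypothetical. It covers only the accumulation of co-occurrence evidence, not the probabilistic model or the maximum likelihood estimation proposed in the paper.

```python
import numpy as np

def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    labels = np.asarray(partitions)  # shape (n_partitions, n_objects)
    # same[p, i, j] is True when objects i and j share a label in partition p
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)

# Hypothetical toy ensemble: three base partitions of four objects.
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first, with the labels permuted
    [0, 0, 0, 1],
]
C = co_association(ensemble)
```

The first two partitions describe the same grouping under different label names, yet they contribute identical evidence to `C`, because only the equality of labels within a partition is used. This is why no label correspondence between partitions is needed.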