Authors: André Lourenço 1 ; Samuel Rota Bulò 2 ; Nicola Rebagliati 2 ; Ana Fred 3 ; Mário Figueiredo 3 and Marcello Pelillo 2

Affiliations: 1 Instituto Superior de Engenharia de Lisboa and Instituto Superior Técnico, Portugal ; 2 Università Ca’ Foscari Venezia, Italy ; 3 Instituto Superior Técnico, Portugal

ISBN: 978-989-8565-41-9

ISSN: 2184-4313

Keyword(s): Clustering Algorithm, Clustering Ensembles, Probabilistic Modeling, Evidence Accumulation Clustering.

Related Ontology Subjects/Areas/Topics: Clustering ; Ensemble Methods ; Pattern Recognition ; Theory and Methods

Abstract: Ensemble clustering methods derive a consensus partition of a set of objects starting from the results of a collection of base clustering algorithms forming the ensemble. Each partition in the ensemble provides a set of pairwise observations of the co-occurrence of objects in the same cluster. The evidence accumulation clustering paradigm uses these co-occurrence statistics to derive a similarity matrix, referred to as the co-association matrix, which is fed to a pairwise similarity clustering algorithm to obtain a final consensus clustering. The advantage of this solution is the avoidance of the label correspondence problem, which affects other ensemble clustering schemes. In this paper we derive a principled approach for the extraction of a consensus clustering from the observations encoded in the co-association matrix. We introduce a probabilistic model for the co-association matrix parameterized by the unknown assignments of objects to clusters, which are in turn estimated using a maximum likelihood approach. Additionally, we propose a novel algorithm to carry out the parameter estimation with convergence guarantees towards a local solution. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
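
Illustration: the co-association matrix mentioned in the abstract can be sketched in a few lines of Python. This is a minimal sketch assuming NumPy; the function name `co_association` and the toy ensemble are hypothetical and do not come from the paper. It covers only the accumulation of co-occurrence statistics, not the probabilistic model or the estimation algorithm the paper proposes.

```python
import numpy as np

def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster.

    `partitions` is a hypothetical input format: one row of integer cluster
    labels per base clustering, one column per object.
    """
    labels = np.asarray(partitions)  # shape: (n_partitions, n_objects)
    # same[p, i, j] is True when partition p puts objects i and j together.
    same = labels[:, :, None] == labels[:, None, :]
    # Average over partitions to get pairwise co-occurrence frequencies.
    return same.mean(axis=0)

# Toy ensemble of three base clusterings over four objects (made-up data).
# Label values are not aligned across partitions: only co-membership matters,
# which is why no label correspondence step is needed.
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 2],
    [0, 1, 2, 2],
]
C = co_association(ensemble)
print(np.round(C, 2))
```

In this toy ensemble, objects 0 and 1 share a cluster in two of the three partitions, so their entry is 2/3, while objects 0 and 2 never co-occur and get 0.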

