Probabilistic Evidence Accumulation for Clustering Ensembles
André Lourenço, Samuel Rota Bulò, Nicola Rebagliati, Ana Fred, Mário Figueiredo, Marcello Pelillo
2013
Abstract
Ensemble clustering methods derive a consensus partition of a set of objects from the results of a collection of base clustering algorithms forming the ensemble. Each partition in the ensemble provides a set of pairwise observations of the co-occurrence of objects in the same cluster. The evidence accumulation clustering paradigm uses these co-occurrence statistics to derive a similarity matrix, referred to as the co-association matrix, which is fed to a pairwise similarity clustering algorithm to obtain the final consensus clustering. The advantage of this solution is that it avoids the label correspondence problem, which affects other ensemble clustering schemes. In this paper we derive a principled approach for extracting a consensus clustering from the observations encoded in the co-association matrix. We introduce a probabilistic model for the co-association matrix, parameterized by the unknown assignments of objects to clusters, which are in turn estimated using a maximum likelihood approach. Additionally, we propose a novel algorithm to carry out the parameter estimation with guaranteed convergence to a local solution. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
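The co-occurrence statistics described in the abstract can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the names `co_association` and `log_likelihood` are hypothetical, and the binomial model (the probability that objects i and j share a cluster in a base partition is the inner product of their soft cluster-assignment vectors) is an assumption chosen only to show what a likelihood "parameterized by the assignments of objects to clusters" can look like. The paper's exact model and its estimation algorithm are not reproduced here.

```python
import math
from itertools import combinations


def co_association(partitions):
    """Return the n x n matrix of co-occurrence counts for an ensemble.

    partitions[k][i] is the cluster label of object i in base partition k.
    Entry [i][j] counts the partitions that place objects i and j in the
    same cluster; labels are compared only within a partition, so they need
    not agree across partitions.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return counts


def log_likelihood(counts, n_partitions, y, eps=1e-12):
    """Log-likelihood of the counts under an ASSUMED (hypothetical) binomial model.

    Assumption: counts[i][j] ~ Binomial(n_partitions, p_ij) with
    p_ij = sum_k y[i][k] * y[j][k], where y[i] is a probability vector over
    clusters. The binomial coefficient is dropped since it does not depend on y.
    """
    total = 0.0
    for i, j in combinations(range(len(counts)), 2):  # each unordered pair once
        p = sum(a * b for a, b in zip(y[i], y[j]))
        p = min(max(p, eps), 1.0 - eps)  # keep both logarithms finite
        c = counts[i][j]
        total += c * math.log(p) + (n_partitions - c) * math.log(1.0 - p)
    return total


# Toy ensemble of three base partitions over four objects.
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first partition, labels swapped
    [0, 0, 0, 1],
]
C = co_association(ensemble)

# Relabeling clusters inside any base partition leaves C unchanged.
relabeled = [[1, 1, 0, 0], [0, 0, 1, 1], [5, 5, 5, 7]]
assert co_association(relabeled) == C

# Soft assignments consistent with the ensemble score higher than
# assignments that contradict it.
y_good = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
y_bad = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
print(log_likelihood(C, len(ensemble), y_good) > log_likelihood(C, len(ensemble), y_bad))  # → True
```

Because the matrix only records whether two objects share a label within each partition, permuting labels inside any base partition leaves it unchanged, which is why this construction sidesteps the label correspondence problem. Maximizing such a likelihood over the assignment vectors (each constrained to the probability simplex) is the kind of estimation problem the abstract refers to; the sketch only evaluates the objective and does not implement the paper's optimization algorithm.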
Paper Citation
in Harvard Style
Lourenço A., Rota Bulò S., Rebagliati N., Fred A., Figueiredo M. and Pelillo M. (2013). Probabilistic Evidence Accumulation for Clustering Ensembles. In Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8565-41-9, pages 58-67. DOI: 10.5220/0004267900580067
in BibTeX Style
@conference{icpram13,
author={André Lourenço and Samuel Rota Bulò and Nicola Rebagliati and Ana Fred and Mário Figueiredo and Marcello Pelillo},
title={Probabilistic Evidence Accumulation for Clustering Ensembles},
booktitle={Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2013},
pages={58-67},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004267900580067},
isbn={978-989-8565-41-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Probabilistic Evidence Accumulation for Clustering Ensembles
SN - 978-989-8565-41-9
AU - Lourenço A.
AU - Rota Bulò S.
AU - Rebagliati N.
AU - Fred A.
AU - Figueiredo M.
AU - Pelillo M.
PY - 2013
SP - 58
EP - 67
DO - 10.5220/0004267900580067