# Weighted Evidence Accumulation Clustering Using Subsampling

### F. Jorge F. Duarte, Ana L. N. Fred, Fátima Rodrigues, João M. M. Duarte, André Lourenço

#### Abstract

We introduce an approach based on evidence accumulation clustering (EAC) for combining partitions in a clustering ensemble. EAC uses a voting mechanism to build a co-association matrix from the pairwise associations observed in N partitions, with every partition given equal weight in the combination. The final data partition is obtained by applying a clustering algorithm to this co-association matrix. In this paper we propose a clustering ensemble combination approach that uses subsampling and weights the partitions differently (WEACS). We weight each partition in two ways: SWEACS, using a single validation index, and JWEACS, using a committee of indices. We compare the combination results with the EAC technique and with the HGPA, MCLA and CSPA methods of Strehl and Ghosh, using subsampling, and conclude that the WEACS approaches generally obtain better results. As a complementary step to the WEACS approach, we combine all the final data partitions produced by the different variants of the method and use the Ward link algorithm to obtain the final data partition.
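The weighted voting described in the abstract can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: each subsampled partition votes for the pairs of samples it places in the same cluster, each vote is scaled by a per-partition weight, and every matrix entry is normalized by the total weight of the partitions in which both samples were drawn. The helper names (`index_weights`, `weighted_coassociation`, `final_partition`), the min-max scaling of validity-index scores, the plain average used for the committee, and average link as the final clustering step are all hypothetical choices. With equal weights, the matrix reduces to unweighted EAC voting over the subsamples.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def index_weights(scores):
    """Map validity-index scores to partition weights in [0, 1] (hypothetical).

    scores has shape (n_partitions,) for a single index (SWEACS-like) or
    (n_partitions, n_indices) for a committee (JWEACS-like). Larger is assumed
    to be better (negate indices such as Davies-Bouldin first). Each index is
    min-max scaled across partitions, then the indices are averaged.
    """
    s = np.asarray(scores, dtype=float)
    if s.ndim == 1:
        s = s[:, None]
    lo = s.min(axis=0)
    span = s.max(axis=0) - lo
    # An index that is constant across partitions gives every partition weight 1.
    scaled = np.where(span > 0, (s - lo) / np.where(span > 0, span, 1.0), 1.0)
    return scaled.mean(axis=1)


def weighted_coassociation(n_samples, partitions, weights):
    """Weighted co-association matrix from subsampled partitions (hypothetical).

    partitions: list of (indices, labels); indices are the sample ids drawn
    without replacement for that partition, labels their cluster labels.
    """
    votes = np.zeros((n_samples, n_samples))
    seen = np.zeros((n_samples, n_samples))
    for (idx, lab), w in zip(partitions, weights):
        idx, lab = np.asarray(idx), np.asarray(lab)
        same = (lab[:, None] == lab[None, :]).astype(float)
        block = np.ix_(idx, idx)
        votes[block] += w * same  # weighted vote for co-clustered pairs
        seen[block] += w          # weight of partitions that drew both samples
    # Pairs never drawn together by a positively weighted partition get 0.
    return np.divide(votes, seen, out=np.zeros_like(votes), where=seen > 0)


def final_partition(coassoc, n_clusters, method="average"):
    """Cut a hierarchical clustering of the dissimilarity 1 - coassoc."""
    dist = 1.0 - coassoc
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method=method)
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Toy usage: 10 samples in two groups (0-4 and 5-9). Each "partition" is the
# true grouping of a random 8-sample subsample, standing in for a clusterer.
rng = np.random.default_rng(0)
truth = np.array([0] * 5 + [1] * 5)
partitions = []
for _ in range(20):
    idx = np.sort(rng.choice(10, size=8, replace=False))
    partitions.append((idx, truth[idx]))
weights = index_weights(rng.random((20, 3)))  # placeholder scores, 3 indices
C = weighted_coassociation(10, partitions, weights)
labels = final_partition(C, n_clusters=2)
print(labels)
```

Under the same assumptions, the complementary step could reuse these helpers: treat each variant's final partition as a full-data partition (indices `np.arange(n)`) with weight 1 and call `final_partition` with `method="ward"`. SciPy's Ward update assumes Euclidean distances, so applying it to a co-association dissimilarity is a heuristic.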

#### References

- A. Fred, “Finding consistent clusters in data partitions,” in Multiple Classifier Systems, J. Kittler and F. Roli, editors, vol. LNCS 2096, Springer, 2001, pp. 309-318.
- A. Fred and A. K. Jain, “Evidence accumulation clustering based on the k-means algorithm,” in Structural, Syntactic, and Statistical Pattern Recognition, T. Caelli et al., editors, vol. LNCS 2396, Springer-Verlag, 2002, pp. 442-451.
- A. Fred and A. K. Jain, “Combining Multiple Clusterings using Evidence Accumulation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 6, June 2005, pp. 835-850.
- F. Jorge Duarte, Ana L. N. Fred, André Lourenço and M. Fátima C. Rodrigues, “Weighting Cluster Ensembles in Evidence Accumulation Clustering,” Workshop on Extraction of Knowledge from Databases and Warehouses, EPIA 2005.
- F. Jorge F. Duarte, Ana L. N. Fred, André Lourenço and M. Fátima C. Rodrigues, “Weighted Evidence Accumulation Clustering,” Fourth Australasian Conference on Knowledge Discovery and Data Mining, 2005.
- A. Strehl and J. Ghosh, “Cluster ensembles - a knowledge reuse framework for combining multiple partitions,” Journal of Machine Learning Research, vol. 3, 2002.
- B. Park and H. Kargupta, Data Mining Handbook, chapter: Distributed Data Mining. Lawrence Erlbaum Associates, 2003.
- M. Meila and D. Heckerman, “An Experimental Comparison of Several Clustering and Initialization Methods,” Proc. 14th Conf. Uncertainty in Artificial Intelligence, 1998, pp. 386-395.
- M. Halkidi, Y. Batistakis and M. Vazirgiannis, “Clustering algorithms and validity measures,” tutorial paper in Proc. SSDBM 2001 Conference.
- S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, 1999.
- L. J. Hubert and J. Schultz, “Quadratic assignment as a general data-analysis strategy,” British Journal of Mathematical and Statistical Psychology, vol. 29, 1975, pp. 190-241.
- J. C. Dunn, “Well separated clusters and optimal fuzzy partitions,” Journal of Cybernetics, vol. 4, 1974, pp. 95-104.
- D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 1, no. 2, 1979.
- S. C. Sharma, Applied Multivariate Techniques, John Wiley & Sons, 1996.
- T. Caliński and J. Harabasz, “A dendrite method for cluster analysis,” Communications in Statistics, vol. 3, 1974, pp. 1-27.
- L. Kaufman and P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, New York, 1990.
- U. Maulik and S. Bandyopadhyay, “Performance Evaluation of Some Clustering Algorithms and Validity Indices,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, 2002, pp. 1650-1654.
- X. L. Xie and G. Beni, “A Validity Measure for Fuzzy Clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, 1991, pp. 841-847.
- W. Krzanowski and Y. Lai, “A criterion for determining the number of groups in a dataset using sum of squares clustering,” Biometrics, 1985, pp. 23-34.
- J. A. Hartigan, “Statistical theory in clustering,” Journal of Classification, 1985, pp. 63-76.
- C. H. Chou, M. C. Su and E. Lai, “A new cluster validity measure and its application to image compression,” Pattern Analysis and Applications, vol. 7, 2004, pp. 205-220.
- S. T. Hadjitodorov, L. I. Kuncheva and L. P. Todorova, “Moderate Diversity for Better Cluster Ensembles,” Information Fusion, 2005, accepted.
- X. Z. Fern and C. E. Brodley, “Random projection for high dimensional data clustering: a cluster ensemble approach,” 20th International Conference on Machine Learning (ICML), Washington, DC, 2003, pp. 186-193.
- S. Monti, P. Tamayo, J. Mesirov and T. Golub, “Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data,” Machine Learning, vol. 52, 2003, pp. 91-118.
- A. Topchy, B. Minaei-Bidgoli, A. K. Jain and W. Punch, “Adaptive Clustering Ensembles,” Proc. Intl. Conf. on Pattern Recognition, ICPR'04, Cambridge, UK, 2004, pp. 272-275.
- B. Minaei-Bidgoli, A. Topchy and W. Punch, “Ensembles of Partitions via Data Resampling,” Proc. IEEE Intl. Conf. on Information Technology: Coding and Computing, ITCC04, vol. 2, April 2004, pp. 188-192.
- E. Dimitriadou, A. Weingessel and K. Hornik, “Voting-Merging: An Ensemble Method for Clustering,” Artificial Neural Networks - ICANN, August 2001.
- A. Lourenço and A. Fred, “Comparison of Combination Methods using Spectral Clustering Ensembles,” in Proc. Pattern Recognition in Information Systems, 2004.

#### Paper Citation

#### in Harvard Style

Duarte, F. J. F., Fred, A. L. N., Rodrigues, F., Duarte, J. M. M. and Lourenço, A. (2006). **Weighted Evidence Accumulation Clustering Using Subsampling**. In *6th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2006)*. ISBN 978-972-8865-55-9, pages 104-116. DOI: 10.5220/0002504501040116

#### in Bibtex Style

@conference{pris06,

author={F. Jorge F. Duarte and Ana L. N. Fred and Fátima Rodrigues and João M. M. Duarte and André Lourenço},

title={Weighted Evidence Accumulation Clustering Using Subsampling},

booktitle={6th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2006)},

year={2006},

pages={104-116},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0002504501040116},

isbn={978-972-8865-55-9},

}

#### in EndNote Style

TY - CONF

JO - 6th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2006)

TI - Weighted Evidence Accumulation Clustering Using Subsampling

SN - 978-972-8865-55-9

AU - Duarte, F. J. F.

AU - Fred, A. L. N.

AU - Rodrigues, F.

AU - Duarte, J. M. M.

AU - Lourenço, A.

PY - 2006

SP - 104

EP - 116

DO - 10.5220/0002504501040116