Comparison of Combination Methods using Spectral Clustering Ensembles

André Lourenço, Ana Fred

Abstract

We address the problem of combining multiple data partitions, which together form what we call a clustering ensemble. We use a recent clustering approach, known as Spectral Clustering, and the classical K-Means algorithm to produce the partitions that constitute the clustering ensembles. We compare several combination methods by measuring the consistency between the combined data partition and (a) ground truth information, and (b) the clustering ensemble. Two consistency measures are used: (i) an index based on cluster matching between two partitions; and (ii) an information-theoretic index based on the mutual information between data partitions. Results on a variety of synthetic and real data sets show that, while combined partitions are more robust than individual clusterings, no combination method is a clear winner. Furthermore, without a priori information, the mutual information based measure cannot systematically select the best combination method for each problem, where optimality is measured against ground truth information.
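
As a rough illustration of the two kinds of consistency measure mentioned above, the following Python sketch compares two partitions given as label lists. It is not the authors' implementation: the abstract does not give the exact definitions, so the normalization of mutual information (by the geometric mean of the two entropies) and the greedy cluster matching are assumptions, and the function names are hypothetical.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of the cluster-size distribution."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two partitions of the same samples."""
    n = len(a)
    size_a, size_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))  # samples shared by cluster i of a and j of b
    return sum(
        (nij / n) * log(nij * n / (size_a[i] * size_b[j]))
        for (i, j), nij in joint.items()
    )


def normalized_mutual_information(a, b):
    """MI divided by sqrt(H(a) * H(b)); this normalization is an assumption."""
    if len(a) != len(b):
        raise ValueError("partitions must label the same samples")
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        # Degenerate case: at least one partition has a single cluster.
        return 1.0 if ha == hb else 0.0
    return mutual_information(a, b) / sqrt(ha * hb)


def matching_consistency(a, b):
    """Fraction of samples in matched clusters, using a greedy one-to-one
    matching by overlap size (an assumed, simplified matching rule)."""
    if len(a) != len(b):
        raise ValueError("partitions must label the same samples")
    joint = Counter(zip(a, b))
    used_a, used_b = set(), set()
    shared = 0
    # Visit cluster pairs from largest overlap to smallest.
    for (i, j), nij in sorted(joint.items(), key=lambda kv: -kv[1]):
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            shared += nij
    return shared / len(a)


if __name__ == "__main__":
    p1 = [0, 0, 1, 1, 2, 2]
    p2 = [1, 1, 0, 0, 2, 2]  # same grouping as p1 with permuted labels
    p3 = [0, 0, 0, 1, 1, 1]
    print(normalized_mutual_information(p1, p2), matching_consistency(p1, p2))
    print(normalized_mutual_information(p1, p3), matching_consistency(p1, p3))
```

Both measures depend only on how samples are grouped, not on the label values, so either can compare a combined partition against ground truth labels or against each member of an ensemble.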



Paper Citation


in Harvard Style

Lourenço A. and Fred A. (2004). Comparison of Combination Methods using Spectral Clustering Ensembles. In Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2004) ISBN 972-8865-01-5, pages 222-233. DOI: 10.5220/0002688102220233


in Bibtex Style

@conference{pris04,
author={André Lourenço and Ana Fred},
title={Comparison of Combination Methods using Spectral Clustering Ensembles},
booktitle={Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2004)},
year={2004},
pages={222-233},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002688102220233},
isbn={972-8865-01-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2004)
TI - Comparison of Combination Methods using Spectral Clustering Ensembles
SN - 972-8865-01-5
AU - Lourenço A.
AU - Fred A.
PY - 2004
SP - 222
EP - 233
DO - 10.5220/0002688102220233