String Patterns: From Single Clustering to Ensemble Methods and Validation

André Lourenço, Ana Fred

2007

Abstract

We address the clustering of string patterns from an ensemble methods perspective. In this approach, different partitionings of the data are combined in an attempt to find a better and more robust partition. We cover the successive phases of the approach: generating the partitions (the clustering ensemble), combining them, and validating the combined result. For generation, we consider both different clustering algorithms (hierarchical agglomerative and partitional) and different similarity measures (string matching and structural resemblance). The focus of the paper is the validation and selection of the final data partition. To that end, an information-theoretic measure, together with a variance analysis based on bootstrapping, is used to quantify the consistency between the partitions and the combined results and to choose the best result without additional information. Experimental results on a real data set (contour images) show that this approach can choose the best partition among alternative solutions in an unsupervised manner, as validated by measuring consistency with the ground-truth information.
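The selection step described in the abstract can be illustrated with a minimal sketch. It assumes the information-theoretic measure is a normalized mutual information (NMI) with geometric-mean normalization; the abstract does not name the exact measure. The function names `nmi`, `mean_consistency`, and `select_best` are hypothetical and chosen only for this illustration. The bootstrap variance analysis is not shown.

```python
import math
from collections import Counter


def nmi(a, b):
    """Normalized mutual information between two labelings of the same items.

    Assumption: geometric-mean normalization, MI / sqrt(H(a) * H(b)).
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("labelings must be non-empty and of equal length")
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))
    # Entropy of each labeling.
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both labelings are a single cluster, hence identical
    if h_a == 0.0 or h_b == 0.0:
        return 0.0  # one labeling carries no information about the other
    # Mutual information from the joint label counts.
    mi = sum(
        c / n * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )
    return mi / math.sqrt(h_a * h_b)


def mean_consistency(candidate, ensemble):
    """Average NMI between a candidate partition and every ensemble partition."""
    return sum(nmi(candidate, p) for p in ensemble) / len(ensemble)


def select_best(candidates, ensemble):
    """Index of the candidate most consistent with the ensemble.

    No ground-truth labels are used, only agreement with the ensemble.
    """
    scores = [mean_consistency(c, ensemble) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy ensemble of three partitions over four items (illustrative data only).
    ensemble = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
    candidates = [[0, 0, 1, 1], [0, 1, 0, 1]]
    print("selected candidate:", select_best(candidates, ensemble))
```

Because NMI depends only on how items are grouped, not on the label values themselves, partitions produced by different algorithms can be compared directly without aligning their cluster labels.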



Paper Citation


in Harvard Style

Lourenço A. and Fred A. (2007). String Patterns: From Single Clustering to Ensemble Methods and Validation. In Proceedings of the 7th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2007) ISBN 978-972-8865-93-1, pages 39-48. DOI: 10.5220/0002438400390048


in Bibtex Style

@conference{pris07,
author={André Lourenço and Ana Fred},
title={String Patterns: From Single Clustering to Ensemble Methods and Validation},
booktitle={Proceedings of the 7th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2007)},
year={2007},
pages={39-48},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002438400390048},
isbn={978-972-8865-93-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2007)
TI - String Patterns: From Single Clustering to Ensemble Methods and Validation
SN - 978-972-8865-93-1
AU - Lourenço A.
AU - Fred A.
PY - 2007
SP - 39
EP - 48
DO - 10.5220/0002438400390048