DIVISIVE MONOTHETIC CLUSTERING FOR INTERVAL AND HISTOGRAM-VALUED DATA

Paula Brito, Marie Chavent

2012

Abstract

We propose a divisive (top-down) clustering method for interval-valued and histogram-valued data. The method produces a hierarchy on a set of objects together with a monothetic characterization of each cluster formed. At each step, a cluster is split so as to minimize intra-cluster dispersion, measured with a distance suited to the variable types considered. The criterion is minimized over the bipartitions induced by a set of binary questions. Because interval-valued variables can be treated as a special case of histogram-valued variables, the method applies to data described by either kind of variable or by a mix of both. An example illustrates the proposed approach.
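
A single splitting step of this kind can be sketched as follows. This is a minimal illustration and not the authors' implementation: it handles interval-valued variables only, and it assumes (the abstract does not specify these) that dispersion is the sum of squared Euclidean distances between interval bounds and the cluster's mean interval, and that the binary questions take the form "is the midpoint of variable j at most c?". The names `dispersion` and `best_split` are hypothetical.

```python
def dispersion(cluster):
    """Sum of squared distances from each object's bounds to the mean interval.

    Assumption: squared Euclidean distance on (lower, upper) bounds; the
    paper's actual distance may differ.
    """
    if not cluster:
        return 0.0
    n = len(cluster)
    total = 0.0
    for j in range(len(cluster[0])):
        mean_lo = sum(x[j][0] for x in cluster) / n
        mean_hi = sum(x[j][1] for x in cluster) / n
        total += sum((x[j][0] - mean_lo) ** 2 + (x[j][1] - mean_hi) ** 2
                     for x in cluster)
    return total


def midpoint(interval):
    return (interval[0] + interval[1]) / 2


def best_split(cluster):
    """Try every binary question 'midpoint of variable j <= c' and keep the
    bipartition with the smallest summed intra-cluster dispersion."""
    best = None
    for j in range(len(cluster[0])):
        mids = sorted({midpoint(x[j]) for x in cluster})
        # Candidate cut points lie halfway between consecutive midpoints.
        for a, b in zip(mids, mids[1:]):
            c = (a + b) / 2
            left = [x for x in cluster if midpoint(x[j]) <= c]
            right = [x for x in cluster if midpoint(x[j]) > c]
            score = dispersion(left) + dispersion(right)
            if best is None or score < best[0]:
                best = (score, j, c, left, right)
    return best


# Four objects described by two interval-valued variables (made-up data).
data = [
    ((1.0, 2.0), (10.0, 12.0)),
    ((1.5, 2.5), (11.0, 13.0)),
    ((8.0, 9.0), (10.0, 11.0)),
    ((8.5, 9.5), (12.0, 13.0)),
]
score, j, c, left, right = best_split(data)
print(f"split on variable {j} at midpoint <= {c}; dispersion = {score}")
```

Each resulting cluster is characterized by the answer to a single question on a single variable, which is what makes the description monothetic. Applying `best_split` repeatedly to the clusters obtained would build the hierarchy from the top down.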

Paper Citation


in Harvard Style

Brito P. and Chavent M. (2012). DIVISIVE MONOTHETIC CLUSTERING FOR INTERVAL AND HISTOGRAM-VALUED DATA. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8425-98-0, pages 229-234. DOI: 10.5220/0003793502290234


in Bibtex Style

@conference{icpram12,
author={Paula Brito and Marie Chavent},
title={DIVISIVE MONOTHETIC CLUSTERING FOR INTERVAL AND HISTOGRAM-VALUED DATA},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2012},
pages={229-234},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003793502290234},
isbn={978-989-8425-98-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - DIVISIVE MONOTHETIC CLUSTERING FOR INTERVAL AND HISTOGRAM-VALUED DATA
SN - 978-989-8425-98-0
AU - Brito P.
AU - Chavent M.
PY - 2012
SP - 229
EP - 234
DO - 10.5220/0003793502290234