DIVISIVE MONOTHETIC CLUSTERING FOR INTERVAL AND HISTOGRAM-VALUED DATA
Paula Brito, Marie Chavent
2012
Abstract
In this paper we propose a divisive, top-down clustering method for interval-valued and histogram-valued data. The method produces a hierarchy on a set of objects together with a monothetic characterization of each cluster formed. At each step, a cluster is split so as to minimize intra-cluster dispersion, measured with a distance suited to the variable types considered. The criterion is minimized over the bipartitions induced by a set of binary questions. Since interval-valued variables may be regarded as a special case of histogram-valued variables, the method applies to data described by either kind of variable, or by variables of both types. An example illustrates the proposed approach.
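As an illustration of a single splitting step, the following is a minimal sketch for interval-valued data only. It is not the authors' implementation: the abstract does not specify the distance or the form of the binary questions, so the sketch assumes a squared Euclidean distance on interval bounds and questions of the form "is the midpoint of variable j at most c?". The names `dispersion` and `best_split` and the toy data are hypothetical.

```python
def dispersion(cluster):
    """Sum of squared distances of the members to the cluster's mean interval.

    Assumption: each object is a tuple of (lower, upper) intervals, and the
    distance is squared Euclidean on the bounds (not taken from the paper).
    """
    if not cluster:
        return 0.0
    n = len(cluster)
    total = 0.0
    for j in range(len(cluster[0])):
        mean_lo = sum(x[j][0] for x in cluster) / n
        mean_hi = sum(x[j][1] for x in cluster) / n
        total += sum(
            (x[j][0] - mean_lo) ** 2 + (x[j][1] - mean_hi) ** 2 for x in cluster
        )
    return total


def midpoint(interval):
    return (interval[0] + interval[1]) / 2


def best_split(cluster):
    """Try every binary question "midpoint of variable j <= cut?" and keep the
    bipartition with the smallest summed intra-cluster dispersion.

    Returns (score, variable index, cut, left part, right part).
    """
    best = None
    for j in range(len(cluster[0])):
        mids = sorted({midpoint(x[j]) for x in cluster})
        # Candidate cuts lie halfway between consecutive distinct midpoints.
        for a, b in zip(mids, mids[1:]):
            cut = (a + b) / 2
            left = [x for x in cluster if midpoint(x[j]) <= cut]
            right = [x for x in cluster if midpoint(x[j]) > cut]
            score = dispersion(left) + dispersion(right)
            if best is None or score < best[0]:
                best = (score, j, cut, left, right)
    return best


# Toy data: four objects described by two interval-valued variables.
data = [
    ((1.0, 2.0), (10.0, 12.0)),
    ((1.5, 2.5), (11.0, 13.0)),
    ((8.0, 9.0), (10.5, 12.5)),
    ((8.5, 9.5), (11.5, 13.5)),
]

score, var, cut, left, right = best_split(data)
print(f"split on variable {var} at midpoint <= {cut}: "
      f"{len(left)} vs {len(right)} objects, dispersion {score}")
```

The selected question (here, a threshold on a single variable) is what makes the resulting cluster description monothetic. Applying `best_split` recursively to the resulting parts would yield a hierarchy; handling histogram-valued variables would require a different distance, which this sketch does not cover.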
Paper Citation
in Harvard Style
Brito P. and Chavent M. (2012). DIVISIVE MONOTHETIC CLUSTERING FOR INTERVAL AND HISTOGRAM-VALUED DATA. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8425-98-0, pages 229-234. DOI: 10.5220/0003793502290234
in Bibtex Style
@conference{icpram12,
author={Paula Brito and Marie Chavent},
title={DIVISIVE MONOTHETIC CLUSTERING FOR INTERVAL AND HISTOGRAM-VALUED DATA},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2012},
pages={229-234},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003793502290234},
isbn={978-989-8425-98-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - DIVISIVE MONOTHETIC CLUSTERING FOR INTERVAL AND HISTOGRAM-VALUED DATA
SN - 978-989-8425-98-0
AU - Brito P.
AU - Chavent M.
PY - 2012
SP - 229
EP - 234
DO - 10.5220/0003793502290234