A Novel Distance Measure for Interval Data

Jie Ouyang, Ishwar K. Sethi

Abstract

Interval data is attracting attention from the data analysis community because of its ability to describe complex concepts. Since clustering is a core data analysis tool, extending clustering techniques to interval data is important. Applying traditional clustering methods to interval data loses information inherent in this particular data type. This paper proposes a novel dissimilarity measure that exploits the internal structure of intervals in a probabilistic manner, based on domain knowledge. Our experiments show that interval clustering based on the proposed dissimilarity measure produces meaningful results.
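The abstract does not give the formula for the proposed measure, so the sketch below does not reproduce it. It only illustrates the contrast the abstract draws, under stated assumptions. `midpoint_distance` collapses each interval to one point, as a point-based method would. `uniform_wasserstein1` is a swapped-in probabilistic dissimilarity: it models each interval as a uniform distribution and compares the two distributions with the 1-Wasserstein distance. Both function names and the uniform model are hypothetical choices for illustration, not the authors' method.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower, upper), lower <= upper


def midpoint_distance(x: Interval, y: Interval) -> float:
    """Point-based baseline: compare interval midpoints only.

    Interval width is discarded, so this shows how information can be lost
    when interval data is reduced to single points.
    """
    return abs((x[0] + x[1]) / 2 - (y[0] + y[1]) / 2)


def uniform_wasserstein1(x: Interval, y: Interval) -> float:
    """1-Wasserstein distance between U[x0, x1] and U[y0, y1].

    Hypothetical illustration, NOT the measure proposed in the paper.
    The quantile function of U[a, b] is a + t * (b - a) for t in [0, 1], and
    W1 is the integral over [0, 1] of the absolute quantile difference.
    """
    (a, b), (c, d) = x, y
    if a > b or c > d:
        raise ValueError("intervals must satisfy lower <= upper")
    p = a - c  # quantile difference at t = 0 (lower endpoints)
    q = b - d  # quantile difference at t = 1 (upper endpoints)
    # The quantile difference p + (q - p) * t is linear in t.
    if p * q >= 0:
        # No sign change on [0, 1]: trapezoid area |p + q| / 2.
        return abs(p + q) / 2
    # Sign change inside (0, 1): two triangles, p^2/(2|q-p|) + q^2/(2|q-p|).
    return (p * p + q * q) / (2 * abs(q - p))


if __name__ == "__main__":
    wide, narrow = (0.0, 10.0), (4.0, 6.0)
    print(midpoint_distance(wide, narrow))     # 0.0: identical midpoints
    print(uniform_wasserstein1(wide, narrow))  # 2.0: the spreads differ
```

For the wide interval [0, 10] and the narrow interval [4, 6], the midpoints coincide, so the point-based distance is 0, while the distributional distance is 2 because the two intervals spread their values differently. The uniform model is only the simplest assumption about values inside an interval; if domain knowledge pointed to a different distribution, its quantile function would replace the linear one used here.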



Paper Citation


in Harvard Style

Ouyang J. and Sethi I. K. (2007). A Novel Distance Measure for Interval Data. In Proceedings of the 7th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2007), ISBN 978-972-8865-93-1, pages 49-58. DOI: 10.5220/0002425000490058


in Bibtex Style

@conference{pris07,
author={Jie Ouyang and Ishwar K. Sethi},
title={A Novel Distance Measure for Interval Data},
booktitle={Proceedings of the 7th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2007)},
year={2007},
pages={49-58},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002425000490058},
isbn={978-972-8865-93-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Workshop on Pattern Recognition in Information Systems - Volume 1: PRIS, (ICEIS 2007)
TI - A Novel Distance Measure for Interval Data
SN - 978-972-8865-93-1
AU - Ouyang J.
AU - Sethi I. K.
PY - 2007
SP - 49
EP - 58
DO - 10.5220/0002425000490058