Measuring Cluster Similarity by the Travel Time between Data Points

Yonggang Lu, Xiaoli Hou, Xurong Chen

Abstract

A new similarity measure for hierarchical clustering is proposed. The idea is to treat all the data points as mass points under a hypothetical gravitational force field, and to derive the hierarchical clustering results by estimating the travel time between data points. The shorter the time needed to travel from one point to another, the more similar the two data points are. To avoid the complexity of a molecular dynamics simulation, the potential field produced by all the data points is computed, and the travel time between a pair of data points is then estimated from that field. The travel time is used to construct a new similarity measure, and an edge-weighted tree of all the data points is built to improve the efficiency of the hierarchical clustering. The proposed method, called Travel-Time based Hierarchical Clustering (TTHC), is evaluated by comparing it with four other hierarchical clustering methods. Two real datasets and two synthetic dataset families, composed of 200 randomly produced datasets, are used in our experiments. The results show that the TTHC method is competitive with the other methods, and that using the estimated travel time instead of the distance between data points can improve the robustness and the quality of clustering.
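The pipeline described in the abstract (potential field, travel-time estimate, edge-weighted tree) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything specific here is an assumption rather than the authors' formulation: unit masses with G = 1, a hypothetical softening length `delta`, a travel-time estimate that assumes uniform acceleration from rest with the final speed taken from energy conservation, and a parent rule that links each point to the lower-potential point it reaches fastest. The function names are likewise hypothetical.

```python
import math

def potentials(points, delta=1e-3):
    """Gravitational-style potential at each point (assumes unit masses, G = 1).

    delta is a hypothetical softening length that keeps the sum finite
    when two points coincide or nearly coincide.
    """
    phi = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i != j:
                total -= 1.0 / max(math.dist(p, q), delta)
        phi.append(total)
    return phi

def travel_time(p, q, phi_p, phi_q, eps=1e-12):
    """Assumed estimate, not the paper's formula.

    A unit mass starts at rest and accelerates uniformly between the two
    points. Energy conservation gives a final speed of sqrt(2 * |dphi|),
    so the average speed is half of that and t = 2 * d / v_final.
    eps avoids division by zero when the potentials are equal.
    """
    d = math.dist(p, q)
    v_final = math.sqrt(2.0 * abs(phi_p - phi_q))
    return 2.0 * d / (v_final + eps)

def build_tree(points, delta=1e-3):
    """Return edges (child, parent, time) of an edge-weighted tree.

    Each point is linked to the lower-potential point with the shortest
    estimated travel time. Ties in potential are broken by index, so
    exactly one point (the root) has no parent.
    """
    phi = potentials(points, delta)
    edges = []
    for i in range(len(points)):
        best = None
        for j in range(len(points)):
            lower = phi[j] < phi[i] or (phi[j] == phi[i] and j < i)
            if lower:
                t = travel_time(points[i], points[j], phi[i], phi[j])
                if best is None or t < best[2]:
                    best = (i, j, t)
        if best is not None:
            edges.append(best)
    return edges

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
    for child, parent, t in build_tree(pts):
        print(child, "->", parent, round(t, 3))
```

Under these assumptions, one way to obtain a hierarchy from the tree would be to merge the endpoints of its edges in order of increasing weight, which needs only n - 1 edges instead of all pairwise similarities.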



Paper Citation


in Harvard Style

Lu Y., Hou X. and Chen X. (2014). Measuring Cluster Similarity by the Travel Time between Data Points. In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-018-5, pages 14-20. DOI: 10.5220/0004761800140020


in Bibtex Style

@conference{icpram14,
author={Yonggang Lu and Xiaoli Hou and Xurong Chen},
title={Measuring Cluster Similarity by the Travel Time between Data Points},
booktitle={Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2014},
pages={14-20},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004761800140020},
isbn={978-989-758-018-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Measuring Cluster Similarity by the Travel Time between Data Points
SN - 978-989-758-018-5
AU - Lu Y.
AU - Hou X.
AU - Chen X.
PY - 2014
SP - 14
EP - 20
DO - 10.5220/0004761800140020