Automatic Subspace Clustering with Density Function

Jiwu Zhao, Stefan Conrad

Abstract

Clustering techniques in data mining aim to find interesting patterns in data sets. However, traditional clustering methods are not well suited to large, high-dimensional data. Subspace clustering extends traditional clustering by searching for clusters in subspaces of a data set, which makes it better suited to detecting clusters in high-dimensional data. Most subspace clustering methods, however, require many complicated parameter settings that are difficult to determine, which limits their applicability. In this article, we develop a novel subspace clustering method with a new density function that computes and represents the density distribution directly in high-dimensional data sets. The new method also requires as few parameters as possible.



Paper Citation


in Harvard Style

Zhao J. and Conrad S. (2012). Automatic Subspace Clustering with Density Function. In Proceedings of the International Conference on Data Technologies and Applications - Volume 1: DATA, ISBN 978-989-8565-18-1, pages 63-69. DOI: 10.5220/0004031400630069


in Bibtex Style

@conference{data12,
author={Jiwu Zhao and Stefan Conrad},
title={Automatic Subspace Clustering with Density Function},
booktitle={Proceedings of the International Conference on Data Technologies and Applications - Volume 1: DATA},
year={2012},
pages={63-69},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004031400630069},
isbn={978-989-8565-18-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Data Technologies and Applications - Volume 1: DATA
TI - Automatic Subspace Clustering with Density Function
SN - 978-989-8565-18-1
AU - Zhao J.
AU - Conrad S.
PY - 2012
SP - 63
EP - 69
DO - 10.5220/0004031400630069