EVALUATION OF NEGENTROPY-BASED CLUSTER VALIDATION TECHNIQUES IN PROBLEMS WITH INCREASING DIMENSIONALITY

L. F. Lago-Fernández, G. Martínez-Muñoz, A. M. González, M. A. Sánchez-Montañés

Abstract

The aim of a crisp cluster validity index is to quantify the quality of a given data partition. It allows one to select the best partition from a set of candidates and to determine the number of clusters. Recently, negentropy-based cluster validation has been introduced. This new approach seems to perform better than other state-of-the-art techniques, and it is quite simple to compute. However, like many other cluster validation approaches, it runs into problems when some partition regions contain only a small number of points. Different heuristics have been proposed to cope with this problem. In this article we systematically analyze the performance of different negentropy-based validation approaches, including a new heuristic, in clustering problems of increasing dimensionality, and compare them to reference criteria such as AIC and BIC. Our results on synthetic data suggest that the newly proposed negentropy-based validation strategy can outperform AIC and BIC when the ratio of the number of points to the dimension is not high, which is a very common situation in real applications.
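As background for the reference criteria named above, the following minimal sketch shows the standard AIC and BIC formulas together with the free-parameter count of a full-covariance Gaussian mixture. The abstract does not say which model the authors fit, so the mixture model and all function names here are illustrative assumptions, not the paper's implementation. The sketch only illustrates why the ratio of points to dimension matters: the number of covariance parameters grows quadratically with the dimension.

```python
import math


def gmm_free_parameters(n_components: int, dim: int) -> int:
    """Free parameters of a Gaussian mixture with full covariance matrices.

    Assumption for illustration: one unconstrained symmetric covariance
    matrix per component.
    """
    weights = n_components - 1                             # mixing weights sum to 1
    means = n_components * dim                             # one mean vector per component
    covariances = n_components * dim * (dim + 1) // 2      # symmetric matrices
    return weights + means + covariances


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_points: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_points) - 2.0 * log_likelihood


if __name__ == "__main__":
    # Parameter count for a 3-component mixture as the dimension grows.
    for dim in (2, 10, 50):
        print(dim, gmm_free_parameters(3, dim))
```

With a fixed number of points, the penalty term in both criteria therefore grows quickly with the dimension, which is the regime the abstract singles out.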

Paper Citation


in Harvard Style

Lago-Fernández L. F., Martínez-Muñoz G., González A. M. and Sánchez-Montañés M. A. (2012). EVALUATION OF NEGENTROPY-BASED CLUSTER VALIDATION TECHNIQUES IN PROBLEMS WITH INCREASING DIMENSIONALITY. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-8425-98-0, pages 235-241. DOI: 10.5220/0003793602350241


in Bibtex Style

@conference{icpram12,
author={L. F. Lago-Fernández and G. Martínez-Muñoz and A. M. González and M. A. Sánchez-Montañés},
title={EVALUATION OF NEGENTROPY-BASED CLUSTER VALIDATION TECHNIQUES IN PROBLEMS WITH INCREASING DIMENSIONALITY},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2012},
pages={235-241},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003793602350241},
isbn={978-989-8425-98-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - EVALUATION OF NEGENTROPY-BASED CLUSTER VALIDATION TECHNIQUES IN PROBLEMS WITH INCREASING DIMENSIONALITY
SN - 978-989-8425-98-0
AU - Lago-Fernández, L. F.
AU - Martínez-Muñoz, G.
AU - González, A. M.
AU - Sánchez-Montañés, M. A.
PY - 2012
SP - 235
EP - 241
DO - 10.5220/0003793602350241