UNSUPERVISED NON PARAMETRIC DATA CLUSTERING BY MEANS OF BAYESIAN INFERENCE AND INFORMATION THEORY

Gilles Bougenière, Claude Cariou, Kacem Chehdi, Alan Gay

Abstract

In this communication, we propose a novel approach to the unsupervised, non-parametric clustering of n-D data within a Bayesian framework. The iterative approach is derived from the Classification Expectation-Maximization (CEM) algorithm, in which the parametric model of the mixture density is replaced by a non-parametric model using local kernels, and the posterior probabilities account for the coherence of the current clusters through a measure of class-conditional entropies. Applications of this method to synthetic and real data, including multispectral images, are presented. The classification results are compared with those of other recent unsupervised approaches, and we show that our method gives a more reliable estimate of the number of clusters while providing slightly better correct classification rates on average.
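
The abstract describes the method only at a high level, so the following is a minimal sketch of the general idea, not the authors' algorithm: a CEM loop (E-step, hard C-step, M-step) in which each class-conditional density f_k is a leave-one-out Gaussian Parzen (kernel) estimate instead of a parametric model. Assumptions: an isotropic Gaussian kernel with a fixed bandwidth `h`, a farthest-point initialisation, and hypothetical helper names (`loo_kde`, `kernel_cem`, `class_entropies`). The abstract does not say how class-conditional entropies enter the posteriors, so here they are only computed afterwards as a diagnostic.

```python
import math
from collections import defaultdict


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def loo_kde(i, members, data, h):
    """Leave-one-out Gaussian Parzen estimate of f_k(x_i) from the members of cluster k."""
    others = [j for j in members if j != i]
    if not others:
        return 0.0  # a singleton gives no support to its own point
    d = len(data[i])
    norm = (2.0 * math.pi * h * h) ** (d / 2.0)
    s = sum(math.exp(-sq_dist(data[i], data[j]) / (2.0 * h * h)) for j in others)
    return s / (len(others) * norm)


def group(labels):
    """Map each cluster label to the list of point indices carrying it."""
    clusters = defaultdict(list)
    for i, c in enumerate(labels):
        clusters[c].append(i)
    return dict(clusters)


def maximin_init(data, k):
    """Hypothetical deterministic start: farthest-point seeds, then nearest-seed labels."""
    seeds = [0]
    while len(seeds) < min(k, len(data)):
        seeds.append(max(range(len(data)),
                         key=lambda i: min(sq_dist(data[i], data[s]) for s in seeds)))
    return [min(range(len(seeds)), key=lambda s: sq_dist(x, data[seeds[s]])) for x in data]


def kernel_cem(data, k_init, h, max_iter=100):
    """CEM-style hard clustering with kernel (non-parametric) class-conditional densities."""
    labels = maximin_init(data, k_init)
    n = len(data)
    for _ in range(max_iter):
        clusters = group(labels)  # clusters that lost all members simply disappear
        # M-step for the current partition: mixing proportions pi_k
        props = {c: len(m) / n for c, m in clusters.items()}
        new_labels = []
        for i in range(n):
            # E-step: unnormalised posteriors pi_k * f_k(x_i); C-step: hard argmax
            scores = {c: props[c] * loo_kde(i, m, data, h) for c, m in clusters.items()}
            new_labels.append(max(scores, key=scores.get))
        if new_labels == labels:  # the partition is stable
            break
        labels = new_labels
    return labels


def class_entropies(data, labels, h):
    """Resubstitution estimate H_k = -(1/n_k) * sum over i in k of log f_k(x_i)."""
    out = {}
    for c, m in group(labels).items():
        vals = [max(loo_kde(i, m, data, h), 1e-300) for i in m]  # guard log(0)
        out[c] = -sum(math.log(v) for v in vals) / len(m)
    return out


if __name__ == "__main__":
    blob_a = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
    blob_b = [(10.0 + 0.1 * i, 10.0 + 0.1 * j) for i in range(3) for j in range(3)]
    data = blob_a + blob_b
    labels = kernel_cem(data, k_init=3, h=0.5)
    print(labels, class_entropies(data, labels, h=0.5))
```

Because empty clusters drop out, starting with more clusters than needed (for example `k_init=3` on two well-separated blobs) can end with fewer: the mixing proportions pull points of a split blob into its larger part. This is only one simple way to let the number of clusters shrink and is not claimed to be the mechanism used in the paper.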

References

  1. Aeberhard, S., Coomans, D., and de Vel, O. (1992). The classification performance of RDA. Technical Report 92-01, Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University, North Queensland, Australia.
  2. Bezdek, J. (1981). Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York.
  3. Cariou, C., Chehdi, K., and Nagle, A. (2005). Gravitational transform for data clustering - application to multicomponent image classification. In Proc. IEEE ICASSP 2005, volume 2, pages 105-108, Philadelphia, USA.
  4. Celeux, G. and Diebolt, J. (1987). A probabilistic teacher algorithm for iterative maximum likelihood estimation. In Classification and Related Methods of Data Analysis, pages 617-623. Amsterdam: Elsevier, North-Holland.
  5. Celeux, G. and Govaert, G. (1992). A classification EM algorithm for clustering and two stochastic versions. In Computational Statistics and Data Analysis, number 3, pages 315-332.
  6. Dempster, A., Laird, N., and Rubin, D. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1-38.
  7. Dunn, J. (1973). A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, 3(3):32-57.
  8. Fisher, R. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179-188.
  9. Gustafson, D. and Kessel, W. (1979). Fuzzy clustering with a covariance matrix. IEEE Conference on Decision and Control, pages 761-766.
  10. Huang, J. and Ng, M. (2005). Automated variable weighting in k-means type clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):657-668.
  11. Laszlo, M. and Mukherjee, S. (2006). A genetic algorithm using hyper-quadtrees for low-dimensional k-means clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(4):533-543.
  12. MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1:281-297.
  13. Masson, P. and Pieczynski, W. (1993). SEM algorithm and unsupervised statistical segmentation of satellite images. IEEE Transactions on Geoscience and Remote Sensing, 31(3):618-633.
  14. Same, A., Govaert, G., and Ambroise, C. (2005). A mixture model-based on-line CEM algorithm. In Advances in Intelligent Data Analysis, 6th International Symposium on Data Analysis, IDA 2005, 8-10 Oct. 2005, Madrid, Spain.
  15. Tran, T., Wehrens, R., and Buydens, L. (2005). Clustering multispectral images: a tutorial. Chemometrics and Intelligent Laboratory Systems, 77:1-2.
  16. Tran, T., Wehrens, R., and Buydens, L. (2006). KNN-kernel density-based clustering for high-dimensional multivariate data. Computational Statistics and Data Analysis, 51:513-525.
  17. Zribi, M. and Ghorbel, F. (2003). An unsupervised and non-parametric Bayesian classifier. Pattern Recognition Letters, 24(1):97-112.


Paper Citation


in Harvard Style

Bougenière G., Cariou C., Chehdi K. and Gay A. (2007). UNSUPERVISED NON PARAMETRIC DATA CLUSTERING BY MEANS OF BAYESIAN INFERENCE AND INFORMATION THEORY. In Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007) ISBN 978-989-8111-13-5, pages 101-108. DOI: 10.5220/0002141301010108


in Bibtex Style

@conference{sigmap07,
author={Gilles Bougenière and Claude Cariou and Kacem Chehdi and Alan Gay},
title={UNSUPERVISED NON PARAMETRIC DATA CLUSTERING BY MEANS OF BAYESIAN INFERENCE AND INFORMATION THEORY},
booktitle={Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007)},
year={2007},
pages={101-108},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002141301010108},
isbn={978-989-8111-13-5},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007)
TI  - UNSUPERVISED NON PARAMETRIC DATA CLUSTERING BY MEANS OF BAYESIAN INFERENCE AND INFORMATION THEORY
SN  - 978-989-8111-13-5
AU  - Bougenière G.
AU  - Cariou C.
AU  - Chehdi K.
AU  - Gay A.
PY  - 2007
SP  - 101
EP  - 108
DO  - 10.5220/0002141301010108
ER  - 