Comparison of Various Definitions of Proximity in Mixture Estimation

Ivan Nagy, Evgenia Suzdaleva, Pavla Pecherková

Abstract

Classification is among the most frequently demanded tasks in data analysis, and a range of approaches to it exists. This paper focuses on classification via mixture model estimation, which detects density clusters in the data space and fits component models to them. Two factors play a significant role in solving the mixture-based classification task: the chosen function measuring the proximity of the currently measured data to the individual mixture components, and the shape of the components. The paper considers definitions of proximity for several types of distributions describing the mixture components and compares their properties with respect to the speed and quality of the resulting estimation, interpreted as a classification task. Normal, exponential and uniform distributions are considered, as the most important models for describing both Gaussian and non-Gaussian data. Illustrative experiments with the results of the comparison are provided.
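The idea of proximity-based classification can be illustrated with a minimal sketch. The abstract does not state the paper's exact proximity definitions, so the sketch assumes one common choice: the proximity of a data point to a component is the component's weighted density evaluated at that point, normalized over all components. All function names, parameter values and component weights below are hypothetical and chosen only for illustration.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def exponential_pdf(x, rate):
    """Density of an exponential distribution (zero for negative x)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def uniform_pdf(x, low, high):
    """Density of a uniform distribution on [low, high]."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def proximities(x, components, weights):
    """Normalized proximity of x to each component.

    Assumption: proximity = component weight * component density at x.
    """
    raw = [w * pdf(x) for pdf, w in zip(components, weights)]
    total = sum(raw)
    if total == 0.0:
        # No component supports x; fall back to equal proximities.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]

def classify(x, components, weights):
    """Index of the component with the largest proximity to x."""
    p = proximities(x, components, weights)
    return max(range(len(p)), key=p.__getitem__)

# Hypothetical three-component mixture with illustrative parameters.
components = [
    lambda x: normal_pdf(x, mean=5.0, std=1.0),
    lambda x: exponential_pdf(x, rate=2.0),
    lambda x: uniform_pdf(x, low=8.0, high=12.0),
]
weights = [1.0 / 3.0] * 3

for point in (0.1, 5.0, 10.0):
    print(point, classify(point, components, weights))
```

In this sketch the uniform component has bounded support, so its proximity drops to exactly zero outside its interval, whereas the normal component assigns some nonzero proximity to every point. Differences of this kind in component shape are what make the choice of proximity function matter.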

Paper Citation


in Harvard Style

Nagy I., Suzdaleva E. and Pecherková P. (2016). Comparison of Various Definitions of Proximity in Mixture Estimation. In Proceedings of the 13th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, ISBN 978-989-758-198-4, pages 527-534. DOI: 10.5220/0005982805270534


in Bibtex Style

@conference{icinco16,
author={Ivan Nagy and Evgenia Suzdaleva and Pavla Pecherková},
title={Comparison of Various Definitions of Proximity in Mixture Estimation},
booktitle={Proceedings of the 13th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},
year={2016},
pages={527-534},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005982805270534},
isbn={978-989-758-198-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO
TI - Comparison of Various Definitions of Proximity in Mixture Estimation
SN - 978-989-758-198-4
AU - Nagy I.
AU - Suzdaleva E.
AU - Pecherková P.
PY - 2016
SP - 527
EP - 534
DO - 10.5220/0005982805270534