Pattern Recognition by Probabilistic Neural Networks - Mixtures of Product Components versus Mixtures of Dependence Trees

Jiri Grim, Pavel Pudil

Abstract

We compare two probabilistic approaches to neural networks: the first is based on mixtures of product components and the second uses mixtures of dependence-tree distributions. Product mixture models can be estimated efficiently from data by means of the EM algorithm and have some practically important properties. However, in some cases the simplicity of product components may prove too restrictive, and a natural idea is to use a more complex mixture of dependence-tree distributions. By using the concept of a dependence tree we can explicitly describe the statistical relationships between pairs of variables at the level of individual components, and therefore the approximation power of the resulting mixture may increase substantially. Nonetheless, in application to the classification of numerals we have found that both models perform comparably and that the contribution of the dependence-tree structures decreases in the course of EM iterations. Thus the optimal estimate of the dependence-tree mixture tends to converge to a simple product mixture model. Regardless of computational aspects, the dependence-tree mixtures could help to clarify the role of dendritic branching in the highly selective excitability of neurons.
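To make the product mixture model mentioned above more concrete, the following is a minimal sketch of EM estimation for a mixture of product components. It is an illustration only and not the authors' implementation: it assumes binary input vectors and Bernoulli component factors, and the function name `em_bernoulli_mixture`, its parameters, and the clamping constants are hypothetical choices made for this example. A dependence-tree mixture would replace the independent factors in each component by conditionals along a tree, which is not shown here.

```python
import math
import random


def em_bernoulli_mixture(data, n_components, n_iter=50, seed=0):
    """EM for a mixture of product (independent Bernoulli) components.

    Assumption for this sketch: data is a list of equal-length 0/1 lists.
    Returns (weights, thetas, log_likelihood_history).
    """
    rng = random.Random(seed)
    n_dims = len(data[0])
    weights = [1.0 / n_components] * n_components
    # Random initial parameters, kept away from 0 and 1.
    thetas = [[rng.uniform(0.25, 0.75) for _ in range(n_dims)]
              for _ in range(n_components)]
    history = []
    for _ in range(n_iter):
        # E-step: posterior component weights q(m | x) for every sample.
        posteriors = []
        log_lik = 0.0
        for x in data:
            logs = []
            for m in range(n_components):
                lp = math.log(weights[m])
                for n in range(n_dims):
                    p = thetas[m][n]
                    lp += math.log(p if x[n] else 1.0 - p)
                logs.append(lp)
            # Log-sum-exp for a numerically stable normalisation.
            top = max(logs)
            norm = top + math.log(sum(math.exp(v - top) for v in logs))
            log_lik += norm
            posteriors.append([math.exp(v - norm) for v in logs])
        history.append(log_lik)
        # M-step: re-estimate weights and Bernoulli parameters.
        for m in range(n_components):
            total = max(sum(q[m] for q in posteriors), 1e-12)
            weights[m] = total / len(data)
            for n in range(n_dims):
                hits = sum(q[m] * x[n] for q, x in zip(posteriors, data))
                # Clamp so that later logarithms stay finite.
                thetas[m][n] = min(max(hits / total, 1e-6), 1.0 - 1e-6)
    return weights, thetas, history


# Usage on a toy data set with two obvious groups of binary patterns.
toy = [[1, 1, 1, 0, 0, 0]] * 20 + [[0, 0, 0, 1, 1, 1]] * 20
w, th, ll = em_bernoulli_mixture(toy, n_components=2, n_iter=30)
print(w)
print(ll[0], ll[-1])
```

Each EM iteration in this sketch does not decrease the log-likelihood recorded in `history` (up to rounding), which is the property that makes the product mixture convenient to estimate.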



Paper Citation


in Harvard Style

Grim J. and Pudil P. (2014). Pattern Recognition by Probabilistic Neural Networks - Mixtures of Product Components versus Mixtures of Dependence Trees. In Proceedings of the International Conference on Neural Computation Theory and Applications - Volume 1: NCTA, (IJCCI 2014) ISBN 978-989-758-054-3, pages 65-75. DOI: 10.5220/0005077500650075


in Bibtex Style

@conference{ncta14,
author={Jiri Grim and Pavel Pudil},
title={Pattern Recognition by Probabilistic Neural Networks - Mixtures of Product Components versus Mixtures of Dependence Trees},
booktitle={Proceedings of the International Conference on Neural Computation Theory and Applications - Volume 1: NCTA, (IJCCI 2014)},
year={2014},
pages={65-75},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005077500650075},
isbn={978-989-758-054-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Neural Computation Theory and Applications - Volume 1: NCTA, (IJCCI 2014)
TI - Pattern Recognition by Probabilistic Neural Networks - Mixtures of Product Components versus Mixtures of Dependence Trees
SN - 978-989-758-054-3
AU - Grim J.
AU - Pudil P.
PY - 2014
SP - 65
EP - 75
DO - 10.5220/0005077500650075