Computational Models of Object Recognition - Goal, Role and Success

Tayyaba Azim

2014

Abstract

This paper surveys learning algorithms for visual feature representation and the computational modelling approaches proposed with the aim of developing better artificial object recognition systems. Most learning theories and schemes have been developed either to understand the biology of vision or to design machines whose perceptual power is competitive with, or better than, that of humans. In this study, we discuss and analyse the impact of notable statistical approaches that formally model cognitive neural activity at the macro level, as well as approaches that pursue better classifiers without any biological inspiration. With classification as the ultimate objective, research in computer vision, and in AI generally, has expanded so much that it has become important to ask whether our goals and our diagnostics for learning from visual input are correct. We first highlight the mainstream approaches proposed for the classification task since the advent of the field, and then suggest criteria of success that can guide future research.



Paper Citation


in Harvard Style

Azim T. (2014). Computational Models of Object Recognition - Goal, Role and Success. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2014) ISBN 978-989-758-003-1, pages 179-186. DOI: 10.5220/0004737601790186


in Bibtex Style

@conference{visapp14,
author={Tayyaba Azim},
title={Computational Models of Object Recognition - Goal, Role and Success},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2014)},
year={2014},
pages={179-186},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004737601790186},
isbn={978-989-758-003-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2014)
TI - Computational Models of Object Recognition - Goal, Role and Success
SN - 978-989-758-003-1
AU - Azim T.
PY - 2014
SP - 179
EP - 186
DO - 10.5220/0004737601790186