Hierarchical Bayesian Modelling of Visual Attention
Jinhua Xu
2014
Abstract
The brain employs interacting bottom-up and top-down processes to speed up the search for, and recognition of, visual targets relevant to specific behavioral tasks. In this paper, we propose a Bayesian model of visual attention that optimally integrates top-down, goal-driven attention and bottom-up, stimulus-driven visual saliency. We formulate a multi-scale hierarchical model of objects in natural contexts, in which the computing nodes at higher levels have lower resolutions and larger sizes than the nodes at lower levels and provide local context for them. The conditional probability of a visual variable given its context is computed efficiently. The model includes several existing models of visual attention as special cases. We tested the model as a predictor of human fixations in free-viewing and object-search tasks in natural scenes and found that it performed very well.
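As an illustration of the kind of computation the abstract describes, the following is a minimal sketch, not the paper's actual formulation. It assumes a two-level hierarchy in which each coarse node summarises a 2x2 block of binary features by their sum, estimates the conditional probability of a feature given that context from counts within the image, and adds a top-down log-likelihood term for the target. The names `context_of`, `saliency_map`, and `target_likelihood`, the block-sum context, and the additive log form are all assumptions made for this example.

```python
import math
from collections import Counter


def context_of(grid, r, c, block=2):
    """Hypothetical coarse-level node: sum of the features in the block containing (r, c)."""
    r0, c0 = (r // block) * block, (c // block) * block
    return sum(
        grid[i][j]
        for i in range(r0, min(r0 + block, len(grid)))
        for j in range(c0, min(c0 + block, len(grid[0])))
    )


def saliency_map(grid, target_likelihood, block=2, eps=1e-6):
    """Combine bottom-up and top-down terms at every location (assumed additive in log space).

    bottom-up: -log p(feature | coarse context), estimated from counts in this image
    top-down:   log p(feature | target), supplied by the caller
    """
    rows, cols = len(grid), len(grid[0])
    ctx = [[context_of(grid, r, c, block) for c in range(cols)] for r in range(rows)]
    # Count (context, feature) pairs and contexts to estimate the conditional.
    joint = Counter((ctx[r][c], grid[r][c]) for r in range(rows) for c in range(cols))
    marginal = Counter(ctx[r][c] for r in range(rows) for c in range(cols))

    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            feature, context = grid[r][c], ctx[r][c]
            p_given_context = joint[(context, feature)] / marginal[context]
            bottom_up = -math.log(p_given_context)
            top_down = math.log(target_likelihood.get(feature, eps))
            row.append(bottom_up + top_down)
        result.append(row)
    return result


# A 4x4 image of binary features with a single odd-one-out at the top-left.
image = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# A flat target likelihood leaves only the bottom-up term to rank locations.
free_viewing = saliency_map(image, {0: 0.5, 1: 0.5})
# A target that rarely shows feature 1 down-weights the odd-one-out.
search = saliency_map(image, {0: 0.9, 1: 0.1})
```

With a flat `target_likelihood` the ranking is driven purely by how unexpected each feature is in its local context, which loosely corresponds to free viewing. A peaked `target_likelihood` shifts the map toward target-like features, which loosely corresponds to object search.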
Paper Citation
in Harvard Style
Xu J. (2014). Hierarchical Bayesian Modelling of Visual Attention. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2014) ISBN 978-989-758-003-1, pages 347-358. DOI: 10.5220/0004731303470358
in Bibtex Style
@conference{visapp14,
author={Jinhua Xu},
title={Hierarchical Bayesian Modelling of Visual Attention},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2014)},
year={2014},
pages={347-358},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004731303470358},
isbn={978-989-758-003-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2014)
TI - Hierarchical Bayesian Modelling of Visual Attention
SN - 978-989-758-003-1
AU - Xu J.
PY - 2014
SP - 347
EP - 358
DO - 10.5220/0004731303470358