Top-Down Visual Attention with Complex Templates

Jan Tünnermann, Christian Born, Bärbel Mertsching


Visual attention can support autonomous robots in visual tasks by assigning resources to relevant portions of an image. In this biologically inspired concept, conspicuous elements of the image are typically determined with regard to different features such as color, intensity or orientation. Studies of human visual attention suggest that these bottom-up processes are complemented – and in many cases overruled – by top-down influences that modulate the attentional focus with respect to the current task or a priori knowledge. In artificial attention, one branch of research investigates visual search for a given object within a scene using top-down attention. Current models either require extensive training for a specific target or are limited to very simple templates. Here we propose a multi-region template model that can direct the attentional focus with respect to complex target appearances without any training. The template can be adaptively adjusted to compensate for gradual changes in the object's appearance. Furthermore, the model is integrated with the framework of region-based attention and can be combined with bottom-up saliency mechanisms. Our experimental results show that the proposed method outperforms an approach that uses single-region templates and performs as well as state-of-the-art feature fusion approaches that require extensive training.



Paper Citation

in Harvard Style

Tünnermann J., Born C. and Mertsching B. (2013). Top-Down Visual Attention with Complex Templates. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013), ISBN 978-989-8565-47-1, pages 370-377. DOI: 10.5220/0004302403700377
