based techniques has been evaluated, showing that the
presented multi-scale approach outperforms state-of-the-art
visual attention models. The experiments confirmed that
bottom-up recognition is more difficult, but it is also easier
to apply to arbitrary objects and more efficient than
specialized detectors, which need to be trained and applied
separately in a sliding-window approach. These properties,
together with the independence from annotations for most parts,
are an important step toward automated training of object
recognizers. It has also been shown that promising recognition
rates can be obtained for some object categories on the VOC2011
database.
REFERENCES
Achanta, R., Hemami, S., Estrada, F., and Süsstrunk, S.
(2009). Frequency-tuned salient region detection. In
IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pages 1597–1604.
Alexe, B., Deselaers, T., and Ferrari, V. (2012). Measuring
the objectness of image windows. IEEE Transactions on
Pattern Analysis and Machine Intelligence,
34(11):2189–2202.
Borji, A. and Itti, L. (2013). State-of-the-art in visual
attention modeling. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 35(1):185–207.
Chatfield, K., Lempitsky, V., Vedaldi, A., and Zisserman,
A. (2011). The devil is in the details: an evaluation of
recent feature encoding methods. In BMVC.
Cheng, M.-M., Zhang, G.-X., Mitra, N. J., Huang, X., and
Hu, S.-M. (2011). Global contrast based salient region
detection. In IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pages 409–416.
Divvala, S. K., Hoiem, D., Hays, J. H., Efros, A. A., and
Hebert, M. (2009). An empirical study of context
in object detection. In IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), pages 1271–1278.
Douze, M., Jégou, H., Sandhawalia, H., Amsaleg, L., and
Schmid, C. (2009). Evaluation of gist descriptors
for web-scale image search. In Proceedings of the
ACM International Conference on Image and Video
Retrieval, page 19. ACM.
Elazary, L. and Itti, L. (2008). Interesting objects are
visually salient. Journal of Vision, 8(3):1–15.
Everingham, M., Van Gool, L., Williams, C. K. I.,
Winn, J., and Zisserman, A. (2011). The PASCAL Visual
Object Classes Challenge 2011 (VOC2011) Results.
http://www.pascal-network.org/challenges/VOC/voc2011/workshop/index.html.
Felzenszwalb, P., Girshick, R., McAllester, D., and
Ramanan, D. (2010). Object detection with
discriminatively trained part-based models. IEEE
Transactions on Pattern Analysis and Machine
Intelligence, 32(9):1627–1645.
Grzeszick, R., Rothacker, L., and Fink, G. A. (2013).
Bag-of-features representations using spatial visual
vocabularies for object classification. In IEEE
International Conference on Image Processing (ICIP),
Melbourne, Australia.
Haxhimusa, Y., Ion, A., and Kropatsch, W. G. (2006).
Irregular pyramid segmentations with stochastic graph
decimation strategies. In CIARP, pages 277–286.
Hou, X. and Zhang, L. (2007). Saliency detection: A
spectral residual approach. In IEEE Conference on
Computer Vision and Pattern Recognition (CVPR),
pages 1–8.
Itti, L., Koch, C., and Niebur, E. (1998). A model of
saliency-based visual attention for rapid scene
analysis. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 20(11):1254–1259.
Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond
bags of features: Spatial pyramid matching for
recognizing natural scene categories. In IEEE Conference
on Computer Vision and Pattern Recognition (CVPR),
volume 2, pages 2169–2178.
Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X.,
and Shum, H.-Y. (2011). Learning to detect a salient
object. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 33(2):353–367.
Lloyd, S. (1982). Least squares quantization in PCM.
IEEE Transactions on Information Theory,
28(2):129–137.
Lowe, D. G. (2004). Distinctive image features from
scale-invariant keypoints. International Journal of
Computer Vision, 60(2):91–110.
Nasse, F. and Fink, G. A. (2012). A bottom-up approach
for learning visual object detection models from
unreliable sources. In Pattern Recognition: 34th
DAGM-Symposium Graz.
Oliva, A. (2005). Gist of the scene. Neurobiology of
attention, 696:64.
Oliva, A., Torralba, A., et al. (2006). Building the gist of
a scene: The role of global image features in
recognition. Progress in brain research, 155:23.
Rutishauser, U., Walther, D., Koch, C., and Perona, P.
(2004). Is bottom-up attention useful for object
recognition? In IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), volume 2, pages II-37–II-44.
Walther, D., Itti, L., Riesenhuber, M., Poggio, T., and
Koch, C. (2002). Attentional selection for object
recognition: A gentle way. In Proceedings of the
Second International Workshop on Biologically
Motivated Computer Vision, BMCV ’02, pages 472–479,
London, UK. Springer-Verlag.
Zhai, Y. and Shah, M. (2006). Visual attention detection in
video sequences using spatiotemporal cues. In
Proceedings of the 14th Annual ACM International
Conference on Multimedia, MULTIMEDIA ’06, pages
815–824, New York, NY, USA. ACM.
Toward Object Recognition with Proto-objects and Proto-scenes