Authors:
Fabian Nasse ¹; Rene Grzeszick ¹ and Gernot A. Fink ²
Affiliations:
¹ TU Dortmund, Germany; ² TU Dortmund University, Germany
Keyword(s):
Object-recognition, Visual Attention, Bottom-up Detection, Proto-objects, Proto-scenes.
Related Ontology Subjects/Areas/Topics:
Computer Vision, Visualization and Computer Graphics; Image and Video Analysis; Visual Attention and Image Saliency
Abstract:
In this paper, a bottom-up approach for detecting and recognizing objects in complex scenes is presented. In
contrast to top-down methods, no prior knowledge about the objects is required. Instead, two
different views on the data are computed: First, a GIST descriptor is used for clustering scenes with a similar
global appearance, which produces a set of Proto-Scenes. Second, a visual attention model that is based on
hierarchical multi-scale segmentation and feature integration is proposed. Regions of Interest that are likely to
contain an arbitrary object, a Proto-Object, are determined. These Proto-Object regions are then represented by
a Bag-of-Features using Spatial Visual Words. The bottom-up approach makes the detection and recognition
tasks more challenging but also more efficient and easier to apply to an arbitrary set of objects. This is an
important step toward analyzing complex scenes in an unsupervised manner. The bottom-up knowledge is
combined with an informed system that associates Proto-Scenes with objects that may occur in them, and an
object classifier is trained for recognizing the Proto-Objects. In the experiments on the VOC2011 database the
proposed multi-scale visual attention model is compared with current state-of-the-art models for Proto-Object
detection. Additionally, the Proto-Objects are classified with respect to the VOC object set.
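
The Bag-of-Features representation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that local descriptors have already been extracted from a Proto-Object region and that a codebook of visual words is available. The function names, the toy 2-D descriptors, and the three-word codebook are hypothetical and chosen only for illustration; the spatial component of the Spatial Visual Words is omitted, so this is not the authors' implementation.

```python
from math import dist

def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor (Euclidean)."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))

def bag_of_features(descriptors, codebook):
    """Build an L1-normalized histogram of visual-word occurrences for one region."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    # An empty region yields an all-zero histogram instead of dividing by zero.
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Toy data (assumed): a three-word codebook and four descriptors from one region.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
region_descriptors = [(0.1, -0.2), (0.9, 1.2), (1.1, 0.8), (4.7, 5.3)]

histogram = bag_of_features(region_descriptors, codebook)
print(histogram)  # -> [0.25, 0.5, 0.25]
```

In a full pipeline, a histogram of this kind would serve as the feature vector passed to the object classifier; in practice the codebook is typically learned by clustering descriptors from training images.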