known environments. Therefore, an online detection method that allows automatic segmentation of unknown objects is indispensable. Most state-of-the-art methods require a user-defined object model, which is unusable in our case. The robot's task is to navigate around the unknown objects and inspect them from different viewpoints. For this online segmentation task, existing background subtraction methods (Zivkovic, 2004) fail because the background changes constantly. Motion-based online segmentation (Mooser et al., 2007) is not an option either, since the objects in the environment are static and provide no motion information. Thus a model-based tracker that can be updated online is needed. However, histogram-based online segmentation such as Camshift (Bradski, 1998) cannot handle textured objects. We therefore require an object-driven segmentation method that works for complex scenes and objects.
In this paper, we present a novel system for robust online segmentation of unknown objects that overcomes all of the above-mentioned difficulties. The main contributions are as follows. Firstly, we implement a vision system that can autonomously perceive objects in unknown environments without any prior knowledge. Secondly, we propose a robust online segmentation method that combines different object detection methods in order to perform well despite viewpoint changes, illumination changes, background clutter, and occlusion. Our method also provides refined information about the objects, such as shapes and contours, instead of only locations. Thirdly, in our setup the camera moves around static objects, in contrast to most active vision applications, where static cameras track or segment moving objects. Furthermore, we tested the system on a foveated vision setup and achieved very promising results.
This paper is organized as follows. In Sec. 2 we provide a general outline of the proposed system; in Sec. 3 we explain the algorithms in detail; in Sec. 4 we present the system evaluation and analyze the results.
2 GENERAL FRAMEWORK
In this section we present the general framework of the system, depicted in Figure 1. We propose several steps: detection of a dominant unknown object, initial model generation, tracking to update the object model, and detailed object segmentation. We assume that no initial knowledge of the scene or the objects is given.
In the initial step it is necessary to detect the approximate positions of the unknown objects. For the initial segmentation, we propose a bottom-up segmentation based on the salient information in the static scene. After the saliency map of the scene is calculated, salient points in the map are detected and clustered into salient regions, where every region represents a potential unknown object. The cluster with the most salient points is assumed to be the most dominant object in the scene, and its initial model is extracted to be used for later segmentation. Details can be found in Sec. 3.1.
A camera is then maneuvered around the dominant object to explore it from different viewpoints. In each frame, the dominant object is tracked by a motion-based tracker, and the model of the object is rebuilt and constantly updated using Random-Forest-based classification. By combining the detection results of the motion tracking and the model tracking, the location of the object in the new frame is derived. More detailed information is given in Sec. 3.2.1.
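A minimal sketch of this per-frame model rebuild is given below, using scikit-learn's RandomForestClassifier. The feature representation (plain RGB pixel values), the classifier settings, and the `margin` parameter are assumptions for illustration only; pixels inside the tracked box serve as object samples, pixels in a surrounding ring as background samples, and the resulting probability map would be fused with the motion tracker's prediction to localize the object.

```python
# Sketch only: plain RGB pixel values stand in for the paper's features,
# and the classifier settings are illustrative, not from the paper.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rebuild_object_model(frame, box, margin=20):
    """Rebuild the pixel classifier for the current frame: pixels inside
    the tracked box are object samples, pixels in a surrounding ring are
    background samples."""
    x, y, w, h = box
    obj = frame[y:y + h, x:x + w].reshape(-1, 3)
    x0, y0 = max(x - margin, 0), max(y - margin, 0)
    x1 = min(x + w + margin, frame.shape[1])
    y1 = min(y + h + margin, frame.shape[0])
    patch = frame[y0:y1, x0:x1]
    ring = np.ones(patch.shape[:2], bool)
    ring[y - y0:y - y0 + h, x - x0:x - x0 + w] = False  # exclude the object
    X = np.vstack([obj, patch[ring]]).astype(np.float32)
    y_lbl = np.hstack([np.ones(len(obj)), np.zeros(int(ring.sum()))])
    clf = RandomForestClassifier(n_estimators=50, max_depth=10)
    clf.fit(X, y_lbl)
    return clf

def object_probability_map(frame, clf):
    """Score every pixel of the new frame; the peak of this map, fused
    with the motion tracker's prediction, gives the new object location."""
    h, w = frame.shape[:2]
    proba = clf.predict_proba(frame.reshape(-1, 3).astype(np.float32))
    return proba[:, 1].reshape(h, w)
```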
In the final step, for every viewpoint and with the updated object model, we perform refined object segmentation. Gaussian Mixture Models (GMMs) are used to create the object model and the background model. Finally, graph cuts are used to obtain the optimal segmentation, as described in Sec. 3.2.2. As a result, detailed contour information of the dominant object is extracted.
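Because this final step pairs GMM color models with a graph-cuts optimization, it can be sketched with OpenCV's grabCut, which implements exactly that combination. Whether the paper uses this implementation is an assumption, as is the premise that the tracking step supplies the bounding box.

```python
# Sketch via OpenCV's grabCut (GMM + graph cuts); the paper's own
# implementation may differ. The box is assumed to come from tracking.
import numpy as np
import cv2

def refine_segmentation(frame, box, iters=5):
    """Refined segmentation around the tracked box: grabCut fits one GMM
    to the object and one to the background, then solves a graph cut for
    the optimal pixel labelling."""
    mask = np.zeros(frame.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # background GMM state
    fgd_model = np.zeros((1, 65), np.float64)  # object GMM state
    cv2.grabCut(frame, mask, box, bgd_model, fgd_model, iters,
                cv2.GC_INIT_WITH_RECT)
    # keep sure and probable foreground as the object region
    obj = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                   255, 0).astype(np.uint8)
    # the detailed contour mentioned in the text is its outline
    contours, _ = cv2.findContours(obj, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return obj, contours
```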
3 APPROACH AND
IMPLEMENTATION
3.1 Salient Object Detection
In order to be able to learn novel objects in unstructured environments, an initial step is to correctly segment the objects without any prior knowledge about the objects or their background. In our previous research (Rudinac and Jonker, 2010), we proposed a method for fast object segmentation based on the salient information in the scene. In the original method (Hou and Zhang, 2007), saliency was detected using a spectral residual approach on three different color channels: red-green, yellow-blue, and the illumination channel. The saliency map was then calculated as the inverse Fourier transform of each spectral residual, and the results were combined to obtain a more robust saliency map. The bright spots in the saliency map represent points of interest.
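A minimal sketch of the spectral residual computation for a single channel is shown below; the resolution, filter sizes, and smoothing parameters are illustrative defaults rather than values stated in the text. The full method would apply this to the red-green, yellow-blue, and illumination channels and combine the three maps.

```python
# Spectral residual saliency (Hou and Zhang, 2007) for one channel;
# parameter values here are illustrative, not taken from the paper.
import numpy as np
import cv2

def spectral_residual_saliency(channel, size=64):
    img = cv2.resize(channel.astype(np.float32), (size, size))
    spectrum = np.fft.fft2(img)
    log_amp = np.log(np.abs(spectrum) + 1e-8)
    phase = np.angle(spectrum)
    # the spectral residual: log amplitude minus its local average
    residual = log_amp - cv2.blur(log_amp, (3, 3))
    # inverse transform of the residual spectrum, squared and smoothed
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = cv2.GaussianBlur(sal, (9, 9), 3)
    return cv2.normalize(sal, None, 0.0, 1.0, cv2.NORM_MINMAX)
```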
In order to detect those peaks, we applied the MSER blob detector (Matas et al., 2004) directly on the saliency map. Once the interest points were detected, close points were clustered together using Parzen window estimation, leading to the segmentation of objects in the scene.
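The peak detection and clustering step might look as follows. MeanShift is used here as the mode-seeking counterpart of a Parzen window density estimate, which is an assumption about the paper's exact procedure, and `bandwidth` is a hypothetical tuning parameter.

```python
# Sketch only: MeanShift stands in for the paper's Parzen-window
# clustering, and the bandwidth value is a hypothetical choice.
import numpy as np
import cv2
from sklearn.cluster import MeanShift

def detect_salient_regions(saliency, bandwidth=30.0):
    """Detect MSER blobs on the saliency map and cluster their
    centroids; each cluster is a potential unknown object."""
    sal8 = cv2.normalize(saliency, None, 0, 255,
                         cv2.NORM_MINMAX).astype(np.uint8)  # MSER needs 8-bit
    regions, _ = cv2.MSER_create().detectRegions(sal8)
    if len(regions) == 0:
        return np.empty((0, 2)), np.empty(0, dtype=int)
    centers = np.array([r.mean(axis=0) for r in regions])  # blob centroids
    labels = MeanShift(bandwidth=bandwidth).fit_predict(centers)
    return centers, labels
```

The dominant object would then be the cluster with the most members, e.g. `centers[labels == np.bincount(labels).argmax()]`.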