ing a particular object index is greater than a predefined threshold (10 in this implementation), we declare the object as being present at the corresponding location. If different bins vote for the same object at different locations in the scene, e.g. due to false feature matches, only the location with the maximum number of votes is marked as the expected location. Finally, the recognition module returns the set of locations of those model objects whose voting support was sufficient.
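The thresholding and maximum-vote selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the data layout (nested dictionaries of vote counts) is an assumption:

```python
# Sketch of the vote-counting step: each (object, location) bin accumulates
# votes from matched features; an object is declared present only where its
# vote count exceeds the predefined threshold (10, as in the text), and if
# several locations vote for the same object, only the maximum wins.

VOTE_THRESHOLD = 10

def select_detections(votes):
    """votes: dict mapping object_id -> {location: vote_count}.
    Returns dict object_id -> location for sufficiently supported objects."""
    detections = {}
    for obj_id, bins in votes.items():
        # Pick the location with the maximum number of votes for this object.
        best_loc = max(bins, key=bins.get)
        if bins[best_loc] > VOTE_THRESHOLD:
            detections[obj_id] = best_loc
    return detections

votes = {
    "cup":  {(120, 80): 23, (300, 40): 4},  # second bin: spurious matches
    "ball": {(50, 200): 7},                 # below threshold: not reported
}
print(select_detections(votes))  # {'cup': (120, 80)}
```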
3 ATTENTION MECHANISM
Our attention mechanism controls what the robot will
look at, for how long it will keep looking at it, and
where it should avoid looking. We embody curios-
ity in the attention mechanism by introducing the fol-
lowing ways of guiding attention to where learning
progress is likely.
3.1 Bottom-up Saliency at Interest
Points
We have adapted a bottom-up saliency model devel-
oped by Itti et al. (Itti and Koch, 2001). In this
model the conspicuity of each image location in terms
of its color, intensity, orientation, motion, etc. is en-
coded in a so-called saliency map. We make use of
stereo information to select the most salient point in
the scene. Images from both eyes are processed to
obtain left and right saliency maps. Since objects
are represented as features extracted at interest points,
our attention mechanism only considers points in the
saliency map that are associated with a pair of in-
terest points matched between left and right image
(all other points are neglected). In this way we re-
strict attention to locations of potential objects that
the system could learn about. The saliency values
for the matched interest points are computed using a
2-dimensional Gaussian centered on them, with σ =
1.5 and a cutoff value of 0.05. This has the effect of
bringing out clusters of high salience more than just
isolated pixels of high salience.
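As a rough sketch of this read-out step, assuming a dense saliency map and pixel coordinates for the stereo-matched interest points (the windowing details are our own, only σ = 1.5 and the 0.05 cutoff come from the text):

```python
import numpy as np

# Saliency is read out only at interest points matched between the left and
# right image, weighted by a 2-D Gaussian (sigma = 1.5, cutoff 0.05) so that
# clusters of high salience dominate over isolated salient pixels.

SIGMA, CUTOFF = 1.5, 0.05

def gaussian_weighted_saliency(saliency_map, point):
    """Sum saliency around `point` (row, col) under a Gaussian window."""
    h, w = saliency_map.shape
    r0, c0 = point
    # Radius at which the Gaussian weight drops below the cutoff.
    radius = int(np.ceil(SIGMA * np.sqrt(-2.0 * np.log(CUTOFF))))
    rows = np.arange(max(0, r0 - radius), min(h, r0 + radius + 1))
    cols = np.arange(max(0, c0 - radius), min(w, c0 + radius + 1))
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    weights = np.exp(-((rr - r0) ** 2 + (cc - c0) ** 2) / (2 * SIGMA ** 2))
    weights[weights < CUTOFF] = 0.0  # apply the cutoff
    return float((weights * saliency_map[rr, cc]).sum())

def most_salient_point(saliency_map, matched_points):
    """Only stereo-matched interest points compete for the winner."""
    return max(matched_points,
               key=lambda p: gaussian_weighted_saliency(saliency_map, p))
```

Under this weighting, a 3×3 cluster of moderately salient pixels outscores a single isolated pixel of higher salience, which is exactly the clustering effect described above.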
When there are no other variations in the visual
characteristics of the scene it is very likely that the
attention mechanism continues to select the same lo-
cation as the most salient point. To avoid this we tem-
porarily inhibit the saliency map around the current
winner location by subtracting a Gaussian kernel at
the current winner location. This allows the system
to shift attention to the next most salient location. To
avoid constant switching between the two most salient locations, we also use the top-down inhibition of already learned objects described in Section 3.3.
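The subtraction step might look as follows; the kernel amplitude and width are illustrative choices, not values from the paper:

```python
import numpy as np

# After a winner is selected, a Gaussian kernel is subtracted from the
# saliency map at the winner location so attention can shift to the next
# most salient location.

def inhibit_winner(saliency_map, winner, sigma=3.0, amplitude=1.0):
    """Suppress saliency around `winner` (row, col) with a Gaussian kernel."""
    h, w = saliency_map.shape
    r0, c0 = winner
    rr, cc = np.mgrid[0:h, 0:w]
    kernel = amplitude * np.exp(-((rr - r0) ** 2 + (cc - c0) ** 2)
                                / (2 * sigma ** 2))
    # Keep saliency non-negative after the subtraction.
    return np.clip(saliency_map - kernel, 0.0, None)
```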
3.2 Attention based on Learning
Progress
It has been argued that infants’ interest in a stimu-
lus is related to their learning progress, i.e., the im-
provement of an internal model of the stimulus (Wang
et al., 2011). We mimic this idea in the following way.
When the robot looks at an object, it detects whether
the object is familiar or not. If the object is new, the system creates a new object model, making new associations in the shared feature dictionary. If the object is known,
the model is updated by acquiring new features from
the object. The attention remains focused on the ob-
ject until the learning progress becomes too small. As
a side effect, the robot continues learning about an object when a human intervenes by rotating or moving it,
exposing different views with unknown features.
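The stopping criterion can be sketched as a simple loop; the names and the minimal-progress value below are hypothetical, not taken from the paper:

```python
# Attention stays on the currently fixated object as long as each update to
# its model still yields enough new features; once learning progress drops
# below a minimum, the object is released and attention can shift.

MIN_PROGRESS = 2  # minimal number of new features per update (assumed value)

def attend_object(feature_batches, known_features=None):
    """Consume successive views of one object; stop when progress stalls.

    feature_batches: iterable of sets of feature ids seen per frame.
    Returns the accumulated object model (a set of features)."""
    model = set(known_features or ())
    for batch in feature_batches:
        new = batch - model
        if len(new) < MIN_PROGRESS:
            break          # learning progress too small: shift gaze
        model |= new       # update the object model with the new features
    return model
```

Rotating or moving the object simply injects batches with unseen features, so the loop keeps running, which matches the side effect described above.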
3.3 Top-down Rejection of Familiar
Objects
The third mechanism to focus attention on locations
where learning progress is likely makes use of the sys-
tem’s increasing ability to recognize familiar objects.
A purely saliency-based attention mechanism may se-
lect the same object again and again during explo-
ration, even if the scope for further learning progress
has become very small. Therefore, once there are
no more new features found on certain objects, our
system inhibits their locations in the saliency map
wherever they are recognized (Fig. 5a). To this end,
the models of these objects are used to detect them in
every frame using the recognition module. The inter-
est points on the saliency map that are in the vicinity
of the object detections are removed from being con-
sidered for the winner location.
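A minimal sketch of this rejection step, with an assumed pixel-distance criterion for "vicinity":

```python
# Interest points lying close to a detected, fully learned object are
# removed before the winner is chosen, so that saliency at familiar
# objects cannot attract attention again.

def reject_familiar(interest_points, detections, radius=20.0):
    """Drop interest points within `radius` pixels of any detection."""
    def near_any(p):
        return any((p[0] - d[0]) ** 2 + (p[1] - d[1]) ** 2 <= radius ** 2
                   for d in detections)
    return [p for p in interest_points if not near_any(p)]
```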
3.4 Top-down Rejection of Recently
Visited Locations
We have incorporated an inhibition-of-return mech-
anism that prevents the robot from looking back to
locations that it has recently visited. To this end, the absolute 3D coordinates of the visited locations are stored in memory and mapped onto pixel coordinates in the current camera images to obtain the locations to inhibit.
In our experiments, a list of the 5 most recently visited
locations is maintained and close-by interest points
are inhibited for the next gaze shift (Fig. 5b).
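The bookkeeping can be sketched as follows; `project` stands in for the (unspecified) camera projection, and the inhibition radius is an assumed value:

```python
from collections import deque

# The 5 most recently visited 3-D locations are kept in a short memory;
# after projecting them into the current camera image, nearby interest
# points are inhibited for the next gaze shift.

class InhibitionOfReturn:
    def __init__(self, capacity=5, radius=15.0):
        self.recent = deque(maxlen=capacity)  # absolute 3-D coordinates
        self.radius = radius

    def visit(self, location_3d):
        self.recent.append(location_3d)  # oldest entry drops automatically

    def filter_points(self, interest_points, project):
        """Remove interest points close to any recently visited location;
        `project` maps a 3-D point to current pixel coordinates."""
        inhibited = [project(loc) for loc in self.recent]
        def blocked(p):
            return any((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                       <= self.radius ** 2 for q in inhibited)
        return [p for p in interest_points if not blocked(p)]
```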
VISAPP 2013 - International Conference on Computer Vision Theory and Applications