the localization given the robot’s motions. Alternatives include position estimates obtained through other sensor modalities, such as external microphones that capture the robot’s intrinsic noise (Allen et al., 2012).
Unfortunately, the location of the stimuli and of the robot in relation to the sensors may also compromise the results. Furthermore, the orientation component cannot be estimated in this way. In general, exteroceptive solutions neglect the corporal metaphor, require the environment to be adapted to the problem, and are very sensitive to modeling imprecision.
When considering on-board solutions, the limited control over the head’s direction and its effect on the visual output have complicated the task (Michel et al., 2005). To address this problem, compensation for the head’s motion by exploiting the kinematic and geometric models of the robot has been attempted; for example, a virtual camera has been defined to cancel the sway motion in the visual features for continuous visual servoing (Dune et al., 2010). However, considerable delays may be involved in the vision processing, due to digital image treatment and the transfer of video data from the on-board camera to the computer system (Moughlbay et al., 2013). These delays can restrict the applications of real-time closed-loop visual servoing techniques. Furthermore, physiological evidence has also reported considerable delays in the human visuo-motor loop (Miall et al., 1993).
The feedback is estimated to take around 130 ms for ocular-motor control and 110–150 ms for proprioceptive control. According to these figures, the performance observed in natural beings may be better explained by the organization and efficient management of the available resources than by raw computational power. In addition, continuous visual control during walking may not be necessary: depending on the walking stage, images have greater or lesser relevance for localization (the head’s motion may produce blurred images at certain points), so continuous processing may add considerable overhead with little benefit.
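To make the compensation idea concrete, a virtual camera in the spirit of Dune et al. (2010) can be approximated, for a rotation-only head motion, by the infinite homography K R^T K^-1 that warps current-image pixels back into a sway-free reference frame. The sketch below is a minimal illustration, not the cited implementation; the intrinsic matrix K is a hypothetical example, and in practice the rotation would come from the robot’s kinematic model at each control cycle.

```python
import numpy as np

# Hypothetical camera intrinsics (focal length and principal point are
# illustrative values, not taken from the cited work).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

def sway_compensation_homography(R_head):
    """Homography cancelling a pure head rotation R_head.

    For a rotation-only camera motion, pixels map through the infinite
    homography K R K^-1; applying K R^T K^-1 warps current-image pixels
    back to the reference (virtual) camera frame.
    """
    return K @ R_head.T @ np.linalg.inv(K)

def compensate_points(points, R_head):
    """Warp pixel coordinates (N x 2) into the virtual-camera frame."""
    H = sway_compensation_homography(R_head)
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    warped = (H @ pts_h.T).T
    return warped[:, :2] / warped[:, 2:3]
```

The warped features, rather than the raw ones, would then feed the visual-servoing loop, so the controller sees measurements as if taken by a stabilized camera.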
The task representation has also been a topic of interest. The displacement to be accomplished has been referenced within a global map that the agent may possess, update, or build while navigating. For example, Hornung et al. (2010) tackle the problem of indoor localization by adapting a range sensor to the robot’s head. The pose of the robot is estimated within a known volumetric map of the environment, whereby the on-board measurements parametrize a probabilistic search routine. In a work by Robert Cupec (2005), a global localization policy is combined with local references to enhance the accuracy when stepping over small obstacles. The strategy is based on an interesting method for directing the gaze by maximizing the visual information, but it evidences the limitations of the global localization approach, whose accuracy strongly depends on the quality of the modeling (including parameter estimation) and on the noise in the measurements. In both of these works, the localization-and-locomotion task has been modeled as a control problem, with the body playing the role of a mere tool that has to be commanded appropriately (Hoffmann and Pfeifer, 2012).
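As an illustration of how on-board measurements can parametrize a probabilistic search routine, the sketch below implements one generic Monte Carlo localization (particle-filter) cycle. The motion and sensor models are caller-supplied placeholders; this is a minimal sketch of the general technique, not the implementation used in the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def mcl_step(particles, weights, control, measurement, motion_model, likelihood):
    """One predict-update-resample cycle of Monte Carlo localization.

    particles    : (N, d) array of pose hypotheses
    weights      : (N,) importance weights
    motion_model : callable(particles, control) -> propagated particles
    likelihood   : callable(particles, measurement) -> (N,) sensor likelihoods
    """
    particles = motion_model(particles, control)             # predict
    weights = weights * likelihood(particles, measurement)   # measurement update
    weights = weights / weights.sum()
    # Systematic resampling concentrates the particle set on likely poses.
    positions = (rng.random() + np.arange(len(weights))) / len(weights)
    idx = np.minimum(np.searchsorted(np.cumsum(weights), positions),
                     len(weights) - 1)
    return particles[idx], np.full(len(weights), 1.0 / len(weights))
```

In a map-based setting, the likelihood would score each pose hypothesis by comparing the expected range readings from the known map against the actual on-board measurements.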
The discussion so far has exposed some important aspects of localization in humanoid robotics. The reliability of the on-board sensors for effectively accomplishing the task has been assessed, given the noise introduced by motion. Furthermore, the role of the agent in relation to the environment has been examined; in particular, the extent to which the environment must be known or adapted to the agent for the localization task to be attained. Lastly, the convenience of a global reference policy has been contrasted with locally referencing stimuli, in terms of the precision obtained for localization. This research starts from the hypothesis that on-board localization can be achieved by relying on robust object segmentation, with minimal knowledge about the environment, and by defining an egocentric sensory reference system. These aspects are discussed in the following sections.
3 IMAGE SEGMENTATION
Image segmentation and object tracking are difficult processes to achieve. A huge number of techniques are available in the literature, each imposing certain constraints. An in-depth treatment of the topic cannot be accomplished here; thus, some of the explored proposals that showed good results are briefly discussed.
The first approach considered was the classical k-means algorithm (MacQueen, 1967), which is a convenient technique for obtaining clusters. The method is not very efficient for real-time applications given its high computational complexity, ς = O(n^(dk+1) log n), for d-dimensional feature vectors, k clusters, and n elements (pixels in the case of images). Also, k must be given in advance, which significantly constrains the characteristics of the images to be treated. The expectation-maximization (EM) algorithm (Dempster et al., 1977) is more efficient and general, in the sense that the clusters are represented by probability distributions and not just by their means. Unfortunately, it also requires
ICINCO 2014 - 11th International Conference on Informatics in Control, Automation and Robotics