then the robot should learn a set of behavior weights
that allows it to traverse paths of that width and larger,
while preventing it from entering narrower corridors.
Likewise, in the setup where the robot has the
avoid and attract behaviors available, there are two
possible paths to the goal: one which goes through
a narrow corridor and one which bypasses the corri-
dor altogether. In this case, it is hoped that if the user
drives the robot along the former path, the learning
algorithm will derive a weight for the attract vector
strong enough to compel the robot to traverse the cor-
ridor. Conversely, if the user bypasses the corridor, it
is hoped that the algorithm will derive a weight for the
attract vector that is too weak to overcome the contri-
bution from the avoid vector when faced with such a
tight space. The demonstration stops in both sets of
experiments when the user reaches the goal object.
The learning algorithm then analyzes the infor-
mation recorded in each of the demonstration runs.
For each timestep recorded, the algorithm derives a
weight based on the component vectors, the turn rate
and the speed of the robot for that timestep. The al-
gorithm produces a set of behavior weights, as de-
scribed in Section 4. In our current implementation
the learning is performed off-line, at the end of the
demonstration; however, the processing could equally
well be performed on-line, after each executed step.
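To illustrate the kind of per-timestep computation involved, the following Python sketch recovers the weight of the second behavior (wander or attract) from one logged demonstration. It assumes, purely for illustration, that the demonstrated speed and turn rate at each timestep can be approximated by the weighted sum of the two component vectors, with the avoid weight held constant at 1 as in Table 1; the actual formulation is the one given in Section 4, and the record fields (avoid_vec, other_vec, speed, turn_rate) are hypothetical names.

import numpy as np

def estimate_behavior_weight(demo_log, fixed_avoid_weight=1.0):
    """Illustrative sketch only: estimate the weight of the second
    behavior (wander or attract) from one demonstration run, assuming
    the demonstrated motion is roughly the weighted sum of the two
    component vectors, with the avoid weight fixed at 1 (cf. Table 1)."""
    weights = []
    for step in demo_log:
        # Demonstrated motion expressed as a 2D command vector
        # (speed, turn rate) -- an illustrative encoding only.
        demo_cmd = np.array([step['speed'], step['turn_rate']])
        # Subtract the fixed-weight avoid contribution for this timestep.
        residual = demo_cmd - fixed_avoid_weight * np.asarray(step['avoid_vec'])
        other = np.asarray(step['other_vec'])
        denom = float(other @ other)
        if denom > 1e-9:  # skip timesteps where the second behavior contributes nothing
            # Least-squares scalar that best explains the remaining motion.
            weights.append(float(residual @ other) / denom)
    # Aggregate the per-timestep estimates into a single behavior weight.
    return float(np.mean(weights)) if weights else 0.0

Averaging the per-timestep estimates is one simple aggregation choice; maintaining a running average of the same quantity would support the on-line variant mentioned above.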
To evaluate the performance of the learning algo-
rithm, we place the robot at various locations in the
environment and equip it with an autonomous con-
troller that uses the derived weights. If the robot fol-
lows the same strategy as demonstrated by the user
(e.g., does not navigate corridors or tight spaces nar-
rower than those through which the user drove it, but
traverses any wider spaces), the experiment is consid-
ered a success.
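A minimal sketch of such a validation controller is given below, assuming the schema-style combination in which each behavior's output vector is scaled by its learned weight and the results are summed into a single motion command; the function name and the (speed, turn rate) encoding are illustrative assumptions, not the exact interface of our implementation.

import numpy as np

def autonomous_step(behavior_vectors, learned_weights):
    """Illustrative sketch: combine current behavior outputs with the
    learned weights into one motion command.  `behavior_vectors` maps
    behavior names (e.g. 'avoid', 'wander') to their current 2D output
    vectors; `learned_weights` maps the same names to the weights
    derived from demonstration (avoid fixed at 1)."""
    combined = np.zeros(2)
    for name, vec in behavior_vectors.items():
        # Scale each behavior's output by its learned weight and sum.
        combined += learned_weights.get(name, 0.0) * np.asarray(vec)
    speed, turn_rate = combined  # interpret the sum as (speed, turn rate)
    return speed, turn_rate

With the first-scenario weights from Table 1, for instance, a wander weight of 7.8 lets the wander term dominate the avoid response inside the narrow corridor, whereas a weight of 0.4 does not.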
5.3 Results
To test the learning algorithm on the first scenario,
which involves the avoid/wander behavior pair, we per-
formed three separate experiments, one for each path
indicated in Figure 3(a). Each of these experiments
was repeated three times, resulting in the nine values
in the first-scenario portion of Table 1.
First, the user drove the robot along the nar-
rowest path towards the goal. The result was a rel-
atively large weight for the wander behavior, com-
pared with the weights derived for wander in the next
two demonstrations (see Table 1). We then activated the
robot’s controller using the learned weights in three
different runs, starting from various initial positions:
one inside the narrow corridor, one inside the median-
width corridor, and one inside the wide corridor. The
controller allowed the robot to easily traverse the nar-
row corridor, as well as wider areas, since the weight
of wander was able to overcome the response of the
avoid behavior.
In a second demonstration, the user guided the ro-
bot through the median-width path, which resulted in
a significantly smaller weight for the wander behav-
ior (see Table 1). We tested the resulting controller
in three different runs, starting the robot in each of
the three corridors (narrow, median-width, and wide).
When placing the robot on the narrowest path, we
found that the weight of the wander behavior was no
longer strong enough to counter the effect of obstacle
avoidance, forcing the robot to reverse in an attempt to
escape from the constricting space. However, the ro-
bot was easily able to traverse the median-width cor-
ridor, as well as the largest one.
Finally, the user took the widest path for the third
demonstration, which resulted in a significantly lower
weight for the wander behavior than in the previous
two runs (see Table 1). In the three experiments we
performed with the learned controller, the robot was
not able to traverse the narrow and median-width cor-
ridors, even when placed there, but was able to tra-
verse the widest corridor with ease.
Two subsequent repetitions of these experiments
(both learning and validation) led to similar results,
with slight differences due to variability in the user's
demonstrations. We conclude that the learning algo-
rithm correctly derives the relative importance of the
two component vectors in the avoid/wander scenario
and accurately captures the strategy of the demon-
strator.
Table 1: Behavior weights learned through demonstration
(avoid weight kept constant at 1).

First scenario: Wander vs. avoid weight
                     Exp. 1   Exp. 2   Exp. 3
  Narrow corridor      7.8     14.4      8.4
  Medium corridor      3.8      3.2      3.3
  Wide corridor        0.4      0.6      0.4

Second scenario: Attract vs. avoid weight
                     Exp. 1   Exp. 2   Exp. 3
  Traverse corridor  195.5    215.2    195.6
  Avoid corridor     124.0    131.9    118.7
The experiments for the second scenario, in which
the avoid/attract behavior pair was available to the
robot, followed similar lines and achieved similar re-
sults. In a first demonstration, the user drove the robot
through the narrow corridor and directly to the goal.
This resulted in a weight for the attract behavior that
allowed the robot, in the first validation run, to tra-
verse the corridor, even in the presence of obstacle
avoidance. In the second validation run, the robot was
again able to reach the goal when placed at an initial
position from which it could see the goal but was not
separated from it by the narrow corridor.