then the robot should learn a set of behavior weights
that allows it to traverse paths of that width and larger,
while preventing it from entering narrower corridors.
Likewise, in the setup where the robot has the
avoid and attract behaviors available, there are two
possible paths to the goal: one which goes through
a narrow corridor and one which bypasses the corri-
dor altogether. In this case, it is hoped that if the user
drives the robot along the former path, the learning
algorithm will derive a weight for the attract vector
strong enough to compel the robot to traverse the cor-
ridor. Conversely, if the user bypasses the corridor, it
is hoped that the algorithm will derive a weight for the
attract vector that is too weak to overcome the contri-
bution from the avoid vector when faced with such a
tight space. The demonstration stops in both sets of
experiments when the user reaches the goal object.
The learning algorithm then analyzes the infor-
mation recorded in each of the demonstration runs.
For each timestep recorded, the algorithm derives a
weight based on the component vectors, the turn rate
and the speed of the robot for that timestep. The al-
gorithm produces a set of behavior weights, as de-
scribed in Section 4. In our current implementation
the learning is performed off-line, at the end of the
demonstration; however, the processing could equally
well be performed on-line, after each executed step.
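To illustrate the kind of per-timestep computation involved, the following Python sketch recovers the weight of the second behavior (wander or attract) from one logged demonstration. It assumes, purely for illustration, that the demonstrated speed and turn rate at each timestep can be approximated by the weighted sum of the two component vectors, with the avoid weight held constant at 1 as in Table 1; the actual formulation is the one given in Section 4, and the record fields (avoid_vec, other_vec, speed, turn_rate) are hypothetical names.

import numpy as np

def estimate_behavior_weight(demo_log, fixed_avoid_weight=1.0):
    """Illustrative sketch only: estimate the weight of the second
    behavior (wander or attract) from one demonstration run, assuming
    the demonstrated motion is roughly the weighted sum of the two
    component vectors, with the avoid weight fixed at 1 (cf. Table 1)."""
    weights = []
    for step in demo_log:
        # Demonstrated motion expressed as a 2D command vector
        # (speed, turn rate) -- an illustrative encoding only.
        demo_cmd = np.array([step['speed'], step['turn_rate']])
        # Subtract the fixed-weight avoid contribution for this timestep.
        residual = demo_cmd - fixed_avoid_weight * np.asarray(step['avoid_vec'])
        other = np.asarray(step['other_vec'])
        denom = float(other @ other)
        if denom > 1e-9:  # skip timesteps where the second behavior contributes nothing
            # Least-squares scalar that best explains the remaining motion.
            weights.append(float(residual @ other) / denom)
    # Aggregate the per-timestep estimates into a single behavior weight.
    return float(np.mean(weights)) if weights else 0.0

Averaging the per-timestep estimates is one simple aggregation choice; maintaining a running average of the same quantity would support the on-line variant mentioned above.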
To evaluate the performance of the learning algo-
rithm, we place the robot at various locations in the
environment and equip it with an autonomous con-
troller that uses the derived weights. If the robot fol-
lows the same strategy as demonstrated by the user
(e.g., does not navigate corridors or tight spaces nar-
rower than those through which the user drove it, but
traverses any wider spaces), the experiment is consid-
ered a success.
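A minimal sketch of such a validation controller is given below, assuming the schema-style combination in which each behavior's output vector is scaled by its learned weight and the results are summed into a single motion command; the function name and the (speed, turn rate) encoding are illustrative assumptions, not the exact interface of our implementation.

import numpy as np

def autonomous_step(behavior_vectors, learned_weights):
    """Illustrative sketch: combine current behavior outputs with the
    learned weights into one motion command.  `behavior_vectors` maps
    behavior names (e.g. 'avoid', 'wander') to their current 2D output
    vectors; `learned_weights` maps the same names to the weights
    derived from demonstration (avoid fixed at 1)."""
    combined = np.zeros(2)
    for name, vec in behavior_vectors.items():
        # Scale each behavior's output by its learned weight and sum.
        combined += learned_weights.get(name, 0.0) * np.asarray(vec)
    speed, turn_rate = combined  # interpret the sum as (speed, turn rate)
    return speed, turn_rate

With the first-scenario weights from Table 1, for instance, a wander weight of 7.8 lets the wander term dominate the avoid response inside the narrow corridor, whereas a weight of 0.4 does not.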
5.3 Results
To test the learning algorithm on the first scenario,
which involves the avoid/wander behavior pair, we per-
formed three separate experiments, one for each path
indicated in Figure 3(a). Each of these experiments
was repeated three times, resulting in the nine values
in the first-scenario portion of Table 1.
First, the user drove the robot along the nar-
rowest path towards the goal. The result was a rel-
atively large weight for the wander behavior, com-
pared with the weights derived for wander in the next
two demonstrations (see Table 1). We then activated the
robot’s controller using the learned weights in three
different runs, starting from various initial positions:
one inside the narrow corridor, one inside the median-
width corridor, and one inside the wide corridor. The
controller allowed the robot to easily traverse the nar-
row corridor, as well as wider areas, since the weight
of wander was able to overcome the response of the
avoid behavior.
In a second demonstration, the user guided the ro-
bot through the median-width path, which resulted in
a significantly smaller weight for the wander behav-
ior (see Table 1). We tested the resulting controller
in three different runs, starting the robot in each of
the three corridors (narrow, median-width, and wide).
When placing the robot on the narrowest path, we
found that the weight of the wander behavior was no
longer strong enough to counter the effect of obstacle
avoidance, forcing the robot to reverse in an attempt to
escape from the constricting space. However, the ro-
bot was easily able to traverse the median-width cor-
ridor, as well as the largest one.
Finally, the user took the widest path for the third
demonstration, which resulted in a significantly lower
weight for the wander behavior than in the previous
two runs (see Table 1). In the three experiments we
performed with the learned controller, the robot was
not able to traverse the narrow and median-width cor-
ridors, even when placed there, but was able to tra-
verse the widest corridor with ease.
Two subsequent repetitions of these experiments
(both learning and validation) led to similar results,
with slight differences due to variability in the user's
demonstrations. We conclude that the learning algo-
rithm correctly derives the relative importance of the
two component vectors in the avoid/wander scenario
and accurately captures the strategy of the demon-
strator.
Table 1: Behavior weights learned through demonstration
(avoid weight kept constant at 1).

First scenario: Wander vs. avoid weight
                     Exp. 1   Exp. 2   Exp. 3
  Narrow corridor      7.8     14.4      8.4
  Medium corridor      3.8      3.2      3.3
  Wide corridor        0.4      0.6      0.4

Second scenario: Attract vs. avoid weight
                     Exp. 1   Exp. 2   Exp. 3
  Traverse corridor  195.5    215.2    195.6
  Avoid corridor     124.0    131.9    118.7
The experiments for the second scenario, in which
the avoid/attract behavior pair was available to the
robot, followed similar lines and achieved similar re-
sults. In a first demonstration, the user drove the robot
through the narrow corridor and directly to the goal.
This resulted in a weight for the attract behavior that
allowed the robot, in the first validation run, to tra-
verse the corridor, even in the presence of obstacle
avoidance. In the second validation run, the robot was
again able to reach the goal when placed at an initial
position from which it could see the goal but was not
separated from it by the narrow corridor.