Sergio Ribeiro Augusto and Ademar Ferreira
Departamento de Engenharia de Telecomunicações e Controle, Escola Politécnica da Universidade de São Paulo
Av. Prof. Luciano Gualberto 3/380, 05508-900 Sao Paulo, Brazil
Keywords: Mobile robotics, supervised learning, radial basis function networks, teleoperation.
Abstract: This paper describes a mobile robotic platform and a software framework for applications and development
of robotic experiments integrating teleoperation and autonomy. An application using supervised learning is
developed in which the agent is trained by teleoperation. This allows the agent to learn the perception to
action mapping from the teleoperator in real time, such that the task can be repeated in an autonomous way,
with some generalization. A radial basis function network (RBF) trained by a sequential learning algorithm
is used to learn the mapping. Experimental results are shown.
In robotics navigation problems, including learning
or not, navigation techniques must be tested in real
robots to be useful (DORIGO, 1996). This is due to
the uncertainties involved, non uniformity of sensors
measurements and real time requirements. To deal
with these severe characteristics, this paper proposes
a mobile robotics platform developed in a modular
and hierarchical way, to be used in real time
autonomy studies. The objective is to create a
flexible development environment for studies in
which teleoperation can be easily integrated with
autonomous operation. The idea is to join
teleoperation with supervised learning in a way that
innate or prior knowledge can be acquired, or that an
agent can be taught to realize specific navigation
tasks. Such possibility allows a robotic agent to learn
with its own operation. Kaelbling (1996) points out
that without prior knowledge an agent can not learn
with effectiveness. Unsupervised learning
techniques, as for example reinforcement learning,
have a long convergence time and do not provide
operational agents from the beginning. Therefore, it
is important to mix such methods with supervised
ones (Ye et al., 2003; Er and Deng, 2005).
Although miniature like robots, as for instance
the Khepe
ra (Mondada et al., 1993), have been used
in studies and papers related to autonomous robotics,
as in Er and Deng (2005), it is more realistic to
perform the same experiments using larger robots
due to the dynamic effects associated, which places
them closer to real service robots. For this reason,
we decide to build a mobile robotic platform with
dynamic characteristics that could be applied in a
flexible way to navigation and learning experiments.
In this sense, the platform allows sensory-motor data
to be stored and recovered during or after operation,
and new sensors to be added and configured
according to the application.
Differently from Ye et al. (2003) and Er and
eng (2005), in our work the supervised learning
takes place in a real environment, not in a simulated
one, and in real time. The objective is to teach the
agent to perform simple navigation tasks using
ultrasound sensors.
In order to have incremental learning with some
ralization, a radial basis function neural network
(RBF) is developed. We adapted the resource
allocation algorithm proposed in Platt (1991) for the
function interpolation field, to obtain supervised
learning in real time, while the robot is teleoperated.
In this aspect, our work is also different from
Reignier et al. (1997), where the supervised learning
is off line, implemented in a GAL (“Grow and
Learn”) network, with results verified in simulation.
This paper is organized as follows. Section 2
bes the platform and the software framework
developed. Section 3 introduces the supervised
learning application. Section 4 presents some
experimental results that we got until now. Finally,
conclusions are drawn in Section 5.
Ribeiro Augusto S. and Ferreira A. (2007).
In Proceedings of the Fourth International Conference on Informatics in Control, Automation and Robotics, pages 390-395
DOI: 10.5220/0001622603900395
The robotic platform in its present version is
designed for indoor experiments. It measures 50 cm
(diameter) by 80 cm (height) and weights 40 kg. In
the sequence we describe some aspects of its
hardware and software architectures.
2.1 Hardware Architecture
A block diagram of the robotic system is shown in
Figure 1. The control of the robot locomotion is
accomplished by two motor wheels, powered
independently by two DC motors, using differential
steering (Dudek and Jenkin, 2000) and a caster
wheel. The platform has an image module, and
seven ultrasound sensors distributed in its frontal
side. The sensors are allocated in a way that objects
on the floor can be detected. A digital compass and
an angular sensor connected to the caster allow
sensorial integration techniques to be exploited to
assist in the navigation. Two incremental encoders
are used for odometry and velocity control. Collision
sensors protect the robot lower perimeter. Other
sensors can be added to the platform using a
synchronous serial interface available in the system,
so that the user can configure it to different types of
studies and experiments.
Figure 1: Block diagram of the robotic system.
The hardware architecture is arranged in
hierarchical processing modules, each one
responsible for some of the tasks involved in the
mobile robot control. There are four main modules:
the Management Module, the Motors Control
Module (MCM), the Power Module and the Sensors
Control Module (SCM). The Management Module is
responsible for the coordination of the robotic unit.
Currently a PC on board computer is used for this
function. It runs the software framework, described
in the section 2.2, for developing of user’s
applications. The Power Module is responsible for
the steering of the motor wheels under control of the
MCM. The Motors Control Module implements two
PID (proportional plus integral plus derivative)
controllers in parallel, allowing independent velocity
control of each wheel. The SCM module permits
acquiring data from the diverse sensors on the robot.
It has a synchronous serial interface (I2C) for
sensors expansion and it can also provide for
emergency stopping in case of collision.
The platform also has wireless TCP/IP
communication resources, allowing remote
monitoring, data exchange and teleoperation. The
energy system gives the robot at least one hour of
navigation’s autonomy.
2.2 Software Architecture
The software environment provides the robot with
autonomous navigation as well as teleoperation. The
software architecture is divided in two main
applications: The On Board Management Software
(OBMS), running in the Management Module of the
mobile platform, and the Remote Control and
Supervision Software (RCSS), executing in a remote
microcomputer. The communication between them
is made using the Client-Server paradigm (Andrews,
2000) and through the wireless network available in
the system.
A block diagram of the OBMS is shown in
Figure 2. The architecture is arranged in four main
levels: the Communication Level, the Management
and Supervision Level, the Execution Level and the
Software Interface.
The Communication Level implements a TCP/IP
server that is responsible for receiving commands
from the remote microcomputer and sending data
back to it. Simultaneous connections are possible
and data can be exchanged with more than one
remote microcomputer if desired. The Management
and Supervision Level deals with the commands
received at the TCP/IP server, interpreting and
executing them. This level also performs the
management of the mobile unit concerning its
operation mode, autonomous or teleoperated, which
is controlled through commands sent by the RCSS.
The effective control of the robot is made in the
Execution Level, which implements the operation
modes. This level is easily adapted to the application
required using a library of functions available to the
user. Each operation mode has a template which the
user can modify or adapt to his own necessities. In
the application described in this work, the learning
algorithm is added to the teleoperation mode and the
learned neural network is recovered and executed in
the autonomous mode. The Software Interface
isolates the hardware aspects of the robot creating an
application program interface (API). This permits
that hardware modifications can be made without
any change in the other levels of the architecture,
supplying modularity.
The software framework has a multithreaded
architecture (Andrews, 2000), which is adequate to
implement real time applications.
Figure 2: Block diagram of the OBMS.
Sensory-motor data are stored in a data base for
analysis and utilization. Sensory-motor coordination
aspects (Pfeifer and Scheier, 1997) can then be
exploited in the training and learning of autonomous
agents. A global data structure allows data exchange
among the several software modules in execution.
The RCSS has the main objective of informing
the OBMS concerning the operation mode requested
by the user. In the teleoperation mode, the robot is
controlled through a joystick connected to the
remote microcomputer. A TCP/IP client in the
RCSS communicates with the OBMS allowing
messages and commands exchange.
Using the facilities of the platform, a supervised
learning application, assisted by teleoperation, was
developed. An RBF neural network was trained in a
sequential way appropriate to real time applications.
The network starts with no computational units and
grows by allocating units (hidden units), or centers,
based on the “novelty” of an observation. The
novelty is characterized by two joint criterions: the
ICINCO 2007 - International Conference on Informatics in Control, Automation and Robotics
distance criterion and the prediction error criterion.
The former is based on the distance between the
input pattern observed and the network units. The
latter uses the errors among the desired outputs and
the network ones due to the input pattern. The
network forms a compact representation and it has a
quick learning. Learning patterns do not have to be
repeated. The units only respond to a local region of
the space of the input values, making easy the
incremental learning. If a new pattern is presented to
the network and the joint criterion is satisfied, a new
unit is allocated. Else, the network parameters are
update using the LMS (Least Mean Square)
algorithm. Instead of using LMS, an algorithm based
on the extended Kalman filter (EKF) has been
proposed in the literature (Kadirkamanathan and
Niranjan, 1993) to speed up the convergence of the
network. Because of the computational complexity
involved in the EKF, requiring longer processing
time, we decided to use LMS in our real time
application. The experimental results in section 4
show that our choice was sufficient for the
navigation tests that were realized, allowing the
training of the robot to the tasks proposed. It is not
our objective in this work to minimize the number of
teleoperations for learning, so the speed of
convergence of the network is not our main
In our proposal, teleoperations are used for
training an RBF network that has the seven
ultrasound sensors of the platform as input pattern
and the angular velocities of the two motor wheels
as outputs. The network parameters are updated in
real time during the teleoperation and stored in the
end of the training. A new teleoperation can be made
with the network starting with the stored parameters.
The autonomous mode implements the learned
network in such a way that the robot can repeat the
task with some generalization. This means that the
robot produces coherent outputs for similar inputs,
although not equal to those encountered during
training. If the performance is not good, the platform
allows that new teleoperations can be made, starting
from the parameters that have already been learned.
The learning algorithm, adapting from Platt
(1991), Kadirkamanathan and Niranjan (1993), is
described mathematically as follows, where: x(n) is
the input pattern at the instant n; y
(n) is the angular
velocity desired for each wheel ( j=1,2); s
(n) are the
network outputs; u
is the unit k of the network; w
is the weight connecting the unit k to the output j;
ε(n) is the value of the distance threshold in the
iteration n that inserts a new unit; ε
and ε
respectively the maximum and minimum values of
is the threshold to the network prediction
error; k
is the overlap factor to the network units; γ
(0 < γ < 1) is a decay constant and u
is the nearest
center to the input x(n). ε
and ε
represent the
scale of resolution in the input space, respectively
the largest and the smallest scale of interest.
The network outputs are written as:
))(()1()( x
where m is the number of units or centers of the
network, Ф
(x(n)) = 1 for k = 0 and
for k 0.
In the first iteration (n = 0):
ε(n) = ε
, w
(n) = y
(n) (k = 0)
For each observation (x(n); y
(n), y
(n) = y
(n) – s
If |e
(n)| (
) > e
and ||x
|| > ε(n)
Allocate a new unit:
= x(n)
= e
If it is the first unit
= k
= k
Else execute LMS
ε(n) = max{ ε
, ε
The updating of the network parameters in
accordance with the LMS algorithm is given by
equations (3) and (4). The time index n was omitted
for clarity. Equation (3) is the correction term for the
component i of each center k. The dimension of each
unit is d, the same of the input pattern (i = 1, 2, …,
d). The correction to the weights is given by
equation (4). In both equations η is the learning rate.
sywuxu x
syw Φ=Δ
Experiments were performed to verify the robot
ability to learn the perception to action mapping
from the teleoperator in real time, checking if the
mobile unit can repeat the navigation task
autonomously. The parameters values used in the
tests were: ε
= 0.03, ε
= 0.5, γ = 0.9, η = 0.3 and
= 0.5. The input and output data used to train the
network were normalized in the range [-1, 1].
Figure 3 shows the results of teleoperating the robot
in a corridor with a turn to the right.
Figure 3: Navigating in a corridor with a turn to the right.
The training consisted of three teleoperations,
each one starting at a different initial position. In the
figure, the solid lines represent the trajectories
executed during the training and the dashed ones are
the routes performed autonomously by the robot.
The trajectory is the path traversed by the medium
point between the two motor wheels. It should be
remembered that the unit has a diameter of 50 cm.
The training resulted in a network with 121 units.
The idea of the experiments was to demonstrate
the capacity of the method in acquiring reactive
movements such as wall-following and obstacle
avoidance. In Figure 4 the robot goes around an
obstacle. Four teleoperations were realized for
training, resulting in a network with 119 centers.
After the learning phase the autonomous mode was
activated and the agent executed the task with
The agent was also trained to make a path in the
shape of an 8 around two obstacles. Figure 5 shows
the test environment.
Figure 4: Robot avoiding an obstacle.
Figure 5: Robot in the test environment.
A few teleoperations were realized to train the
robot in this navigation task. In the autonomous
mode, it was observed that the robot continued
surrounding one of the obstacles instead of
completing the 8 when arriving at the trajectory
cross point. We then added the value of the digital
compass available in the robotic platform to the
network input pattern, in a way that the agent could
infer the direction of the movement during the
learning. Such alteration allowed the robot to
complete the task autonomously. The initial training
allocated 262 units in the network. Sometimes the
robot did not complete the navigation task by itself
with success. In those cases the autonomous mode
was finished and the agent was teleoperated,
completing the route. The incremental and local
learning characteristics of the neural network
allowed new units to be added to the network
encompassing such situations. In the end a neural
network with 935 units resulted.
In Figure 6 we have some training trajectories
(solid lines) and some paths realized by the robot
when operating in the autonomous mode (dash and
dash-dot lines).
ICINCO 2007 - International Conference on Informatics in Control, Automation and Robotics
Figure 6: Robot making a path in the shape of an 8.
The mobile robotic platform proposed showed to be
efficient to the realization of teleoperated as well as
autonomous experiments and studies. The
navigation results obtained with the sequential and
local learning algorithm used are promising. The
results exhibited some generalization, although no
specific experiment has yet been made to verify that
more systematically. The technique can be applied,
for instance, to get prior learning in reactive robotic
applications, speeding up real time learning.
As future work, real time pruning techniques
should be developed and added to the algorithm to
minimize the number of units in the neural network.
Andrews, G. R. (2000). Foundations of Multithreaded,
Parallel, and Distributed Programming, Addison-
Wesley. New York.
Dorigo, M. (1996). Introduction to the Special Issue on
Learning Autonomous Robots. IEEE Trans. on
Systems, Man, and Cybernetics, Part-B, vol. 26, no. 3,
pp. 361-364.
Dudek G. and Jenkin, M. (2000). Computational
Principles of Mobile Robotics, Cambridge University
Press. Cambridge.
Er, M. J. and Deng, C. (2005). Obstacle Avoidance of a
Mobile Robot Using Hybrid Learning Approach. IEEE
Trans. on Industrial Electronics, vol. 52, no. 3, pp.
Kadirkamanathan, V. and Niranjan, M. (1993). A Function
Estimation Approach to Sequential Learning with
Neural Networks. Neural Computation, vol. 5, pp.
Kaelbling, L. P. (1996). On Reinforcement Learning for
Robots. In Proc. of IEEE Int. Conf. on Intelligent
Robots and Systems, Osaka, Japan, pp. 1319-1320.
Mondada, R., Franzi, E. and Ienne, P. (1993). Mobile
Robot Miniaturization: A Tool for Investigation in
Control Algorithms. In Proc. of the Third
International Symposium on Experimental Robots,
Kyoto, Japan, pp. 501-513.
Pfeifer, R. and Scheier, C. (1997). Sensory-motor
coordination: The metaphor and beyond. Robotics and
Autonomous Systems, vol. 19, pp. 244-257.
Platt, J. C. (1991). A Resource-Allocating Network for
Function Interpolation. Neural Computation, vol. 3,
no. 2, pp. 213-225.
Reignier, P., Hanser, V. and Crowley, J. L. (1997).
Incremental Supervised Learning for Mobile Robot
Reactive Control. Robotics and Autonomous Systems,
vol. 20, pp. 157-178.
Ye C., Yung, N. H. C. and Wang, D. (2003). A Fuzzy
Controller with Supervised Learning Assisted
Reinforcement Learning Algorithm for Obstacle
Avoidance. IEEE Trans. on Systems, Man, and
Cybernetics, Part-B, vol. 33, no.1, pp. 17-27.