
number of inputs or sensors that the robot has. It
allows only three sensors to be connected at the same
time, limiting the functionality of the robot and the
complexity of the programs that can be implemented.
It also has a limited memory (32 KB), which prevents
the implementation of very extensive programs.
Moreover, the firmware occupies part of this memory,
further reducing the space available for storing the
user's programs.
The navigation problem of finding a goal can be
solved using a reinforcement learning algorithm.
Other learning techniques exist, such as evolutionary
learning, which involves the use of genetic algorithms
and can also be applied to navigation problems.
However, these algorithms are slow and only
approximate sub-optimal solutions, without any
guarantee of convergence to the best solution. They
do not operate in real time, which is a restrictive
factor for the implementation of this type of problem.
Value Iteration, like Q-Learning, is based on the
principle of dynamic programming, which allows
effective learning through rewards and penalties. The
advantage of using Value Iteration is that, in most
cases, the solution found tends to be better than the
solution found with Q-Learning.
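As an illustration of the idea, the following sketch shows a plain tabular Value Iteration sweep for a goal-finding task on a small grid. The grid size, discount factor, convergence threshold and reward values (+100 at the goal, -1 per step) are assumptions made only for this example and are not the configuration used in this work.

// A minimal Value Iteration sketch for a grid navigation task.
// Grid size, rewards and the deterministic transition model are
// illustrative assumptions, not the paper's actual configuration.
public class ValueIterationSketch {

    static final int SIZE = 5;          // hypothetical 5x5 grid world
    static final double GAMMA = 0.9;    // discount factor (assumed)
    static final double THETA = 1e-4;   // convergence threshold (assumed)
    // actions: up, down, left, right
    static final int[][] MOVES = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};

    public static void main(String[] args) {
        double[][] value = new double[SIZE][SIZE];
        int goalRow = 4, goalCol = 4;   // assumed goal position

        double delta;
        do {
            delta = 0.0;
            for (int r = 0; r < SIZE; r++) {
                for (int c = 0; c < SIZE; c++) {
                    if (r == goalRow && c == goalCol) continue; // terminal state
                    double best = Double.NEGATIVE_INFINITY;
                    for (int[] m : MOVES) {
                        int nr = Math.min(Math.max(r + m[0], 0), SIZE - 1);
                        int nc = Math.min(Math.max(c + m[1], 0), SIZE - 1);
                        // reward: +100 on reaching the goal, -1 per step (assumed)
                        double reward = (nr == goalRow && nc == goalCol) ? 100.0 : -1.0;
                        best = Math.max(best, reward + GAMMA * value[nr][nc]);
                    }
                    delta = Math.max(delta, Math.abs(best - value[r][c]));
                    value[r][c] = best;
                }
            }
        } while (delta > THETA);        // sweep until values stop changing

        System.out.println("Value of start state: " + value[0][0]);
    }
}

The sweep is repeated until the largest change in any state value falls below the threshold; the greedy action in each cell then gives the route to the goal.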
The main disadvantage of the Value Iteration
algorithm is that it can only be applied in state-action
schemes where an action taken in a given state always
leads to the same successor state. If the goal changes
position, the algorithm takes much more time, since
the routes that the robot has already taken and learned
will continue to be followed even when the goal is
located at some other point.
Regarding Q-Learning, the main advantages of
this algorithm are that, if the goal changes position,
Q-Learning adjusts its learning efficiently and is
always able to reach the goal. In addition, if the
Q-matrix is initialized with random values, the robot
can explore other routes that broaden its learning.
This is because the algorithm always looks for the
maximum Q-value over all possible actions in a state.
Finally, the learning curve tends to converge more
quickly than with Value Iteration, although in a less
uniform way.
The main disadvantage of using Q-Learning is
that the solution found is not always the best one,
although it tends to be very close to it.
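A minimal tabular Q-Learning sketch for the same kind of goal-finding task is given below. The grid size, learning rate, discount factor, exploration rate and reward values are illustrative assumptions; the sketch only shows how the Q-matrix can be initialized with small random values and how the update always uses the maximum Q-value of the successor state, as discussed above.

import java.util.Random;

// A minimal tabular Q-Learning sketch: states are grid cells,
// actions are the four moves; all parameter values are assumed.
public class QLearningSketch {

    static final int SIZE = 5;
    static final int ACTIONS = 4;       // up, down, left, right
    static final int[][] MOVES = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};
    static final double ALPHA = 0.5, GAMMA = 0.9, EPSILON = 0.1;

    public static void main(String[] args) {
        Random rnd = new Random();
        double[][] q = new double[SIZE * SIZE][ACTIONS];
        // Initializing the Q-matrix with small random values encourages
        // the robot to explore alternative routes, as noted above.
        for (double[] row : q)
            for (int a = 0; a < ACTIONS; a++) row[a] = rnd.nextDouble() * 0.01;

        int goal = SIZE * SIZE - 1;      // assumed goal cell

        for (int episode = 0; episode < 1000; episode++) {
            int state = 0;               // assumed start cell
            for (int step = 0; step < 500 && state != goal; step++) {
                // epsilon-greedy: mostly pick the action with the maximum Q-value
                int action = rnd.nextDouble() < EPSILON
                        ? rnd.nextInt(ACTIONS) : argMax(q[state]);
                int r = state / SIZE, c = state % SIZE;
                int nr = Math.min(Math.max(r + MOVES[action][0], 0), SIZE - 1);
                int nc = Math.min(Math.max(c + MOVES[action][1], 0), SIZE - 1);
                int next = nr * SIZE + nc;
                double reward = (next == goal) ? 100.0 : -1.0;
                // Q-Learning update rule using the best successor Q-value
                q[state][action] += ALPHA
                        * (reward + GAMMA * max(q[next]) - q[state][action]);
                state = next;
            }
        }
        System.out.println("Q-values of start state: "
                + java.util.Arrays.toString(q[0]));
    }

    static int argMax(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) if (v[i] > v[best]) best = i;
        return best;
    }

    static double max(double[] v) {
        double m = v[0];
        for (double x : v) m = Math.max(m, x);
        return m;
    }
}

Because the update is applied after every step, a change in the goal position only requires the affected Q-values to be re-learned, which is why Q-Learning adapts more readily than Value Iteration in that situation.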
At this point one might ask under what conditions
each of these two reinforcement learning algorithms
should be chosen. The answer depends on the
situation: if the time required to reach the goal is not
a problem, Value Iteration can be applied; otherwise,
Q-Learning is the better option. In both cases,
however, the route found by Value Iteration is similar
to the route found with Q-Learning, with very little
variation between them.
When programs are designed based on
reinforcement learning algorithms, it is necessary to
define and design the states, the actions and the
reward policy in detail, since these factors play a very
important role in the operation of the algorithm. If
any of these factors is flawed, the performance of the
algorithms can be seriously affected or, worse, no
solution may be reached at all.
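For example, the states, actions and reward policy of a goal-finding robot could be made explicit as in the following sketch. The sensor-based state encoding and the reward values are hypothetical and serve only to show the kind of design decisions involved.

// A sketch of how states, actions and the reward policy might be
// declared before coding the learning loop. The state encoding and
// reward values are assumptions made for illustration only.
public final class NavigationTask {

    // Actions available to the robot in every state.
    enum Action { FORWARD, TURN_LEFT, TURN_RIGHT, BACKWARD }

    // State: a compact encoding of what the sensors report.
    static int encodeState(boolean bumperPressed, int lightLevel) {
        int lightBucket = Math.min(lightLevel / 25, 3);   // discretize 0-100 into 4 buckets
        return (bumperPressed ? 4 : 0) + lightBucket;     // 8 distinct states
    }

    // Reward policy: penalize collisions, reward reaching the goal.
    static double reward(boolean bumperPressed, boolean goalReached) {
        if (goalReached) return 100.0;    // assumed goal reward
        if (bumperPressed) return -10.0;  // assumed collision penalty
        return -1.0;                      // small cost per step to favor short routes
    }
}

A coarse state encoding like this keeps the Q-matrix small enough for the limited memory of the RCX, while the step cost steers the learned policy toward short routes.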