
5 RELATED WORK
The literature on gradient descent learning is abundant; one of the most recent and exhaustive sources on the subject is Haykin (1999). Adjusting the Adaline learning rate has been studied previously, at least by Luo (1991), who shows that the Adaline learning rate should be reduced during learning in order to avoid “cyclically jumping around” the optimal solution. The references in Luo (1991) also offer a good overview of research on the gradient descent learning rate. However, to the author’s knowledge, the concept of a scaled learning rate introduced in this paper is new.
RL has been used in many robotic tasks, but
most of them have been performed in simulated
environments. Only a few results have been reported on the use of RL on real robots. The experimental setting used here resembles the behavior learning performed by Lin (1991) and Mahadevan & Connell (1992). The behavioral tasks they treat include
wall following, going through a door, docking into a
charger (guided by light sensors), finding boxes,
pushing boxes and getting un-wedged from stalled
states. Some of these behaviors are more challenging
than the light-seeking behavior used in this paper,
but the simple linear Adaline model used here for
state generalization greatly simplifies the learning
task compared to previous work. An example of
non-RL work on light-seeking robots is Lebeltel et
al. (2004).
6 CONCLUSIONS
One of the most important advantages of the scaled learning rate presented in this paper is that the values used for it are easy to interpret. Evidence is also shown that the scaled learning rate improves learning because it makes the network output values approach the corresponding target values by a similar amount, independently of the input values. Experimental results with a real-world light-seeking robot illustrate the improvement in learning results obtained by using the scaled learning rate.
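As a hedged illustration only (the paper’s own definition is given in section 3), the sketch below assumes the scaled learning rate divides a base rate by the squared norm of the input vector, in the spirit of normalized LMS; the function name and values are illustrative, not the paper’s implementation.

import numpy as np

def adaline_update_scaled(w, x, target, eta=0.1, eps=1e-12):
    # One Adaline (linear unit) update with the learning rate divided by
    # the squared input norm (an assumed, normalized-LMS-style scaling).
    y = np.dot(w, x)                         # current linear output
    scaled_eta = eta / (np.dot(x, x) + eps)  # rate scaled by input magnitude
    return w + scaled_eta * (target - y) * x

# The output gap to the target shrinks by the same fraction eta,
# whether the inputs are small or large:
w = np.zeros(2)
for x in (np.array([0.1, 0.2]), np.array([10.0, 20.0])):
    y_new = np.dot(adaline_update_scaled(w, x, target=1.0, eta=0.1), x)
    print(y_new)  # approximately 0.1 in both cases: 10% of the gap closed

Under this assumption, the fraction of the remaining error removed per update is independent of the input scale, which is the “similar amount” property referred to above.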
To the author’s best knowledge, a scaled learning rate has not been used before, which seems rather surprising. One explanation might be that in supervised learning tasks the training samples are usually available beforehand, which makes it possible to normalize them into suitable values. In real-world RL tasks, with constraints on learning time and on the availability of training samples, this may not be possible. Using multi-layer, non-linear ANNs might also reduce the utility of scaling the learning rate, as explained in section 3.2.
In addition to the scaled learning rate, this paper also addresses the RL exploration/exploitation trade-off. The exploration policy used determines the quality of the collected training samples and therefore greatly affects both the learning speed and the quality of the learned solutions. Empirical results are shown mainly on the advantages of using optimistic initial values for the network weights when possible.
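As a minimal, hypothetical sketch of this idea (the names, feature encoding and reward bound below are assumptions rather than the paper’s implementation), optimistic initialization for a linear Q-value approximator can look as follows: with non-negative features and weights set at or above the largest attainable reward, a greedy policy initially prefers actions it has not yet tried.

import numpy as np

n_features, n_actions = 4, 3
r_max = 1.0                                  # assumed upper bound on reward
# Optimistic initial weights: with non-negative features, every initial
# Q estimate starts high, so untried actions look attractive to a greedy
# policy until experience lowers their estimates.
w = np.full((n_actions, n_features), r_max)

def greedy_action(state):
    # Pick the action with the largest (initially optimistic) Q estimate.
    return int(np.argmax(w @ state))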
Future work includes improving exploration
policies and handling delayed reward. Obtaining
further results on the use of the scaled learning rate
for tasks other than RL would also be useful.
ACKNOWLEDGEMENTS
I would like to thank Brian Bagnall for writing the
article “Building a Light-Seeking Robot with Q-Learning”, published on-line on April 19th, 2002. It
gave me the idea to use the Lego Mindstorms kit and
his source code was of valuable help.
REFERENCES
Barto, A.G., Sutton, R.S., Watkins, C.J.C.H. (1990).
Learning and Sequential Decision Making. In M.
Gabriel and J. Moore (eds.), Learning and
computational neuroscience : foundations of adaptive
networks. M.I.T. Press.
Boyan, J. A., Moore, A. W. (1995). Generalization in
Reinforcement Learning: Safely Approximating the
Value Function. In Tesauro, G., Touretzky, D., Leen,
T. (eds.), Advances in Neural Information Processing Systems 7. MIT Press, 369-376.
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. Prentice-Hall, New Jersey, USA.
Kaelbling, L.P., Littman, M.L., Moore, A.W. (1996).
Reinforcement Learning: A Survey. Journal of
Artificial Intelligence Research, Vol. 4, 237-285.
Lebeltel, O., Bessière, P., Diard, J., Mazer, E. (2004).
Bayesian Robot Programming. Autonomous Robots,
Vol. 16, 49-79.
Lin, L.-J. (1991). Programming robots using
reinforcement learning and teaching. In Proc. of the
Ninth National Conference on Artificial Intelligence
(AAAI), 781-786.
Luo, Z. (1991). On the convergence of the LMS algorithm
with adaptive learning rate for linear feedforward
networks. Neural Computation, Vol. 3, 226-245.
Mahadevan, S., Connell, J. (1992). Automatic
Programming of Behavior-based Robots using
Reinforcement Learning. Artificial Intelligence, Vol.
55, Nos. 2-3, 311-365.