cess (learning). For the SCARA robot, 67 episodes are needed on average to find a solution, with a best case of 23 episodes.
Figure 9: Learning curve (steps per episode) for the SCARA robot model for 100 episodes.
Fig. 10 shows a trace of the behavior of the SCARA robot for four consecutive goals. The same oscillations can be seen as in the trace of the three-link planar robot, due to the zero-crossing effect explained above.
Figure 10: Trace for the SCARA robot for four consecutive goals (goal positions labeled 1-4).
6 CONCLUSIONS
A distributed approach to RL in robot control tasks
has been presented. To verify the method we have de-
fined an experimental framework and we have tested
it on two well-known robotic manipulators: a three-
link planar robot and an industrial SCARA.
The experimental results in learning a control policy for diverse kinds of multi-link robotic models clearly show that the individual agents do not need to perceive the complete state space in order to learn a good global policy, but only a reduced state space directly related to their own environmental experience. We have also shown that the proposed architecture, combined with the use of continuous reward functions, yields a substantial improvement in learning speed, making tractable some learning problems in which classical RL with discrete rewards (−1, 0, 1) does not work; a minimal sketch of such a scheme is given below. We also plan to adapt the proposed method to snake-like robots. The main drawback in this case could be the absence of a base fixing the robot to the environment and its implications for the definition of goal positions.
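To make the per-joint state reduction and the continuous reward concrete, the following is a minimal illustrative sketch in Python, not the implementation used in the experiments: one tabular Q-learning agent per joint, each observing only its own joint error, trained with a distance-based continuous reward instead of the discrete (−1, 0, 1) scheme. All names, the error range and the parameter values are assumptions chosen for illustration.

import numpy as np

N_ACTIONS = 3          # e.g. decrease, hold, or increase the joint command
N_BINS = 21            # discretization of the per-joint angular error
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

class JointAgent:
    """Q-learning agent that perceives only its own joint's error."""
    def __init__(self):
        self.q = np.zeros((N_BINS, N_ACTIONS))

    def discretize(self, error):
        # Map a continuous joint error (assumed in [-pi, pi]) to a bin index.
        b = int((error + np.pi) / (2.0 * np.pi) * (N_BINS - 1))
        return min(max(b, 0), N_BINS - 1)

    def act(self, error):
        s = self.discretize(error)
        if np.random.rand() < EPS:                 # epsilon-greedy exploration
            return np.random.randint(N_ACTIONS)
        return int(np.argmax(self.q[s]))

    def update(self, error, action, reward, next_error):
        s, s2 = self.discretize(error), self.discretize(next_error)
        td = reward + GAMMA * self.q[s2].max() - self.q[s, action]
        self.q[s, action] += ALPHA * td

def continuous_reward(joint_errors):
    # Continuous reward: negative Euclidean distance of all joints to their
    # goal angles; unlike a sparse (-1, 0, 1) signal it provides a gradient
    # toward the goal at every step.
    return -float(np.linalg.norm(joint_errors))

Because each agent indexes its Q-table only by its own error, the table size is independent of the number of links, which is what keeps the distributed formulation tractable as robots grow in complexity.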
ACKNOWLEDGEMENTS
This work has been partially funded by the Span-
ish Ministry of Science and Technology, project
DPI2006-15346-C03-02.