examine the scalability of RL-SANE and of other neuroevolution-based algorithms, such as NEAT, NEAT+Q, and EANT (Siebel et al., 2007), to determine just how far these algorithms can be pushed.
6 CONCLUSIONS
In this paper, we have introduced the RL-SANE algorithm, explored its performance under varying β values, and provided a comparative analysis against other neuroevolutionary learning approaches. Our experimental results show that RL-SANE converges to good solutions in fewer iterations and at lower computational expense than NEAT, even with naively specified β values. Combining neuroevolutionary state aggregation with traditional reinforcement learning algorithms therefore appears to have real merit. RL-SANE is, however, dependent on the β parameter, which must be determined a priori. We have shown the importance of deriving proper β values and have suggested automating the derivation of β as a direction for future research.
Building on what has been done by previous neuroevolutionary methods, we have found that properly decomposing the problem into state aggregation and policy iteration is beneficial. By providing this decomposition, RL-SANE should be more readily applicable to higher-complexity problems than existing approaches; a minimal sketch of the decomposition follows.
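To make the decomposition concrete, the sketch below (in Python) pairs a β-bounded state aggregator with tabular Q-learning. It is an illustration only, not the RL-SANE implementation: the names (make_aggregator, q_learning_episode, the env_step/env_reset interface) are hypothetical, and a fixed linear aggregator stands in for the NEAT-evolved network that RL-SANE would actually use.

import numpy as np

def make_aggregator(obs_dim, beta, rng):
    """Return a function mapping a raw observation to a state index in [0, beta).
    Stand-in for an evolved network; beta bounds the number of aggregated states."""
    weights = rng.normal(size=(obs_dim,))
    def aggregate(obs):
        # Squash the network output to (0, 1), then discretize into beta bins.
        activation = 1.0 / (1.0 + np.exp(-(weights @ np.asarray(obs))))
        return min(int(activation * beta), beta - 1)
    return aggregate

def q_learning_episode(env_step, env_reset, aggregate, q_table, n_actions,
                       alpha=0.1, gamma=0.99, epsilon=0.1, max_steps=200, rng=None):
    """Run one tabular Q-learning episode over the aggregated state space.
    env_reset() -> obs and env_step(a) -> (obs, reward, done) are assumed interfaces."""
    rng = rng or np.random.default_rng()
    obs = env_reset()
    s = aggregate(obs)
    total_reward = 0.0
    for _ in range(max_steps):
        # Epsilon-greedy action selection over the aggregated state.
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(q_table[s]))
        obs, reward, done = env_step(a)
        s_next = aggregate(obs)
        # Standard Q-learning update (Watkins and Dayan, 1992).
        target = reward + (0.0 if done else gamma * np.max(q_table[s_next]))
        q_table[s, a] += alpha * (target - q_table[s, a])
        s, total_reward = s_next, total_reward + reward
        if done:
            break
    return total_reward

In such a sketch, the per-episode reward returned by q_learning_episode could serve as the fitness signal for evolving the aggregator network, with q_table shaped (beta, n_actions); the details of fitness assignment in RL-SANE itself are not repeated here.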
REFERENCES
Boyan, J. A. and Moore, A. W. (1995). Generalization in re-
inforcement learning: Safely approximating the value
function. In Tesauro, G., Touretzky, D. S., and Leen,
T. K., editors, Advances in Neural Information Pro-
cessing Systems 7, pages 369–376, Cambridge, MA.
The MIT Press.
Carreras, M., Ridao, P., Batlle, J., Nicosebici, T., and Ursulovici, Z. (2002). Learning reactive robot behaviors with neural-Q learning. In IEEE-TTTC International Conference on Automation, Quality and Testing, Robotics. IEEE.
Gomez, F. J. and Miikkulainen, R. (1999). Solving non-Markovian control tasks with neuro-evolution. In IJCAI '99: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pages 1356–1361, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
James, D. and Tucker, P. (2004). A comparative analysis
of simplification and complexification in the evolution
of neural network topologies. In Proceedings of the
2004 Conference on Genetic and Evolutionary Com-
putation. GECCO-2004.
Moriarty, D. E. and Miikkulainen, R. (1997). Forming neu-
ral networks through efficient and adaptive coevolu-
tion. Evolutionary Computation, 5:373–399.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J.
(1988). Learning representations by back-propagating
errors. Neurocomputing: foundations of research,
pages 696–699.
Siebel, N. T., Krause, J., and Sommer, G. (2007). Efficient Learning of Neural Networks with Evolutionary Algorithms, volume 4713/2007. Springer, Berlin/Heidelberg, Germany.
Singh, S. P., Jaakkola, T., and Jordan, M. I. (1995). Re-
inforcement learning with soft state aggregation. In
Tesauro, G., Touretzky, D., and Leen, T., editors,
Advances in Neural Information Processing Systems,
volume 7, pages 361–368. The MIT Press.
Stanley, K. O. (2004). Efficient evolution of neural networks through complexification. PhD thesis, The University of Texas at Austin. Supervisor: Risto P. Miikkulainen.
Stanley, K. O. and Miikkulainen, R. (2001). Evolving neu-
ral networks through augmenting topologies. Techni-
cal report, University of Texas at Austin, Austin, TX,
USA.
Stanley, K. O. and Miikkulainen, R. (2002). Efficient
reinforcement learning through evolving neural net-
work topologies. In GECCO ’02: Proceedings of the
Genetic and Evolutionary Computation Conference,
pages 569–577, San Francisco, CA, USA. Morgan
Kaufmann Publishers Inc.
Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learn-
ing: An Introduction (Adaptive Computation and Ma-
chine Learning). The MIT Press.
Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Commun. ACM, 38(3):58–68.
Watkins, C. J. C. H. and Dayan, P. (1992). Q-learning. Ma-
chine Learning, 8(3-4):279–292.
Whiteson, S. and Stone, P. (2006). Evolutionary function
approximation for reinforcement learning. Journal of
Machine Learning Research, 7:877–917.