STATE AGGREGATION FOR REINFORCEMENT LEARNING USING NEUROEVOLUTION

Robert Wright, Nathaniel Gemelli

Abstract

In this paper, we present a new machine learning algorithm, RL-SANE, which uses a combination of neuroevolution (NE) and traditional reinforcement learning (RL) techniques to improve learning performance. RL-SANE is an innovative combination of the neuroevolutionary algorithm NEAT (Stanley, 2004) and the RL algorithm Sarsa(λ) (Sutton and Barto, 1998). It uses the special ability of NEAT to generate and train customized neural networks that provide a means for reducing the size of the state space through state aggregation. Reducing the size of the state space through aggregation enables Sarsa(λ) to be applied to much more difficult problems than standard tabular-based approaches. Previous similar work in this area, such as that of Whiteson and Stone (Whiteson and Stone, 2006) and Stanley and Miikkulainen (Stanley and Miikkulainen, 2001), has shown positive and promising results. This paper gives a brief overview of neuroevolutionary methods, introduces the RL-SANE algorithm, presents a comparative analysis of RL-SANE to other neuroevolutionary algorithms, and concludes with a discussion of enhancements that need to be made to RL-SANE.
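The combination the abstract describes can be sketched in code. Below is a minimal, illustrative reconstruction (not the authors' implementation): a small fixed network stands in for a NEAT-evolved one, mapping a continuous observation to one of a handful of aggregated states, and tabular Sarsa(λ) with eligibility traces then learns over those aggregated states. All names and parameter values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_BINS = 10      # number of aggregated states (illustrative; RL-SANE evolves this mapping)
N_ACTIONS = 2
ALPHA, GAMMA, LAM, EPS = 0.1, 0.99, 0.9, 0.1

# Stand-in for a NEAT-evolved network: a fixed random linear layer with a
# sigmoid output in (0, 1). In RL-SANE, NEAT would evolve this mapping.
W = rng.normal(size=(4,))

def aggregate(obs):
    """Map a raw continuous observation to one of N_BINS aggregated states."""
    y = 1.0 / (1.0 + np.exp(-np.dot(W, obs)))        # sigmoid output in (0, 1)
    return min(int(y * N_BINS), N_BINS - 1)

Q = np.zeros((N_BINS, N_ACTIONS))                     # tabular value function
E = np.zeros_like(Q)                                  # eligibility traces

def policy(s):
    """Epsilon-greedy action selection over the aggregated state."""
    if rng.random() < EPS:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[s]))

def sarsa_lambda_step(s, a, r, s2, a2, done):
    """One Sarsa(lambda) backup over the aggregated-state table."""
    target = r + (0.0 if done else GAMMA * Q[s2, a2])
    delta = target - Q[s, a]
    E[s, a] += 1.0                                    # accumulating traces
    Q[:] += ALPHA * delta * E                         # update all traced entries
    E[:] *= GAMMA * LAM                               # decay traces
    if done:
        E[:] = 0.0                                    # reset traces between episodes
```

The key design point is that because `aggregate` collapses a continuous observation into a small discrete index, the tabular `Q` stays tractable; NEAT's role in the full algorithm is to evolve a network whose aggregation makes that table useful.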

References

  1. Boyan, J. A. and Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, Advances in Neural Information Processing Systems 7, pages 369-376, Cambridge, MA. The MIT Press.
  2. Carreras, M., Ridao, P., Batlle, J., Nicosebici, T., and Ursulovici, Z. (2002). Learning reactive robot behaviors with neural-Q learning. In IEEE-TTTC International Conference on Automation, Quality and Testing, Robotics. IEEE.
  3. Gomez, F. J. and Miikkulainen, R. (1999). Solving non-Markovian control tasks with neuro-evolution. In IJCAI '99: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pages 1356-1361, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
  4. James, D. and Tucker, P. (2004). A comparative analysis of simplification and complexification in the evolution of neural network topologies. In Proceedings of the 2004 Conference on Genetic and Evolutionary Computation. GECCO-2004.
  5. Moriarty, D. E. and Miikkulainen, R. (1997). Forming neural networks through efficient and adaptive coevolution. Evolutionary Computation, 5:373-399.
  6. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1988). Learning representations by back-propagating errors. Neurocomputing: foundations of research, pages 696-699.
  7. Siebel, N. T., Krause, J., and Sommer, G. (2007). Efficient Learning of Neural Networks with Evolutionary Algorithms, volume 4713/2007. Springer Berlin / Heidelberg, Heidelberg, Germany.
  8. Singh, S. P., Jaakkola, T., and Jordan, M. I. (1995). Reinforcement learning with soft state aggregation. In Tesauro, G., Touretzky, D., and Leen, T., editors, Advances in Neural Information Processing Systems, volume 7, pages 361-368. The MIT Press.
  9. Stanley, K. O. (2004). Efficient evolution of neural networks through complexification. PhD thesis, The University of Texas at Austin. Supervisor: Risto P. Miikkulainen.
  10. Stanley, K. O. and Miikkulainen, R. (2001). Evolving neural networks through augmenting topologies. Technical report, University of Texas at Austin, Austin, TX, USA.
  11. Stanley, K. O. and Miikkulainen, R. (2002). Efficient reinforcement learning through evolving neural network topologies. In GECCO '02: Proceedings of the Genetic and Evolutionary Computation Conference, pages 569-577, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
  12. Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning). The MIT Press.
  13. Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Commun. ACM, 38(3):58-68.
  14. Watkins, C. J. C. H. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4):279-292.
  15. Whiteson, S. and Stone, P. (2006). Evolutionary function approximation for reinforcement learning. Journal of Machine Learning Research, 7:877-917.


Paper Citation


in Harvard Style

Wright R. and Gemelli N. (2009). STATE AGGREGATION FOR REINFORCEMENT LEARNING USING NEUROEVOLUTION. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-8111-66-1, pages 45-52. DOI: 10.5220/0001658800450052


in Bibtex Style

@conference{icaart09,
author={Robert Wright and Nathaniel Gemelli},
title={STATE AGGREGATION FOR REINFORCEMENT LEARNING USING NEUROEVOLUTION},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2009},
pages={45-52},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001658800450052},
isbn={978-989-8111-66-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - STATE AGGREGATION FOR REINFORCEMENT LEARNING USING NEUROEVOLUTION
SN - 978-989-8111-66-1
AU - Wright R.
AU - Gemelli N.
PY - 2009
SP - 45
EP - 52
DO - 10.5220/0001658800450052