LIGHT-WEIGHT REINFORCEMENT LEARNING WITH FUNCTION APPROXIMATION FOR REAL-LIFE CONTROL TASKS
Kary Främling
2008
Abstract
Despite the impressive achievements of reinforcement learning (RL) in playing Backgammon already in the early 1990s, relatively few successful real-world applications of RL have been reported since then. One reason may be the tendency of RL research to focus on discrete Markov Decision Processes, which makes it difficult to handle tasks with continuous-valued features. Another may be the trend towards increasingly complex mathematical RL models that are difficult to implement and operate. Both of these issues are addressed in this paper by using the gradient-descent Sarsa(λ) method together with a Normalised Radial Basis Function (NRBF) neural network. The experimental results on three typical benchmark control tasks show that these methods outperform most previously reported results on these tasks, while remaining computationally feasible to implement even as embedded software. The presented results can therefore serve as a reference both for learning performance and for the computational applicability of RL in real-life applications.
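To make the method named in the abstract concrete, below is a minimal Python sketch of gradient-descent Sarsa(λ) on top of a normalised radial basis function feature layer. It is an illustrative reading of the abstract, not the paper's implementation: it assumes ε-greedy exploration, accumulating eligibility traces and fixed Gaussian centres, and all class, function and parameter names are invented for this example.

```python
# Illustrative sketch (not the paper's code): gradient-descent Sarsa(lambda)
# with a Normalised Radial Basis Function (NRBF) feature layer and a linear
# action-value function Q(s, a) = w[a] . phi(s).
import numpy as np

class NRBFSarsaLambda:
    def __init__(self, centres, widths, n_actions,
                 alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1, seed=0):
        self.centres = np.asarray(centres, dtype=float)   # (n_features, state_dim)
        self.widths = np.asarray(widths, dtype=float)     # (state_dim,) kernel widths
        self.n_actions = n_actions
        self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
        self.w = np.zeros((n_actions, len(self.centres)))  # linear weights per action
        self.e = np.zeros_like(self.w)                      # eligibility traces
        self.rng = np.random.default_rng(seed)

    def features(self, state):
        """Normalised Gaussian RBF activations for a continuous state."""
        d2 = (((np.asarray(state, dtype=float) - self.centres) / self.widths) ** 2).sum(axis=1)
        phi = np.exp(-0.5 * d2)
        return phi / phi.sum()            # normalisation: activations sum to one

    def q_values(self, phi):
        return self.w @ phi

    def select_action(self, phi):
        """Epsilon-greedy action selection over the current Q estimates."""
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.n_actions))
        return int(np.argmax(self.q_values(phi)))

    def start_episode(self):
        self.e.fill(0.0)                  # traces are cleared at episode start

    def update(self, phi, action, reward, phi_next, action_next, done):
        """One gradient-descent Sarsa(lambda) step with accumulating traces."""
        target = reward if done else reward + self.gamma * (self.w[action_next] @ phi_next)
        delta = target - self.w[action] @ phi
        self.e *= self.gamma * self.lam   # decay all traces
        self.e[action] += phi             # accumulate trace for the taken action
        self.w += self.alpha * delta * self.e
```

In such a setup the RBF centres would typically be placed on a grid over the task's continuous state variables (e.g. cart position and velocity), start_episode() called at the beginning of each episode, and update() called after every observed transition.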
Paper Citation
in Harvard Style
Främling K. (2008). LIGHT-WEIGHT REINFORCEMENT LEARNING WITH FUNCTION APPROXIMATION FOR REAL-LIFE CONTROL TASKS. In Proceedings of the Fifth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, ISBN 978-989-8111-30-2, pages 127-134. DOI: 10.5220/0001484001270134
in Bibtex Style
@conference{icinco08,
author={Kary Främling},
title={LIGHT-WEIGHT REINFORCEMENT LEARNING WITH FUNCTION APPROXIMATION FOR REAL-LIFE CONTROL TASKS},
booktitle={Proceedings of the Fifth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},
year={2008},
pages={127-134},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001484001270134},
isbn={978-989-8111-30-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the Fifth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO
TI - LIGHT-WEIGHT REINFORCEMENT LEARNING WITH FUNCTION APPROXIMATION FOR REAL-LIFE CONTROL TASKS
SN - 978-989-8111-30-2
AU - Främling K.
PY - 2008
SP - 127
EP - 134
DO - 10.5220/0001484001270134