REINFORCEMENT LEARNING FOR ROBOT CONTROL USING PROBABILITY DENSITY ESTIMATIONS

Alejandro Agostini, Enric Celaya

2010

Abstract

The successful application of Reinforcement Learning (RL) techniques to robot control is limited by the fact that, in most robotic tasks, the state and action spaces are continuous, multidimensional, and, in essence, too large for conventional RL algorithms to handle. The well-known curse of dimensionality makes a tabular representation of the value function infeasible, even though this is the classical approach that provides convergence guarantees. When a function approximation technique is used to generalize among similar states, the convergence of the algorithm is compromised, since each update unavoidably affects an extended region of the domain: some situations are modified in a way that has not actually been experienced, and the update may degrade the approximation. We propose an RL algorithm that uses a probability density estimation in the joint space of states, actions, and $Q$-values as a means of function approximation. This allows us to devise an updating approach that, by taking the local sampling density into account, avoids excessive modification of the approximation far from the observed sample.
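
A minimal sketch of the idea, under an assumption the abstract does not state: that the joint density over (state, action, Q-value) is modeled as a Gaussian mixture (the cited EM and Gaussian-mixture literature suggests this, but the sketch is illustrative, not the authors' algorithm). It fits a mixture to joint samples and reads off Q(s, a) as the conditional expectation E[q | s, a], i.e. standard Gaussian-mixture regression. The function names, component count, and synthetic training data are hypothetical.

import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_joint_density(samples_sa, samples_q, n_components=5, seed=0):
    """Fit a Gaussian mixture over the joint (state, action, q-value) space."""
    joint = np.hstack([samples_sa, samples_q.reshape(-1, 1)])
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=seed)
    gmm.fit(joint)
    return gmm

def q_estimate(gmm, sa):
    """Conditional mean E[q | s, a] under the fitted joint mixture."""
    d = sa.shape[0]                       # dimension of the (s, a) block
    num, den = 0.0, 0.0
    for pi_k, mu_k, cov_k in zip(gmm.weights_, gmm.means_, gmm.covariances_):
        mu_x, mu_q = mu_k[:d], mu_k[d]
        S_xx, S_qx = cov_k[:d, :d], cov_k[d, :d]
        # Responsibility of component k in the (s, a) marginal: this is also
        # a measure of local sampling density around the query point.
        w_k = pi_k * multivariate_normal.pdf(sa, mean=mu_x, cov=S_xx)
        # Conditional mean of q given (s, a) for component k.
        m_k = mu_q + S_qx @ np.linalg.solve(S_xx, sa - mu_x)
        num += w_k * m_k
        den += w_k
    return num / max(den, 1e-300)

# Hypothetical usage on random data, standing in for collected RL experience.
rng = np.random.default_rng(0)
sa = rng.uniform(-1.0, 1.0, size=(500, 3))        # e.g. 2-D state + 1-D action
q = np.sin(sa.sum(axis=1)) + 0.05 * rng.normal(size=500)
gmm = fit_joint_density(sa, q)
print(q_estimate(gmm, np.array([0.2, -0.4, 0.1])))

The weights w_k computed in q_estimate reflect how densely each mixture component covers the query point, which is the kind of local information the abstract's density-aware update would exploit; the update rule itself is not reproduced here.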



Paper Citation


in Harvard Style

Agostini A. and Celaya E. (2010). REINFORCEMENT LEARNING FOR ROBOT CONTROL USING PROBABILITY DENSITY ESTIMATIONS. In Proceedings of the 7th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, ISBN 978-989-8425-00-3, pages 160-168. DOI: 10.5220/0002949601600168


in Bibtex Style

@conference{icinco10,
author={Alejandro Agostini and Enric Celaya},
title={REINFORCEMENT LEARNING FOR ROBOT CONTROL USING PROBABILITY DENSITY ESTIMATIONS},
booktitle={Proceedings of the 7th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},
year={2010},
pages={160-168},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002949601600168},
isbn={978-989-8425-00-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO
TI - REINFORCEMENT LEARNING FOR ROBOT CONTROL USING PROBABILITY DENSITY ESTIMATIONS
SN - 978-989-8425-00-3
AU - Agostini A.
AU - Celaya E.
PY - 2010
SP - 160
EP - 168
DO - 10.5220/0002949601600168
ER -