ON TEMPORAL DIFFERENCE ALGORITHMS FOR CONTINUOUS SYSTEMS
Alexandre Donzé
2005
Abstract
This article proposes a general, intuitive, and rigorous framework for designing temporal difference algorithms to solve optimal control problems in continuous time and space. Within this framework, we derive a version of the classical TD(λ) algorithm as well as a new TD algorithm that is similar but designed to be more accurate and to converge as fast as TD(λ) does for the best values of λ, without the burden of finding those values.
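For readers unfamiliar with the baseline the paper builds on, the following is a minimal sketch of classical discrete-time TD(λ) with linear value approximation and accumulating eligibility traces. It is an illustration of the standard algorithm only, not the continuous-time variant derived in the paper; the function name, toy chain, and parameter values are assumptions for this sketch.

```python
import numpy as np

def td_lambda(features, rewards, alpha=0.1, gamma=0.95, lam=0.8):
    """One pass of TD(lambda) over a single trajectory.

    features: array of shape (T+1, d) -- feature vector of each visited state
    rewards:  array of shape (T,)     -- reward received at each step
    Returns the weight vector w of the value estimate V(s) = w . phi(s).
    """
    d = features.shape[1]
    w = np.zeros(d)  # value-function weights
    e = np.zeros(d)  # eligibility trace
    for t in range(len(rewards)):
        # TD error: one-step bootstrap target minus current estimate
        delta = rewards[t] + gamma * (w @ features[t + 1]) - (w @ features[t])
        e = gamma * lam * e + features[t]  # decay trace, then accumulate
        w = w + alpha * delta * e          # trace-weighted update
    return w

# Toy usage: a 3-state chain with tabular (one-hot) features and a
# single reward on the final transition.
phi = np.eye(3)
traj = np.array([phi[0], phi[1], phi[2], phi[2]])  # states visited
r = np.array([0.0, 0.0, 1.0])
w = td_lambda(traj, r)
```

With λ between 0 and 1, the final reward propagates in one pass to all earlier states in proportion to their trace, so the learned values decrease with distance from the reward; λ trades off this credit-assignment speed against variance, which is exactly the tuning burden the paper's new algorithm aims to remove.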
Paper Citation
in Harvard Style
Donzé A. (2005). ON TEMPORAL DIFFERENCE ALGORITHMS FOR CONTINUOUS SYSTEMS. In Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, ISBN 972-8865-29-5, pages 55-62. DOI: 10.5220/0001183700550062
in Bibtex Style
@conference{icinco05,
author={Alexandre Donzé},
title={ON TEMPORAL DIFFERENCE ALGORITHMS FOR CONTINUOUS SYSTEMS},
booktitle={Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},
year={2005},
pages={55-62},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001183700550062},
isbn={972-8865-29-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the Second International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO
TI - ON TEMPORAL DIFFERENCE ALGORITHMS FOR CONTINUOUS SYSTEMS
SN - 972-8865-29-5
AU - Donzé A.
PY - 2005
SP - 55
EP - 62
DO - 10.5220/0001183700550062