Temporal-Difference Learning - An Online Support Vector Regression Approach

Hugo Tanzarella Teixeira; Celso Pascoli Bottura

doi:10.5220/0005572103180323

2015

Abstract

This paper proposes a new algorithm for Temporal-Difference (TD) learning based on online support vector regression (SVR). The algorithm benefits from the good generalization properties of SVR, learns incrementally, and automatically tracks environments with time-varying characteristics. With online SVR we obtain good estimates of the value function in TD learning for both linear and nonlinear prediction problems. Experimental results demonstrate the effectiveness of the proposed method in comparison with other methods.
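
For readers unfamiliar with TD prediction, the following minimal sketch shows the standard TD(0) update that a regressor would be trained on. It is not the algorithm proposed in the paper: a plain linear model with one-hot features stands in for the online SVR, and the three-state deterministic chain, its rewards, the discount factor, and the names `features`, `value`, and `td0` are all assumptions made for illustration.

```python
# Hypothetical toy problem: deterministic chain 0 -> 1 -> 2 -> terminal.
GAMMA = 0.9                    # discount factor (assumed)
REWARDS = [1.0, 2.0, 3.0]      # reward received on leaving state s (assumed)
N_STATES = len(REWARDS)


def features(state):
    """One-hot features, so the linear model reduces to a lookup table."""
    return [1.0 if i == state else 0.0 for i in range(N_STATES)]


def value(weights, state):
    """Linear value estimate: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features(state)))


def td0(episodes=200, alpha=0.5):
    """Estimate state values with TD(0) and a linear approximator."""
    weights = [0.0] * N_STATES
    for _ in range(episodes):
        state = 0
        while state < N_STATES:
            next_state = state + 1
            terminal = next_state == N_STATES
            # Bootstrapped target: reward plus discounted next-state estimate.
            bootstrap = 0.0 if terminal else value(weights, next_state)
            target = REWARDS[state] + GAMMA * bootstrap
            td_error = target - value(weights, state)
            # Gradient-style step on the weights of the active features.
            for i, x in enumerate(features(state)):
                weights[i] += alpha * td_error * x
            state = next_state
    return weights


estimates = td0()
```

On this chain the exact values follow from the Bellman equation: V(2) = 3, V(1) = 2 + 0.9 · 3 = 4.7, and V(0) = 1 + 0.9 · 4.7 = 5.23, and the estimates converge to them. In an SVR-based variant, the pair (state features, `target`) would instead be passed to an incremental regressor at each step.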

Paper Citation

in Harvard Style

Tanzarella Teixeira H. and Pascoli Bottura C. (2015). Temporal-Difference Learning - An Online Support Vector Regression Approach. In Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, ISBN 978-989-758-122-9, pages 318-323. DOI: 10.5220/0005572103180323

in Bibtex Style

@conference{icinco15,
author={Hugo Tanzarella Teixeira and Celso Pascoli Bottura},
title={Temporal-Difference Learning - An Online Support Vector Regression Approach},
booktitle={Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},
year={2015},
pages={318-323},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005572103180323},
isbn={978-989-758-122-9},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO
TI - Temporal-Difference Learning - An Online Support Vector Regression Approach
SN - 978-989-758-122-9
AU - Tanzarella Teixeira H.
AU - Pascoli Bottura C.
PY - 2015
SP - 318
EP - 323
DO - 10.5220/0005572103180323