[Figure: two panels plotting RMS error against episodes on a logarithmic scale (10^0 to 10^3); (a) Linear Hop-World and (b) Nonlinear Hop-World, each comparing TD(λ), LSTD(λ), RLSTD(λ), and OSVR-TD.]
Figure 5: Performance comparison between TD(λ), LSTD(λ), RLSTD(λ), and OSVR-TD.