man, T., editors, Advances in Neural Information Processing Systems 19, pages 1–8, Cambridge, MA. MIT Press.
Abramson, M., Pachowicz, P., and Wechsler, H. (2003).
Competitive reinforcement learning in continuous
control tasks. In Proceedings of the International
Joint Conference on Neural Networks (IJCNN), Port-
land, OR, volume 3, pages 1909–1914.
Albus, J. S. (1975). Data storage in the cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement and Control, September:228–233.
Barto, A., Sutton, R., and Anderson, C. (1983). Neuron-
like adaptive elements that can solve difficult learning
control problems. IEEE Trans. on Systems, Man, and
Cybernetics, 13:835–846.
Doya, K. (2000). Reinforcement learning in continuous
time and space. Neural Computation, 12:219–245.
Främling, K. (2004). Scaled gradient descent learning rate - reinforcement learning with light-seeking robot. In Proceedings of the ICINCO 2004 conference, 25-28 August 2004, Setúbal, Portugal, pages 3–11.
Främling, K. (2005). Adaptive robot learning in a non-stationary environment. In Proceedings of the 13th European Symposium on Artificial Neural Networks, April 27-29, Bruges, Belgium, pages 381–386.
Främling, K. (2007a). Guiding exploration by pre-existing
knowledge without modifying reward. Neural Net-
works, 20:736–747.
Främling, K. (2007b). Replacing eligibility trace for action-value learning with function approximation. In Proceedings of the 15th European Symposium on Artificial Neural Networks, April 25-27, Bruges, Belgium, pages 313–318.
Kimura, H. and Kobayashi, S. (1998). An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value functions. In Proceedings of the 15th International Conference on Machine Learning, pages 278–286.
Lagoudakis, M. G. and Parr, R. (2003). Least-squares pol-
icy iteration. Journal of Machine Learning Research,
4:1107–1149.
Mahadevan, S. and Maggioni, M. (2007). Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research, 8:2169–2231.
Moore, A. (1991). Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued state-spaces. In Machine Learning: Proceedings of the Eighth International Workshop, San Mateo, CA, pages 333–337. Morgan Kaufmann.
Santamaría, J., Sutton, R., and Ram, A. (1998). Experi-
ments with reinforcement learning in problems with
continuous state and action spaces. Adaptive Behav-
ior, 6:163–217.
Schaal, S. (1997). Learning from demonstration. In
Advances in Neural Information Processing Systems
(NIPS), volume 9, pages 1040–1046. MIT Press.
Schaefer, A. M., Udluft, S., and Zimmermann, H.-G. (2007). The recurrent control neural network. In Proceedings of the 15th European Symposium on Artificial Neural Networks, Bruges, Belgium, 25-27 April 2007, pages 319–324. D-Side.
Schneegaß, D., Udluft, S., and Martinetz, T. (2007). Neural rewards regression for near-optimal policy identification in Markovian and partial observable environments. In Proceedings of the 15th European Symposium on Artificial Neural Networks, Bruges, Belgium, 25-27 April 2007, pages 301–306. D-Side.
Singh, S. and Sutton, R. (1996). Reinforcement learning
with replacing eligibility traces. Machine Learning,
22:123–158.
Smart, W. D. and Kaelbling, L. P. (2000). Practical reinforcement learning in continuous spaces. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 903–910. Morgan Kaufmann.
Strens, M. J. and Moore, A. W. (2002). Policy search us-
ing paired comparisons. Journal of Machine Learning
Research, 3:921–950.
Sutton, R. and Barto, A. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38:58–68.
Whitehead, S. and Lin, L.-J. (1995). Reinforcement learning of non-Markov decision processes. Artificial Intelligence, 73:271–306.