Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Chang, K.-W., Krishnamurthy, A., Agarwal, A., Daumé III, H., and Langford, J. (2015). Learning to search better than your teacher. arXiv preprint arXiv:1502.02206.
Langley, P. (2000). Crafting papers on machine learning. In Langley, P., editor, Proceedings of the 17th International Conference on Machine Learning (ICML 2000), pages 1207–1216, Stanford, CA. Morgan Kaufmann.
Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3-4):293–321.
Lin, L.-J. (1993). Reinforcement learning for robots using neural networks. PhD thesis, Carnegie Mellon University.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Peters, J., Vijayakumar, S., and Schaal, S. (2003). Reinforcement learning for humanoid robotics. In Proceedings of the Third IEEE-RAS International Conference on Humanoid Robots, pages 1–20.
Pritzel, A., Uria, B., Srinivasan, S., Badia, A. P., Vinyals, O., Hassabis, D., Wierstra, D., and Blundell, C. (2017). Neural episodic control. In International Conference on Machine Learning, pages 2827–2836.
Riedmiller, M. (2005). Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method. In European Conference on Machine Learning, pages 317–328. Springer.
Ross, S. and Bagnell, J. A. (2014). Reinforcement and imitation learning via interactive no-regret learning. arXiv preprint arXiv:1406.5979.
Rummery, G. A. and Niranjan, M. (1994). On-line Q-learning using connectionist systems. University of Cambridge, Department of Engineering.
Salimans, T., Ho, J., Chen, X., and Sutskever, I. (2017). Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864.
Schaul, T., Quan, J., Antonoglou, I., and Silver, D. (2015). Prioritized experience replay. arXiv preprint arXiv:1511.05952.
Schulman, J., Levine, S., Abbeel, P., Jordan, M. I., and Moritz, P. (2015). Trust region policy optimization. In ICML, pages 1889–1897.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9–44.
Sutton, R. S., McAllester, D. A., Singh, S. P., Mansour, Y., et al. (1999). Policy gradient methods for reinforcement learning with function approximation. In NIPS, volume 99, pages 1057–1063.
Van Hasselt, H., Guez, A., and Silver, D. (2016). Deep reinforcement learning with double Q-learning. In AAAI, pages 2094–2100.
Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., and de Freitas, N. (2015). Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581.
Watkins, C. J. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4):279–292.
Werbos, P. J. (1992). Approximate dynamic programming for real-time control and neural modeling. Handbook of Intelligent Control.
Williams, R. J. (1987). A class of gradient-estimating algorithms for reinforcement learning in neural networks. In Proceedings of the IEEE First International Conference on Neural Networks, volume 2, pages 601–608.
ICPRAM 2019 - 8th International Conference on Pattern Recognition Applications and Methods