Dulac-Arnold, G., et al. (2017). Deep Q-learning from demonstrations. arXiv preprint arXiv:1704.03732.
Ho, J. and Ermon, S. (2016). Generative adversarial imitation learning. In Advances in Neural Information Processing Systems, pages 4565–4573.
Kang, B., Jie, Z., and Feng, J. (2018). Policy optimization with demonstrations. In Proceedings of the 35th International Conference on Machine Learning, volume 80, pages 2469–2478, Stockholm, Sweden.
Kendall, A., Hawke, J., Janz, D., Mazur, P., Reda, D., Allen, J.-M., Lam, V.-D., Bewley, A., and Shah, A. (2018). Learning to drive in a day. arXiv preprint arXiv:1807.00412.
Kitani, K. M., Ziebart, B. D., Bagnell, J. A., and Hebert, M. (2012). Activity forecasting. In Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., and Schmid, C., editors, Computer Vision – ECCV 2012, pages 201–214, Berlin, Heidelberg. Springer Berlin Heidelberg.
Koenig, N. and Howard, A. (2004). Design and use paradigms for Gazebo, an open-source multi-robot simulator. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2149–2154.
Kuderer, M., Gulati, S., and Burgard, W. (2015). Learning driving styles for autonomous vehicles from demonstration. In Robotics and Automation (ICRA), 2015 IEEE International Conference on, pages 2641–2646. IEEE.
Le, H. M., Jiang, N., Agarwal, A., Dudík, M., Yue, Y., and Daumé III, H. (2018). Hierarchical imitation and reinforcement learning. arXiv preprint arXiv:1803.00590.
Lee, N., Choi, W., Vernaza, P., Choy, C. B., Torr, P. H., and Chandraker, M. (2017). DESIRE: Distant future prediction in dynamic scenes with interacting agents. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 336–345.
Levine, S., Finn, C., Darrell, T., and Abbeel, P. (2016). End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 17(1):1334–1373.
Liaw, R., Krishnan, S., Garg, A., Crankshaw, D., Gonzalez, J. E., and Goldberg, K. (2017). Composing meta-policies for autonomous driving using hierarchical deep reinforcement learning. arXiv preprint arXiv:1711.01503.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
Liu, G.-H., Siravuru, A., Prabhakar, S., Veloso, M., and Kantor, G. (2017). Learning end-to-end multimodal sensor policies for autonomous navigation. In Levine, S., Vanhoucke, V., and Goldberg, K., editors, Proceedings of the 1st Annual Conference on Robot Learning, volume 78 of Proceedings of Machine Learning Research, pages 249–261. PMLR.
Mania, H., Guy, A., and Recht, B. (2018). Simple random search provides a competitive approach to reinforcement learning. arXiv preprint arXiv:1803.07055.
Mannion, P., Duggan, J., and Howley, E. (2016a). An experimental review of reinforcement learning algorithms for adaptive traffic signal control. In McCluskey, L. T., Kotsialos, A., Müller, P. J., Klügl, F., Rana, O., and Schumann, R., editors, Autonomic Road Transport Support Systems, pages 47–66. Springer International Publishing.
Mannion, P., Mason, K., Devlin, S., Duggan, J., and Howley, E. (2016b). Multi-objective dynamic dispatch optimisation using multi-agent reinforcement learning. In Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 1345–1346.
Mason, K., Mannion, P., Duggan, J., and Howley, E. (2016). Applying multi-agent reinforcement learning to watershed management. In Proceedings of the Adaptive and Learning Agents workshop (at AAMAS 2016).
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529.
Naik, D. K. and Mammone, R. (1992). Meta-neural networks that learn by learning. In Neural Networks, 1992. IJCNN., International Joint Conference on, volume 1, pages 437–442. IEEE.
Ng, A. Y., Harada, D., and Russell, S. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, volume 99, pages 278–287.
Ngai, D. C. K. and Yung, N. H. C. (2011). A multiple-goal reinforcement learning method for complex vehicle overtaking maneuvers. IEEE Transactions on Intelligent Transportation Systems, 12(2):509–522.
Nichol, A., Achiam, J., and Schulman, J. (2018). On first-order meta-learning algorithms. CoRR, abs/1803.02999.
Nosrati, M. S., Abolfathi, E. A., Elmahgiubi, M., Yadmellat, P., Luo, J., Zhang, Y., Yao, H., Zhang, H., and Jamil, A. (2018). Towards practical hierarchical reinforcement learning for multi-lane autonomous driving.
Paden, B., Čáp, M., Yong, S. Z., Yershov, D. S., and Frazzoli, E. (2016). A survey of motion planning and control techniques for self-driving urban vehicles. CoRR, abs/1604.07446.
Pan, X., You, Y., Wang, Z., and Lu, C. (2017). Virtual to real reinforcement learning for autonomous driving. arXiv preprint arXiv:1704.03952.
Pathak, D., Agrawal, P., Efros, A. A., and Darrell, T. (2017). Curiosity-driven exploration by self-supervised prediction. In International Conference on Machine Learning (ICML), volume 2017.
Peng, X. B., Andrychowicz, M., Zaremba, W., and Abbeel, P. (2017). Sim-to-real transfer of robotic control with dynamics randomization. arXiv preprint arXiv:1710.06537.