S. (2019). Meta-sim: Learning to generate synthetic
datasets.
Kempka, M., Wydmuch, M., Runc, G., Toczek, J., and Jaśkowski, W. (2016). ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In IEEE Conference on Computational Intelligence and Games (CIG), pages 1–8.
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Koenig, N. P. and Howard, A. (2004). Design and use paradigms for Gazebo, an open-source multi-robot simulator. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), volume 3, pages 2149–2154.
Mishkin, D., Dosovitskiy, A., and Koltun, V. (2019). Benchmarking classic and learned navigation in complex 3D environments. IEEE Robotics and Automation Letters.
Kulhanek, J., Derner, E., de Bruin, T., and Babuska, R.
(2019). Vision-based navigation using deep reinforce-
ment learning. In 9th European Conference on Mobile
Robots.
Kumar, A., Buckley, T., Wang, Q., Kavelaars, A., and Ku-
zovkin, I. (2019). OffWorld Gym: open-access phys-
ical robotics environment for real-world reinforce-
ment learning benchmark and research. arXiv preprint
arXiv:1910.08639.
Lample, G. and Chaplot, D. S. (2017). Playing FPS games
with deep reinforcement learning. In Thirty-First
AAAI Conference on Artificial Intelligence.
Laud, A. D. (2004). Theory and application of reward shap-
ing in reinforcement learning. Technical report.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T.,
Tassa, Y., Silver, D., and Wierstra, D. (2015). Contin-
uous control with deep reinforcement learning. arXiv
preprint arXiv:1509.02971.
Maas, A. L. (2013). Rectifier nonlinearities improve neural
network acoustic models.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M.
(2013). Playing Atari with deep reinforcement learn-
ing. arXiv preprint arXiv:1312.5602.
Mur-Artal, R. and Tardós, J. D. (2017). ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 33(5):1255–1262.
OpenAI, Akkaya, I., Andrychowicz, M., Chociej, M.,
Litwin, M., McGrew, B., Petron, A., Paino, A., Plap-
pert, M., Powell, G., Ribas, R., Schneider, J., Tezak,
N., Tworek, J., Welinder, P., Weng, L., Yuan, Q.-M.,
Zaremba, W., and Zhang, L. (2019). Solving Rubik's cube with a robot hand. arXiv preprint arXiv:1910.07113.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,
Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,
Antiga, L., Desmaison, A., Kopf, A., Yang, E., De-
Vito, Z., Raison, M., Tejani, A., Chilamkurthy, S.,
Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019).
PyTorch: An imperative style, high-performance deep learning library. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R., editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc.
Ruiz, N., Schulter, S., and Chandraker, M. (2019). Learning
to simulate. ICLR.
Rusu, A. A., Vecerík, M., Rothörl, T., Heess, N. M. O., Pascanu, R., and Hadsell, R. (2017). Sim-to-real robot learning from pixels with progressive nets. In CoRL.
Sadeghi, F. and Levine, S. (2016). CAD2RL: Real single-image flight without a single real image. arXiv preprint arXiv:1611.04201.
Salimans, T., Ho, J., Chen, X., Sidor, S., and Sutskever,
I. (2017). Evolution strategies as a scalable al-
ternative to reinforcement learning. arXiv preprint
arXiv:1703.03864.
Sampedro, C., Rodriguez-Ramos, A., Bavle, H., Carrio, A.,
de la Puente, P., and Campoy, P. (2019). A fully-
autonomous aerial robot for search and rescue appli-
cations in indoor environments using learning-based
techniques. Journal of Intelligent & Robotic Systems,
95(2):601–627.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I.,
Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M.,
Bolton, A., et al. (2017). Mastering the game of Go
without human knowledge. Nature, 550(7676):354–
359.
Şucan, I. A., Moll, M., and Kavraki, L. E. (2012). The Open Motion Planning Library. IEEE Robotics & Automation Magazine, 19(4):72–82.
Sutton, R. S., McAllester, D. A., Singh, S. P., and Mansour,
Y. (1999). Policy gradient methods for reinforcement
learning with function approximation. In NIPS.
Tai, L., Paolo, G., and Liu, M. (2017). Virtual-to-real deep
reinforcement learning: Continuous control of mobile
robots for mapless navigation. In IEEE/RSJ Interna-
tional Conference on Intelligent Robots and Systems
(IROS), pages 31–36.
Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and
Abbeel, P. (2017). Domain randomization for trans-
ferring deep neural networks from simulation to the
real world. In IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS), pages 23–30.
Vuong, Q., Vikram, S., Su, H., Gao, S., and Christensen,
H. I. (2019). How to pick the domain randomization
parameters for sim-to-real transfer of reinforcement
learning policies? arXiv preprint arXiv:1903.11774.
Wurm, K. M., Hornung, A., Bennewitz, M., Stachniss, C.,
and Burgard, W. (2010). OctoMap: A probabilistic, flexible, and compact 3D map representation for
robotic systems. In ICRA 2010 workshop on best
practice in 3D perception and modeling for mobile
manipulation, volume 2.
Xie, L., Wang, S., Markham, A., and Trigoni, N. (2017).
Towards monocular vision based obstacle avoidance
through deep reinforcement learning. arXiv preprint
arXiv:1706.09829.
Zamora, I., Lopez, N. G., Vilches, V. M., and Cordero, A. H.
(2016). Extending the OpenAI Gym for robotics:
a toolkit for reinforcement learning using ROS and
Gazebo. arXiv preprint arXiv:1608.05742.
Zaremba, W. and Sutskever, I. (2014). Learning to execute. arXiv preprint arXiv:1410.4615.