way to the goal location. Each test required approximately 15 m of robot motion through the maze.

5 SUMMARY AND FUTURE WORK
Evaluation of end-to-end RL on different environment types with non-holonomic vehicles showed the advantage of training on more complex environments (partial braid and perfect mazes), in terms of both the probability of success and the length of the path found to the goal. Validation with real hardware demonstrated that the assumptions made in the E2E-RL formulation were realistic.
ACKNOWLEDGEMENTS
This work was supported by the Natural Sciences and
Engineering Research Council (NSERC) through the
NSERC Canadian Robotics Network (NCRN).
REFERENCES
Devo, A., Costante, G., and Valigi, P. (2020). Deep reinforcement learning for instruction following visual navigation in 3D maze-like environments. IEEE Robotics and Automation Letters.
Dudek, G. and Jenkin, M. (2000). Computational Princi-
ples of Mobile Robotics. Cambridge University Press,
Cambridge, UK.
Faust, A., Ramirez, O., Fiser, M., Oslund, K., Francis,
A., Davidson, J., and Tapia, L. (2018). PRM-RL:
Long-range robotic navigation tasks by combining re-
inforcement learning and sampling-based planning. In
IEEE International Conference on Robotics and Au-
tomation (ICRA), Brisbane, Australia.
Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning (ICML).
Hill, A., Raffin, A., Ernestus, M., Gleave, A., Kanervisto, A., Traore, R., Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., and Wu, Y. (2018). Stable Baselines. https://github.com/hill-a/stable-baselines.
Karaman, S. and Frazzoli, E. (2011). Sampling-based algo-
rithms for optimal motion planning. The International
Journal of Robotics Research, 30:846–894.
LaValle, S. M. (1998). Rapidly-exploring random trees: A new tool for path planning. Technical Report TR 98-11, Computer Science Department, Iowa State University.
Matthews, W. H. (1927). Mazes and Labyrinths: Their History and Development. Dover Publications reprint, 1970.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning (ICML).
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M.
(2013). Playing Atari with deep reinforcement learn-
ing. CoRR.
NovAtel (2020). Real-time kinematic (RTK). https://novatel.com/an-introduction-to-gnss/chapter-5-resolving-errors/real-time-kinematic-rtk. (accessed July 30, 2020).
Passaro, V. M. N., Cuccovillo, A., Vaiani, L., De Carlo, M., and Campanella, C. E. (2017). Gyroscope technology and applications: A review in the industrial perspective. Sensors, 17(10):2284.
Phidgets (2020). Phidgets spatial user guide.
https://www.phidgets.com/. (accessed August
12, 2020).
Pullen, W. D. (2020). Maze classification.
https://www.astrolog.org/labyrnth/algrithm.htm.
(accessed August 12, 2020).
Schulman, J., Moritz, P., Levine, S., Jordan, M., and
Abbeel, P. (2015). High-dimensional continuous con-
trol using generalized advantage estimation. CoRR.
Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L.,
van den Driessche, G., Schrittwieser, J., Antonoglou,
I., Panneershelvam, V., Lanctot, M., Dieleman, S.,
Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I.,
Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel,
T., and Hassabis, D. (2016). Mastering the game of
Go with deep neural networks and tree search. Na-
ture, 529(7587):484–489.
Sun, Y., Liu, M., and Meng, M. Q. (2014). WiFi signal
strength-based robot indoor localization. In IEEE In-
ternational Conference on Information and Automa-
tion (ICIA), pages 250–256, Hailar, China.
Tai, L., Paolo, G., and Liu, M. (2017). Virtual-to-real deep
reinforcement learning: Continuous control of mobile
robots for mapless navigation. In IEEE/RSJ Interna-
tional Conference on Intelligent Robots and Systems
(IROS), Vancouver, Canada.
Turidus (2017). Python-Maze.
https://github.com/Turidus/Python-Maze.
Wang, J., Elfwing, S., and Uchibe, E. (2021). Modular
deep reinforcement learning from reward and punish-
ment for robot navigation. Neural Networks, 135:115–
126.