5 CONCLUSION
In this paper we presented a decision-making module able to control autonomous vehicles during roundabout insertions. The system was trained inside a synthetic representation of a real roundabout with a novel implementation of A3C which we called Delayed A3C (D-A3C); this representation was chosen so that it can be easily reproduced from both simulated and real data. The developed module executes the maneuver by interpreting the intentions of the other drivers and implicitly negotiating with them, since their simulated behavior was trained in a cooperative multi-agent fashion.
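The exact formulation of Delayed A3C is given earlier in the paper and is not reproduced here; the following is only a minimal sketch under the assumption that "delayed" means each worker accumulates gradients over a whole episode and pushes them to the shared network once at episode end, rather than every n steps as in standard A3C. All names (ActorCritic, run_episode, the env interface returning observation, reward, and done flag) are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of a "delayed" asynchronous actor-critic worker: the update
# to the shared (global) network happens only at episode boundaries.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.pi = nn.Linear(64, n_actions)   # policy head
        self.v = nn.Linear(64, 1)            # value head

    def forward(self, obs):
        h = self.body(obs)
        return F.log_softmax(self.pi(h), dim=-1), self.v(h).squeeze(-1)

def run_episode(env, local_net, global_net, optimizer, gamma=0.99):
    """One delayed-update rollout: sync with the global network, act until
    the episode ends, then push the accumulated gradient once.
    `optimizer` is assumed to hold global_net's parameters."""
    local_net.load_state_dict(global_net.state_dict())  # sync with global
    log_probs, values, rewards = [], [], []
    obs, done = env.reset(), False                      # assumed env API
    while not done:
        logp, v = local_net(torch.as_tensor(obs, dtype=torch.float32))
        action = torch.distributions.Categorical(logits=logp).sample()
        obs, r, done = env.step(action.item())          # assumed env API
        log_probs.append(logp[action]); values.append(v); rewards.append(r)
    # Discounted returns computed backwards over the full episode.
    R, returns = 0.0, []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.insert(0, R)
    returns = torch.tensor(returns)
    values = torch.stack(values)
    adv = returns - values.detach()
    loss = -(torch.stack(log_probs) * adv).sum() + F.mse_loss(values, returns)
    optimizer.zero_grad()
    loss.backward()
    # Copy the local gradients into the global network, then step once.
    for lp, gp in zip(local_net.parameters(), global_net.parameters()):
        gp.grad = lp.grad.clone()
    optimizer.step()
```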
We showed that D-A3C achieves better learning performance than A3C and A2C by increasing the exploration in the agents' policies; moreover, we demonstrated that negotiation and interaction capabilities are essential in this scenario, since a rule-based approach leads to superfluous waiting.
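One common way exploration is tuned in A3C-family methods is an entropy bonus in the policy loss; the sketch below illustrates this mechanism only as an assumption, since the paper's precise exploration scheme is described in its method section. The coefficient name `beta` and its default value are illustrative.

```python
# Hedged illustration: an entropy bonus added to the actor loss keeps the
# policy more stochastic, which is one standard way to increase exploration.
import torch
import torch.nn.functional as F

def policy_loss(logits, actions, advantages, beta=0.01):
    """logits: (B, A) raw policy outputs; actions: (B,) chosen actions;
    advantages: (B,) advantage estimates. Larger `beta` -> more exploration."""
    logp = F.log_softmax(logits, dim=-1)
    probs = logp.exp()
    chosen = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
    entropy = -(probs * logp).sum(dim=1)   # per-sample policy entropy
    return -(chosen * advantages + beta * entropy).mean()
```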
It also emerged that the decision-making module features some generalization capability both to unseen scenarios and to real data, which we tested by injecting noise into the obstacle perception and into the trajectories of the agents. However, these capabilities should be strengthened in future work to make the system usable in both real-world and unseen scenarios.
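As a minimal sketch of the robustness test just described, the snippet below perturbs perceived obstacle positions and agent trajectory waypoints with zero-mean Gaussian noise before they reach the decision-making module. The function names and noise scales (sigma values) are assumptions for illustration, not the paper's reported settings.

```python
# Hypothetical noise-injection helpers for the generalization test.
import numpy as np

rng = np.random.default_rng(0)

def perturb_obstacles(positions, sigma_pos=0.5):
    """Add Gaussian noise (in meters) to an (N, 2) array of obstacle positions."""
    return positions + rng.normal(0.0, sigma_pos, size=positions.shape)

def perturb_trajectory(waypoints, sigma_traj=0.3):
    """Jitter each (x, y) waypoint of an agent's trajectory."""
    return waypoints + rng.normal(0.0, sigma_traj, size=waypoints.shape)
```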
Finally, we tested our module on real video sequences, comparing its output with the actions of 10 human users, and we observed that its decisions match human behavior well.
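One plausible way to quantify such a match, sketched here purely as an assumption about the evaluation, is the fraction of decision points on which the module agrees with the majority vote of the 10 users; variable names are hypothetical.

```python
# Hypothetical agreement metric between module and human go/wait decisions.
import numpy as np

def agreement_rate(module_actions, user_actions):
    """module_actions: (T,) array of 0/1 decisions;
    user_actions: (T, 10) array of the users' 0/1 decisions."""
    majority = (user_actions.mean(axis=1) >= 0.5).astype(int)
    return float((module_actions == majority).mean())
```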