comparison; and 3) robust to different kinds of time-
varying traffic flows.
4 CONCLUSIONS
In this paper, we proposed cooperative traffic signal
control using Multiagent Twin Delayed Deep
Deterministic Policy Gradients (MATD3). Our
method establishes traffic signal duration control in a
road network where the traffic lights at each
intersection learn cooperatively to reduce traffic
congestion. We have shown that our method
outperforms existing reinforcement learning traffic
control strategies, namely Multiagent Deep Deterministic
Policy Gradients (MADDPG) and Independent Deep Q
Networks (IDQN), as well as the traditional Fixed
Duration (FD) control method, across different
time-varying traffic flows. We are currently running
further experiments on larger road networks, with
comparisons across different kinds of traffic flows,
to study the scalability of our solution.
ACKNOWLEDGEMENTS
The work of the second author was supported through
the J. C. Bose Fellowship, a grant from the
Department of Science and Technology, Government
of India, and the Robert Bosch Centre for Cyber
Physical Systems, Indian Institute of Science,
Bangalore.