Duan, Y., Chen, X., Houthooft, R., Schulman, J., and
Abbeel, P. (2016). Benchmarking Deep Reinforce-
ment Learning for Continuous Control. In Proceed-
ings of The 33rd International Conference on Ma-
chine Learning, volume 48 of Proceedings of Machine
Learning Research, pages 1329–1338.
Feinberg, E. and Shwartz, A., editors (2002). Handbook of Markov Decision Processes: Methods and Applications. Springer US.
Gazis, D. C., Herman, R., and Rothery, R. W. (1961). Non-
linear Follow-the-Leader Models of Traffic Flow. Op-
erations Research, 9(4):545–567.
Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup,
D., and Meger, D. (2018). Deep Reinforcement Learn-
ing that Matters. arXiv: 1709.06560.
Kalashnikov, D., Irpan, A., Pastor, P., Ibarz, J., Herzog, A., Jang, E., Quillen, D., Holly, E., Kalakrishnan, M., Vanhoucke, V., and Levine, S. (2018). Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation. In Proceedings of The 2nd Conference on Robot Learning, volume 87 of Proceedings of Machine Learning Research, pages 651–673.
Levine, S., Kumar, A., Tucker, G., and Fu, J. (2020). Of-
fline Reinforcement Learning: Tutorial, Review, and
Perspectives on Open Problems. arXiv: 2005.01643.
Lighthill, M. J. and Whitham, G. B. (1955). On kinematic
waves. II. A theory of traffic flow on long crowded
roads. Proc. R. Soc. Lond. A, 229:317–345.
Lopez, P. A., Behrisch, M., Bieker-Walz, L., Erdmann, J., Flötteröd, Y.-P., Hilbrich, R., Lücken, L., Rummel, J., Wagner, P., and Wießner, E. (2018). Microscopic Traffic Simulation using SUMO. In The 21st IEEE International Conference on Intelligent Transportation Systems.
Lu, X.-Y., Varaiya, P., Horowitz, R., Su, D., and Shladover,
S. E. (2011). Novel Freeway Traffic Control with
Variable Speed Limit and Coordinated Ramp Meter-
ing. Transportation Research Record: Journal of the
Transportation Research Board, 2229(1):55–65.
Mania, H., Guy, A., and Recht, B. (2018). Simple random
search provides a competitive approach to reinforce-
ment learning. arXiv: 1803.07055.
McNeil, D. R. (1968). A Solution to the Fixed-Cycle Traffic
Light Problem for Compound Poisson Arrivals. Jour-
nal of Applied Probability, 5(3):624–635.
Murphy, K. P. (2012). Machine Learning: A Probabilistic
Perspective. The MIT Press.
OpenStreetMap (2020). https://www.openstreetmap.org. Accessed Dec. 18, 2020.
Orosz, G. (2016). Connected cruise control: modelling, de-
lay effects, and nonlinear behaviour. Vehicle System
Dynamics, 54(8):1147–1176.
Orosz, G., Wilson, R. E., and Stépán, G. (2010). Traffic jams: dynamics and control. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 368(1928):4455–4479.
Puterman, M. L. (1994). Markov Decision Processes: Dis-
crete Stochastic Dynamic Programming. John Wiley
& Sons, Inc., 1st edition.
Rajeswaran, A., Kumar, V., Gupta, A., Vezzani, G., Schul-
man, J., Todorov, E., and Levine, S. (2018). Learn-
ing Complex Dexterous Manipulation with Deep Re-
inforcement Learning and Demonstrations. In Pro-
ceedings of Robotics: Science and Systems (RSS).
Recht, B. (2019). A Tour of Reinforcement Learning:
The View from Continuous Control. Annual Re-
view of Control, Robotics, and Autonomous Systems,
2(1):253–279.
Schulman, J., Levine, S., Moritz, P., Jordan, M., and Abbeel, P. (2015). Trust Region Policy Optimization. In Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1889–1897.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai,
M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D.,
Graepel, T., Lillicrap, T., Simonyan, K., and Hass-
abis, D. (2018). A general reinforcement learning
algorithm that masters chess, shogi, and Go through
self-play. Science, 362(6419):1140–1144.
Stern, R. E., Cui, S., Delle Monache, M. L., Bhadani, R.,
Bunting, M., Churchill, M., Hamilton, N., Haulcy,
R., Pohlmann, H., Wu, F., Piccoli, B., Seibold, B.,
Sprinkle, J., and Work, D. B. (2018). Dissipation
of stop-and-go waves via control of autonomous ve-
hicles: Field experiments. Transportation Research
Part C: Emerging Technologies, 89:205–221.
Sugiyama, Y., Fukui, M., Kikuchi, M., Hasebe, K.,
Nakayama, A., Nishinari, K., Tadaki, S.-i., and
Yukawa, S. (2008). Traffic jams without bottle-
necks—experimental evidence for the physical mech-
anism of the formation of a jam. New Journal of
Physics, 10(3):033001.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learn-
ing: An Introduction. Adaptive Computation and Ma-
chine Learning. MIT Press, 2nd edition.
Treiber, M. and Kesting, A. (2013). Traffic Flow Dynamics.
Springer-Verlag Berlin Heidelberg.
Vinitsky, E., Kreidieh, A., Le Flem, L., Kheterpal, N., Jang, K., Wu, C., Wu, F., Liaw, R., Liang, E., and Bayen, A. M. (2018). Benchmarks for reinforcement learning in mixed-autonomy traffic. In Proceedings of The 2nd Conference on Robot Learning, volume 87 of Proceedings of Machine Learning Research, pages 399–409.
Wang, J., Zheng, Y., Xu, Q., Wang, J., and Li, K.
(2020). Controllability Analysis and Optimal Control
of Mixed Traffic Flow With Human-Driven and Au-
tonomous Vehicles. IEEE Transactions on Intelligent
Transportation Systems, pages 1–15.
Wiering, M. (2000). Multi-Agent Reinforcement Learning for Traffic Light Control. In Proceedings of the Seventeenth International Conference on Machine Learning, ICML '00, pages 1151–1158.
Williams, R. J. (1992). Simple statistical gradient-following
algorithms for connectionist reinforcement learning.
Machine Learning, 8(3):229–256.
Wu, C., Kreidieh, A., Parvate, K., Vinitsky, E., and Bayen,
A. M. (2017). Flow: Architecture and Benchmarking
for Reinforcement Learning in Traffic Control. arXiv:
1710.05465.
Zheng, Y., Wang, J., and Li, K. (2020). Smoothing Traf-
fic Flow via Control of Autonomous Vehicles. IEEE
Internet of Things Journal, 7(5):3882–3896.