Banerjee, C., Nguyen, K., Fookes, C., and Raissi, M. (2023). A survey on physics-informed reinforcement learning: Review and open problems. arXiv preprint.
Bellman, R. E. (1957). Dynamic Programming. Princeton
University Press, Princeton, NJ, USA, 1 edition.
Bertsekas, D. (2019). Reinforcement Learning and Optimal Control. Athena Scientific Optimization and Computation Series. Athena Scientific.
de Boor, C. (1978). A Practical Guide to Splines. Applied Mathematical Sciences. Springer.
De Marchi, A., Dreves, A., Gerdts, M., Gottschalk, S., and Rogovs, S. (2022). A function approximation approach for parametric optimization. Journal of Optimization Theory and Applications. In press.
Deuflhard, P. (2011). Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms. Springer Publishing Company, Incorporated.
Feinberg, E. A. and Shwartz, A. (2002). Handbook of Markov Decision Processes: Methods and Applications. International Series in Operations Research & Management Science. Springer US.
Feng, S., Sebastian, B., and Ben-Tzvi, P. (2021). A collision avoidance method based on deep reinforcement learning. Robotics, 10(2).
Fischer, A. (1992). A special Newton-type optimization method. Optimization, 24:269–284.
Gerdts, M. (2024). Optimal Control of ODEs and DAEs.
De Gruyter Oldenbourg, Berlin, Boston, 2 edition.
Gottschalk, S. (2021). Differential Equation Based Framework for Deep Reinforcement Learning. Dissertation. Fraunhofer Verlag.
Grüne, L. and Junge, O. (2008). Gewöhnliche Differentialgleichungen [Ordinary Differential Equations]. Springer Studium Mathematik - Bachelor. Springer Spektrum Wiesbaden, 2 edition.
Ito, K. and Kunisch, K. (2009). On a semi-smooth Newton method and its globalization. Mathematical Programming, 118:347–370.
Karush, W. (1939). Minima of Functions of Several Variables with Inequalities as Side Conditions. Master's thesis, Department of Mathematics, University of Chicago, Chicago, IL, USA.
Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. In International Conference on Learning Representations.
Kuhn, H. W. and Tucker, A. W. (1951). Nonlinear programming. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, pages 481–492, Berkeley, CA, USA. University of California Press.
Landgraf, D., Völz, A., Kontes, G., Mutschler, C., and Graichen, K. (2022). Hierarchical learning for model predictive collision avoidance. IFAC-PapersOnLine, 55(20):355–360. 10th Vienna International Conference on Mathematical Modelling MATHMOD 2022.
Liniger, A., Domahidi, A., and Morari, M. (2015). Optimization-based autonomous racing of 1:43 scale RC cars. Optimal Control Applications and Methods, 36:628–647.
Lot, R. and Biral, F. (2014). A curvilinear abscissa approach
for the lap time optimization of racing vehicles. IFAC
Proceedings Volumes, 47(3):7559–7565. 19th IFAC
World Congress.
Moerland, T. M., Broekens, J., Plaat, A., and Jonker, C. M. (2020). Model-based reinforcement learning: A survey. arXiv preprint.
Pagot, E., Piccinini, M., and Biral, F. (2020). Real-time optimal control of an autonomous RC car with minimum-time maneuvers and a novel kineto-dynamical model. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2390–2396.
Pateria, S., Subagdja, B., Tan, A., and Quek, C. (2021). Hierarchical reinforcement learning: A comprehensive survey. ACM Comput. Surv., 54(5).
Ramesh, A. and Ravindran, B. (2023). Physics-informed model-based reinforcement learning. arXiv preprint.
Reid, M. D. and Ryan, M. R. K. (2000). Using ILP to improve planning in hierarchical reinforcement learning. In Proceedings of the International Conference on Inductive Logic Programming (ILP).
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning Representations by Back-propagating Errors. Nature, 323:533–536.
Schulman, J., Levine, S., Abbeel, P., Jordan, M. I., and Moritz, P. (2015). Trust region policy optimization. In ICML, volume 37 of JMLR Workshop and Conference Proceedings, pages 1889–1897. JMLR.org.
Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. (2014). Deterministic policy gradient algorithms. In Xing, E. P. and Jebara, T., editors, Proceedings of the 31st International Conference on Machine Learning, volume 32 of Proceedings of Machine Learning Research, pages 387–395, Beijing, China. PMLR.
Sussmann, H. and Willems, J. (1997). 300 years of optimal
control: from the brachystochrone to the maximum
principle. IEEE Control Systems Magazine, 17(3):32–
44.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction. The MIT Press, second edition.
Williams, R. J. (1992). Simple statistical gradient-following
algorithms for connectionist reinforcement learning.
Machine Learning, 8:229–256.
Wischnewski, A., Herrmann, T., Werner, F., and Lohmann, B. (2023). A tube-MPC approach to autonomous multi-vehicle racing on high-speed ovals. IEEE Transactions on Intelligent Vehicles, 8(1):368–378.