REFERENCES
Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-
time analysis of the multiarmed bandit problem.
Mach. Learn., 47(2-3):235–256.
Auer, P. and Ortner, R. (2006). Logarithmic online regret
bounds for undiscounted reinforcement learning. In
Schölkopf, B., Platt, J. C., and Hoffman, T., editors,
Advances in Neural Information Processing Systems
19, Proceedings of the Twentieth Annual Conference
on Neural Information Processing Systems, Vancou-
ver, British Columbia, Canada, December 4-7, 2006,
pages 49–56. MIT Press.
Bak, S. and Tran, H.-D. (2022). Neural Network Com-
pression of ACAS Xu Early Prototype Is Unsafe:
Closed-Loop Verification Through Quantized State
Backreachability, pages 280–298. Springer International Publishing.
Bellman, R. (1952). On the theory of dynamic program-
ming. Proc Natl Acad Sci USA, 38(8):716–719.
Bertsekas, D. (2019). Reinforcement Learning and Optimal
Control. Athena Scientific.
Brockman, G., Cheung, V., Pettersson, L., Schneider, J.,
Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540.
Cleaveland, R., Mitsch, S., and Platzer, A. (2022). Formally
Verified Next-Generation Airborne Collision Avoid-
ance Games in ACAS X. ACM Trans. Embed. Com-
put. Syst., 22(1).
Damour, M., De Grancey, F., Gabreau, C., Gauffriau, A.,
Ginestet, J.-B., Hervieu, A., Huraux, T., Pagetti, C.,
Ponsolle, L., and Clavière, A. (2021). Towards Certification of a Reduced Footprint ACAS-Xu System: a
Hybrid ML-based Solution. In Computer Safety, Re-
liability, and Security (40th SAFECOMP), pages 34–
48. Springer.
EU (2011). Commission Regulation (EU) No 1332/2011 of 16 December 2011 laying down common airspace usage requirements and operating procedures for airborne collision avoidance. Official Journal of the European Union, pages L336/20–22.
EUROCAE (2020). ED-275: Minimum operational performance standards for Airborne Collision Avoidance System Xu (ACAS-Xu) – Volume I. Technical report, EUROCAE.
EUROCONTROL (2022). Airborne Collision Avoidance
System (ACAS) guide. EUROCONTROL.
FAA (2011). Introduction to TCAS II version 7.1 (February
2011) – booklet. Technical report, FAA.
François-Lavet, V., Henderson, P., Islam, R., Bellemare,
M. G., and Pineau, J. (2018). An introduction to deep
reinforcement learning. Foundations and Trends® in
Machine Learning, 11(3-4):219–354.
Holland and Kochenderfer, M. (2016). Dynamic logic se-
lection for unmanned aircraft separation. In 35th
IEEE/AIAA Digital Avionics Systems Conference,
DASC.
Holland, Kochenderfer, M., and Olson (2013). Optimiz-
ing the next generation collision avoidance system
for safe, suitable, and acceptable operational perfor-
mance. Air Traffic Control Quarterly, 21(3).
Howard, R. (1960). Dynamic Programming and Markov
Processes. MIT Press, Cambridge, MA.
ICAO (2014). Annex 10 to the Convention on International Civil Aviation – Volume IV: Surveillance and Collision Avoidance Systems. Technical report, International Civil Aviation Organization (ICAO).
Julian, K. D., Lopez, J., Brush, J. S., Owen, M. P.,
and Kochenderfer, M. J. (2016). Policy compres-
sion for aircraft collision avoidance systems. In
35th IEEE/AIAA Digital Avionics Systems Confer-
ence, DASC, pages 1–10. IEEE.
Katz, G., Barrett, C. W., Dill, D. L., Julian, K., and Kochen-
derfer, M. J. (2017). Reluplex: An efficient SMT solver for verifying deep neural networks. CoRR, abs/1702.01135.
Kiran, B. R., Sobh, I., Talpaert, V., Mannion, P., Sallab, A.
A. A., Yogamani, S., and Pérez, P. (2022). Deep rein-
forcement learning for autonomous driving: A survey.
IEEE Transactions on Intelligent Transportation Sys-
tems, 23(6):4909–4926.
Kochenderfer, M. and Chryssanthacopoulos, J. (2011). Robust airborne collision avoidance through dynamic programming. Project Report ATC-371. Technical report, MIT Lincoln Laboratory.
Kochenderfer, M., Espindle, L., Kuchar, J., and Griffith, J. (2008). Correlated encounter model for cooperative aircraft in the national airspace system. Project Report ATC-344. Technical report, MIT Lincoln Laboratory.
Kochenderfer, M., Holland, J., and Chryssanthacopoulos, J.
(2012). Next-generation airborne collision avoidance
system. Lincoln Laboratory Journal, 19(1):17–33.
Koutník, J., Cuccu, G., Schmidhuber, J., and Gomez, F. (2013). Evolving large-scale neural networks for vision-based reinforcement learning. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, GECCO ’13, pages 1061–1068, New York, NY, USA. ACM.
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learn-
ing. Nature, 521:436–444.
Liu, H., Kiumarsi, B., Kartal, Y., Koru, A. T., Modares,
H., and Lewis, F. L. (2023). Reinforcement learning
applications in unmanned vehicle control: A compre-
hensive overview. Unmanned Syst., 11(1):17–26.
Manfredi, G. and Jestin, Y. (2016). An introduction to ACAS-Xu
and the challenges ahead. In 35th IEEE/AIAA Digital
Avionics Systems Conference, DASC.
Meuleau, N. and Bourgine, P. (1999). Exploration of
multi-state environments: Local measures and back-
propagation of uncertainty. Mach. Learn., 35(2):117–
154.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap,
T. P., Harley, T., Silver, D., and Kavukcuoglu, K.
(2016). Asynchronous methods for deep reinforce-
ment learning. In Balcan, M. and Weinberger, K. Q.,
editors, Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, volume 48, pages 1928–1937. PMLR.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M.