
reinforcement learning with temporal logics. In FOR-
MATS, volume 12288 of Lecture Notes in Computer Sci-
ence, pages 1–22. Springer.
Hensel, C., Junges, S., Katoen, J., Quatmann, T., and Volk,
M. (2022). The probabilistic model checker Storm. Int.
J. Softw. Tools Technol. Transf., 24(4):589–610.
Jin, P., Wang, Y., and Zhang, M. (2022). Efficient LTL
model checking of deep reinforcement learning systems
using policy extraction. In SEKE, pages 357–362. KSI
Research Inc.
Kazak, Y., Barrett, C. W., Katz, G., and Schapira, M.
(2019). Verifying deep-RL-driven systems. In
NetAI@SIGCOMM, pages 83–89. ACM.
Kwiatkowska, M. Z., Norman, G., and Parker, D. (2011).
PRISM 4.0: Verification of probabilistic real-time sys-
tems. In CAV, volume 6806 of LNCS, pages 585–591.
Springer.
Littman, M. L., Topcu, U., Fu, J., Isbell, C., Wen, M., and
MacGlashan, J. (2017). Environment-independent task
specifications via GLTL. CoRR, abs/1704.04341.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A.,
Antonoglou, I., Wierstra, D., and Riedmiller, M. A.
(2013). Playing Atari with deep reinforcement learning.
CoRR, abs/1312.5602.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Ve-
ness, J., Bellemare, M. G., Graves, A., Riedmiller, M. A.,
Fidjeland, A., Ostrovski, G., Petersen, S., Beattie, C.,
Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wier-
stra, D., Legg, S., and Hassabis, D. (2015). Human-
level control through deep reinforcement learning. Nat.,
518(7540):529–533.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L.,
van den Driessche, G., Schrittwieser, J., Antonoglou, I.,
Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe,
D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap,
T. P., Leach, M., Kavukcuoglu, K., Graepel, T., and
Hassabis, D. (2016). Mastering the game of Go with deep
neural networks and tree search. Nat., 529(7587):484–489.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement
learning: An introduction. MIT Press.
Vamplew, P., Smith, B. J., Källström, J., de Oliveira Ramos,
G., Radulescu, R., Roijers, D. M., Hayes, C. F., Heintz,
F., Mannion, P., Libin, P. J. K., Dazeley, R., and Foale,
C. (2022). Scalar reward is not enough: a response to
Silver, Singh, Precup and Sutton (2021). Auton. Agents
Multi Agent Syst., 36(2):41.
Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu,
M., Dudzik, A., Chung, J., Choi, D. H., Powell, R.,
Ewalds, T., Georgiev, P., Oh, J., Horgan, D., Kroiss,
M., Danihelka, I., Huang, A., Sifre, L., Cai, T.,
Agapiou, J. P., Jaderberg, M., Vezhnevets, A. S., Leblond, R.,
Pohlen, T., Dalibard, V., Budden, D., Sulsky, Y., Molloy,
J., Paine, T. L., Gülçehre, Ç., Wang, Z., Pfaff, T., Wu,
Y., Ring, R., Yogatama, D., Wünsch, D., McKinney, K.,
Smith, O., Schaul, T., Lillicrap, T. P., Kavukcuoglu, K.,
Hassabis, D., Apps, C., and Silver, D. (2019). Grandmaster
level in StarCraft II using multi-agent reinforcement
learning. Nat., 575(7782):350–354.
Vouros, G. A. (2023). Explainable deep reinforcement
learning: State of the art and challenges. ACM Comput.
Surv., 55(5):92:1–92:39.
Wang, Y., Roohi, N., West, M., Viswanathan, M., and
Dullerud, G. E. (2020). Statistically model checking
PCTL specifications on Markov decision processes via
reinforcement learning. In CDC, pages 1392–1397.
IEEE.
Zhao, C., Deng, C., Liu, Z., Zhang, J., Wu, Y., Wang, Y.,
and Yi, X. (2023). Interpretable reinforcement learning
of behavior trees. In ICMLC, pages 492–499. ACM.
Zhu, H., Xiong, Z., Magill, S., and Jagannathan, S. (2019).
An inductive synthesis framework for verifiable rein-
forcement learning. In PLDI, pages 686–701. ACM.
Probabilistic Model Checking of Stochastic Reinforcement Learning Policies
445