ences the reward that is given for a state. This results
in lower costs for generating triples for which more
preconditions have been introduced.
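To make this cost shaping concrete, a minimal sketch in Python follows; the function triple_cost, the linear discount, and its parameter values are illustrative assumptions, not the model's actual reward function:

    def triple_cost(base_cost: float, n_preconditions: int,
                    discount: float = 0.1) -> float:
        # Illustrative assumption: each precondition that has already
        # been introduced lowers the cost of generating the triple,
        # i.e. it raises the reward for the corresponding state.
        return base_cost / (1.0 + discount * n_preconditions)

    # A triple with three introduced preconditions is cheaper than
    # one with none:
    triple_cost(1.0, 0)   # -> 1.0
    triple_cost(1.0, 3)   # -> ~0.77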
ACKNOWLEDGEMENTS
We are grateful to Fabian Heinrich and Markus Rothgänger for help with developing the model, and to Katharina Rohlfing, Heike Buhl, Josephine Fisher and Erick Ronoh for valuable conceptual discussions. This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): TRR 318/1 2021 – 438445824.