fied in comparison to expert knowledge. If, for example,
the agent increased the action (acceleration) while the
velocity was already very large, this would be a strong
indicator of misbehavior.
The goal of this work was to develop a methodology
that explains how a trained reinforcement learning
agent selects its action in a particular situation. For
this purpose, SHAP values were calculated for the
different input features, and the effect of each feature
on the selected action was shown in a novel RL-SHAP
diagram representation. The proposed method for
explainable RL was tested on the LongiControl
environment, which was solved with the DDPG DRL
algorithm.
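As a minimal illustration of attributing an action to individual state features, the following sketch computes exact Shapley values by enumerating all feature subsets. It is not the implementation used in this work: the policy `toy_policy`, the three feature names, the example state and the single reference state `baseline` are all hypothetical, and replacing absent features by one baseline value is a simplification of the expectation over background data that SHAP uses.

```python
from itertools import combinations
from math import factorial


def toy_policy(state):
    """Hypothetical stand-in for a trained actor network.

    Maps (velocity, speed_limit, prev_acceleration) to an
    acceleration-like action. Assumed for illustration only.
    """
    velocity, speed_limit, prev_acceleration = state
    return 0.1 * (speed_limit - velocity) + 0.5 * prev_acceleration


def shapley_values(policy, state, baseline):
    """Exact Shapley values of each state feature for one action.

    Features outside a coalition are set to their baseline value
    (a simplifying assumption, see the text above).
    """
    n = len(state)

    def value(coalition):
        mixed = [state[i] if i in coalition else baseline[i] for i in range(n)]
        return policy(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


state = [20.0, 15.0, 0.2]      # assumed current state
baseline = [10.0, 10.0, 0.0]   # assumed reference state
phi = shapley_values(toy_policy, state, baseline)
```

For this linear toy policy, the contributions sum to the difference between the action at `state` and the action at `baseline`, which is the efficiency property of Shapley values. Exact enumeration scales exponentially with the number of features, so it is only practical for small state vectors.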
The results show that the RL-SHAP representation
clarifies which state features have a positive, negative,
or negligible influence on the action. Our analysis of
the agent's behavior on a test trajectory showed that the
contributions of the different state features can be
explained logically with some domain knowledge. We
therefore conclude that integrating SHAP into RL helps
to explain the decision-making process of the agent.
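The distinction between positive, negative and negligible influence can be made concrete with a small sketch. The function name `label_contributions`, the feature names, the example values and the tolerance of 0.05 are assumptions chosen for illustration; in practice a suitable tolerance depends on the scale of the action.

```python
def label_contributions(shap_values, feature_names, tolerance=0.05):
    """Label each feature's contribution to the action.

    A contribution whose magnitude is at most `tolerance`
    (an assumed threshold) is treated as negligible.
    """
    labels = {}
    for name, phi in zip(feature_names, shap_values):
        if abs(phi) <= tolerance:
            labels[name] = "negligible"
        elif phi > 0:
            labels[name] = "positive"
        else:
            labels[name] = "negative"
    return labels


# Hypothetical contributions for a single time step
example = label_contributions(
    [-1.0, 0.5, 0.01],
    ["velocity", "speed_limit", "prev_acceleration"],
)
```

Here `example` marks the velocity as pushing the action down, the speed limit as pushing it up, and the previous acceleration as having no relevant effect, which is the kind of statement that can then be compared against domain knowledge.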
As future work, it would be interesting to study
whether prior human expert knowledge can be injected
into the agent using the same RL-SHAP representation.
Finally, we want to study methods that can explain the
decision-making process of DRL agents in
high-dimensional input spaces.
Explainable Reinforcement Learning for Longitudinal Control