fied in comparison to expert knowledge. If, for example, the agent were to increase the action (acceleration) when the velocity is already very high, this would be a strong indicator of misbehavior.
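As a rough illustration of such a plausibility check (not part of the original implementation), the following sketch flags time steps at which the agent requests additional acceleration although the velocity is already close to the permitted maximum. The state layout, speed limit and threshold are assumed placeholders, not values from the LongiControl setup.

```python
import numpy as np

# Assumed, illustrative constants: position of the velocity feature in the
# state vector and the maximum permitted velocity in m/s.
VELOCITY_IDX = 0
V_MAX = 130.0 / 3.6

def flag_misbehavior(states: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Return indices of time steps where the agent accelerates at very high speed.

    states  : array of shape (T, n_features), one state per time step
    actions : array of shape (T,), the selected acceleration per time step
    """
    high_speed = states[:, VELOCITY_IDX] > 0.95 * V_MAX
    accelerating = actions > 0.0
    return np.where(high_speed & accelerating)[0]
```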
6 CONCLUSION
The goal of this work was to develop a methodology for explaining how a trained Reinforcement Learning agent selects its action in a particular situation. To this end, SHAP values were calculated for the individual input features, and the effect of each feature on the selected action was visualized in a novel RL-SHAP diagram. The proposed method for explainable RL was tested on the LongiControl environment, which was solved using the DDPG deep RL algorithm.
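As a minimal sketch of this step (not the authors' original code), per-feature SHAP values for a trained DDPG actor could be obtained with the model-agnostic KernelExplainer from the shap library. The actor network, the background states and the trajectory states below are assumed placeholders; the explainer type and background set used in the paper may differ.

```python
import numpy as np
import shap
import torch

def policy(states: np.ndarray) -> np.ndarray:
    """Wrap the deterministic actor so SHAP can query it on a batch of states."""
    with torch.no_grad():
        actions = actor(torch.as_tensor(states, dtype=torch.float32))
    return actions.numpy().reshape(-1)  # one acceleration value per state

# KernelExplainer treats the policy as a black box; the background set
# defines the baseline against which feature contributions are measured.
explainer = shap.KernelExplainer(policy, background_states)

# One row of SHAP values per time step, one column per state feature.
shap_values = explainer.shap_values(trajectory_states)
```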
The results show that the RL-SHAP representation clarifies which state features have a positive, negative, or negligible influence on the action. Our analysis of the agent's behavior on a test trajectory showed that the contributions of the different state features can be explained logically given some domain knowledge. We therefore conclude that the use of SHAP and its integration within RL is helpful for explaining the decision-making process of the agent.
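To give an idea of how such a per-feature analysis over a trajectory could be rendered, the sketch below draws one bar chart per state feature over time, with positive contributions pushing the acceleration up and negative contributions pushing it down. It assumes the shap_values array from the previous sketch and a matching list of feature names, and is only an approximation of the RL-SHAP diagram described in the paper.

```python
import matplotlib.pyplot as plt

# Assumed inputs: `shap_values` of shape (T, n_features) with n_features > 1,
# and a matching list `feature_names`; both are illustrative placeholders.
n_steps, n_features = shap_values.shape
fig, axes = plt.subplots(n_features, 1, sharex=True, figsize=(8, 1.5 * n_features))
for i, ax in enumerate(axes):
    # Green bars indicate a positive contribution to the action, red a negative one.
    colors = ["tab:green" if v >= 0 else "tab:red" for v in shap_values[:, i]]
    ax.bar(range(n_steps), shap_values[:, i], color=colors, width=1.0)
    ax.set_ylabel(feature_names[i], rotation=0, ha="right")
axes[-1].set_xlabel("time step")
fig.tight_layout()
plt.show()
```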
As future work, it would be interesting to study whether prior human expert knowledge can be incorporated into the agent using the same RL-SHAP representation. Finally, we want to study methods that can explain the decision-making process of DRL agents in high-dimensional input spaces.
REFERENCES
Brown, A. and Petrik, M. (2018). Interpretable rein-
forcement learning with ensemble methods. CoRR,
abs/1809.06995.
Dohmen, J. et al. (2020). LongiControl: A reinforcement learning environment for longitudinal vehicle control. ResearchGate.
Doshi-Velez, F. and Kim, B. (2017). Towards a rigor-
ous science of interpretable machine learning. CoRR.
https://arxiv.org/abs/1702.08608.
El Sallab, A. et al. (2017). Deep reinforcement learning framework for autonomous driving. CoRR. http://arxiv.org/abs/1704.02532.
Gründl, M. (2005). Fehler und Fehlverhalten als Ursache von Verkehrsunfällen und Konsequenzen für das Unfallvermeidungspotenzial und die Gestaltung von Fahrerassistenzsystemen. PhD thesis, Universität Regensburg.
Gu, S. et al. (2016). Q-prop: Sample-efficient policy gradient with an off-policy critic. CoRR. http://arxiv.org/abs/1611.02247.
Hein, D. et al. (2018). Interpretable policies for rein-
forcement learning by genetic programming. CoRR,
abs/1712.04170.
Kendall, A. et al. (2018). Learning to drive in a day. https://arxiv.org/abs/1807.00412.
Kindermans, P. et al. (2017). The (un)reliability of saliency
methods. https://arxiv.org/abs/1711.00867.
Li, Y. et al. (2020). Transforming cooling optimization
for green data center via deep reinforcement learning.
IEEE Transactions on Cybernetics, 50(5):2002–2013.
Liessner, R. et al. (2018). Deep reinforcement learning for
advanced energy management of hybrid electric ve-
hicles. 10th International Conference on Agents and
Artificial Intelligence.
Lillicrap, T. et al. (2015). Continuous control with deep reinforcement learning. CoRR. http://arxiv.org/abs/1509.02971.
Lipton, Z. C. (2016). The mythos of model interpretability.
CoRR. http://arxiv.org/abs/1606.03490.
Lundberg, S. et al. (2017). A unified approach to inter-
preting model predictions. In Advances in Neural In-
formation Processing Systems 30, pages 4765–4774.
Curran Associates, Inc.
Mnih, V. et al. (2013). Playing atari with deep reinforce-
ment learning. CoRR. http://arxiv.org/abs/1312.5602.
Montavon, G. et al. (2017). Methods for interpreting and understanding deep neural networks. CoRR. http://arxiv.org/abs/1706.07979.
Radke, T. (2013). Energieoptimale Längsführung von Kraftfahrzeugen durch Einsatz vorausschauender Fahrstrategien. PhD thesis, Karlsruhe Institute of Technology (KIT).
Rizzo, S. et al. (2019). Reinforcement learning with ex-
plainability for traffic signal control. In 2019 IEEE In-
telligent Transportation Systems Conference (ITSC),
pages 3567–3572.
Shapley, L. (1953). A value for n-person games. Contributions to the Theory of Games, 2(28):307–317.
Shrikumar, A. et al. (2017). Learning important features
through propagating activation differences. CoRR,
abs/1704.02685.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, USA, 2nd edition.
Verma, A. et al. (2018). Programmatically interpretable re-
inforcement learning. CoRR, abs/1804.02477.
Watkins, C. J. C. H. (1989). Learning from delayed rewards.
PhD thesis, University of Cambridge.