Figure 8: This figure shows the same situation as in Figure 5, but after applying our SHAP value clustering scheme.
real-time analysis.
Further work could, for instance, include understanding the reasons behind the high variability of the SHAP values under minor changes in input data in this environment. Another interesting future research direction could be to investigate whether techniques for reducing overfitting and improving generalization, such as the domain randomization introduced in (Tobin et al., 2017), could make the neural network policy prioritize obstacle avoidance rather than merely recognizing its environment, and in turn shift the SHAP values toward a more human-interpretable distribution.
Overall, our findings suggest that while SHAP can
enhance the transparency of DRL agents, there are
still challenges in ensuring the interpretability of these
explanations.
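To illustrate the kind of SHAP value clustering applied in Figure 8, the following is a minimal sketch using DBSCAN (Ester et al., 1996), treating each timestep's SHAP explanation as a feature vector. The feature dimensionality, the synthetic data, and the `eps`/`min_samples` settings are illustrative assumptions, not the values used in this work.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative stand-in for per-timestep SHAP explanation vectors
# (rows: timesteps, columns: input features). Values are synthetic:
# two well-separated "explanation regimes" plus Gaussian jitter.
rng = np.random.default_rng(0)
shap_values = np.vstack([
    rng.normal(loc=0.5, scale=0.05, size=(50, 8)),   # regime A
    rng.normal(loc=-0.5, scale=0.05, size=(50, 8)),  # regime B
])

# Density-based clustering groups similar explanations without fixing
# the number of clusters in advance; points in no dense region get -1.
labels = DBSCAN(eps=0.5, min_samples=5).fit(shap_values).labels_

n_clusters = len(set(labels) - {-1})
print(n_clusters)
```

Because DBSCAN needs no preset cluster count and marks low-density explanations as noise, it is a natural fit when the number of distinct explanation regimes produced by the policy is unknown a priori.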
ACKNOWLEDGEMENTS
The Research Council of Norway supported this
work through the EXAIGON project, project number
304843.
REFERENCES
Badia, A. P., Piot, B., Kapturowski, S., Sprechmann, P., Vitvitskyi, A., Guo, Z. D., and Blundell, C. (2020). Agent57: Outperforming the Atari human benchmark. In International Conference on Machine Learning, pages 507–517. PMLR.
Bellotti, F., Lazzaroni, L., Capello, A., Cossu, M., De Gloria, A., and Berta, R. (2023). Explaining a deep reinforcement learning (DRL)-based automated driving agent in highway simulations. IEEE Access, 11:28522–28550.
Brunke, L., Greeff, M., Hall, A. W., Yuan, Z., Zhou, S., Panerati, J., and Schoellig, A. P. (2022). Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 5:411–444.
Cimurs, R., Suh, I. H., and Lee, J. H. (2022). Goal-driven autonomous exploration through deep reinforcement learning. IEEE Robotics and Automation Letters, 7(2):730–737.
Cobbe, K., Klimov, O., Hesse, C., Kim, T., and Schulman, J. (2019). Quantifying generalization in reinforcement learning. In International Conference on Machine Learning, pages 1282–1289. PMLR.
Ester, M., Kriegel, H.-P., Sander, J., Xu, X., et al. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, volume 96, pages 226–231.
Gjærum, V. B., Strümke, I., Løver, J., Miller, T., and Lekkas, A. M. (2023). Model tree methods for explaining deep reinforcement learning agents in real-time robotic applications. Neurocomputing, 515:133–144.
Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., et al. (2018). Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905.
Koenig, N. and Howard, A. (2004). Design and use paradigms for Gazebo, an open-source multi-robot simulator. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), volume 3, pages 2149–2154. IEEE.
Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30.
Molnar, C. (2022). Interpretable Machine Learning. 2nd edition.
Quigley, M., Conley, K., Gerkey, B., Faust, J., Foote, T., Leibs, J., Wheeler, R., Ng, A. Y., et al. (2009). ROS: An open-source Robot Operating System. In ICRA Workshop on Open Source Software, volume 3, page 5. Kobe, Japan.
Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., and Dormann, N. (2021). Stable-Baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1–8.
Remman, S. B., Strümke, I., and Lekkas, A. M. (2022). Causal versus marginal Shapley values for robotic lever manipulation controlled using deep reinforcement learning. In 2022 American Control Conference (ACC), pages 2683–2690. IEEE.
Shapley, L. S. (1952). A value for n-person games. Defense Technical Information Center.
Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P. (2017). Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 23–30. IEEE.
ICINCO 2024 - 21st International Conference on Informatics in Control, Automation and Robotics