[Figure 7: Comparison of the multimodal uncertainty metrics: (a) MCE, (b) WAKLD, (c) JSD, (d) SEMD. Blue curves show values computed on the unimodal data; red curves show values on the multimodal data. All values are averages over 10 evaluation runs.]
7 DISCUSSION
In this work, we presented a first approach to tackling the challenge of detecting multimodality in world models. As model-based reinforcement learning gains practical relevance, not least through the development of methods like world models, approaches and metrics like the ones evaluated in this work become highly relevant for building reliable and safe RL systems. Our evaluation results showed that it is possible to detect multimodal state transitions and differentiate them from unimodal ones by applying multimodality metrics to the MDN of a world model architecture (a minimal sketch of this procedure is given below). The metrics newly introduced in this work, MCE and WAKLD, both performed well, allowing for a reliable differentiation when using a mixture component count k > 6. The symmetric divergence measure JSD produced the most consistent differentiation between unimodal and multimodal data for any number of components. The application of SEMD, which is based on the Wasserstein metric, on the other hand, requires extra care: when only a small number of mixture components is used, the reported values fluctuated strongly, making reliable multimodality detection impossible in that regime.
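To make the procedure concrete, the following is a minimal Python sketch of one way a divergence-based multimodality metric can be applied to a single MDN output. It assumes univariate Gaussian mixture components; since the JSD between two Gaussians has no closed form, it is approximated by Monte Carlo sampling. The function names, the weight_floor threshold, and the weight-product aggregation are illustrative assumptions, not the exact formulation evaluated in this work.

```python
import numpy as np
from scipy.stats import norm

def jsd_gaussians(mu_p, sigma_p, mu_q, sigma_q, n_samples=10_000, rng=None):
    """Monte Carlo estimate of the Jensen-Shannon divergence between two
    univariate Gaussians (no closed form exists for this quantity)."""
    rng = np.random.default_rng() if rng is None else rng
    x_p = rng.normal(mu_p, sigma_p, n_samples)  # samples from p
    x_q = rng.normal(mu_q, sigma_q, n_samples)  # samples from q
    # Mixture density m = (p + q) / 2, evaluated at the samples.
    m_at_xp = 0.5 * (norm.pdf(x_p, mu_p, sigma_p) + norm.pdf(x_p, mu_q, sigma_q))
    m_at_xq = 0.5 * (norm.pdf(x_q, mu_p, sigma_p) + norm.pdf(x_q, mu_q, sigma_q))
    kl_pm = np.mean(np.log(norm.pdf(x_p, mu_p, sigma_p) / m_at_xp))
    kl_qm = np.mean(np.log(norm.pdf(x_q, mu_q, sigma_q) / m_at_xq))
    return 0.5 * (kl_pm + kl_qm)  # in nats, bounded above by log(2)

def pairwise_jsd_score(weights, mus, sigmas, weight_floor=1e-2):
    """Hypothetical multimodality score: weight-averaged pairwise JSD over
    the active components of one MDN output. Components with negligible
    mixture weight are ignored; high values suggest that the predicted
    transition distribution has well-separated modes."""
    active = [i for i, w in enumerate(weights) if w >= weight_floor]
    score = 0.0
    for a, i in enumerate(active):
        for j in active[a + 1:]:
            score += weights[i] * weights[j] * jsd_gaussians(
                mus[i], sigmas[i], mus[j], sigmas[j])
    return score

# Example: a clearly bimodal 3-component MDN output yields a high score.
print(pairwise_jsd_score(weights=[0.5, 0.45, 0.05],
                         mus=[-2.0, 2.0, 0.0],
                         sigmas=[0.3, 0.3, 5.0]))
```

In practice, such a score would be computed per predicted state transition and thresholded against values observed on known unimodal data.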
As a next step, we plan to use the developed approach and metrics to construct a complete one-class classifier for multimodal state transitions. It would also be interesting to further develop variants of the Wasserstein-based SEMD and of the WAKLD metric, with the goal of improving both for MDNs with low component counts.