with the concrete dropout-based uncertainty estimation method (UB-MC) does not produce viable results. Although increasing the number of Monte-Carlo samples improves performance somewhat, the resulting classification performance is not on par with the Bootstrap-based versions. The reason for the large difference in performance can be seen in the example shown in Figure 6. For the UB-B version, the reported uncertainties on environment configuration 0 (training) and 7 (strong modification) diverge increasingly with progressing training episodes (Figure 6a). As this is not the case for the UB-MC version (Figure 6b), only the Bootstrap-based version allows for an increasingly better differentiation between in- and OOD samples and consequently high F1-scores of the classifier. We found this effect to be consistent over all parametrizations of the Bootstrap- and MCCD-based versions we evaluated.
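To illustrate the classification step that the reported F1-scores refer to, the following minimal Python sketch thresholds per-state uncertainty estimates into an in-/out-of-distribution decision and computes the resulting F1-score. The synthetic uncertainty values, the percentile-based threshold rule and all variable names are illustrative assumptions and do not correspond to the experiments reported here.

import numpy as np

def f1_score(y_true, y_pred):
    # F1 computed from true positives, false positives and false negatives.
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

rng = np.random.default_rng(0)

# Hypothetical per-state uncertainties: training environment (label 0,
# in-distribution) vs. a strongly modified environment (label 1, OOD).
u_train = rng.normal(loc=0.2, scale=0.05, size=500)   # low uncertainty
u_ood   = rng.normal(loc=0.6, scale=0.10, size=500)   # higher uncertainty

u = np.concatenate([u_train, u_ood])
labels = np.concatenate([np.zeros(500), np.ones(500)])

# Classify a state as OOD whenever its uncertainty exceeds a threshold,
# here a high percentile of the uncertainties observed on the training env.
threshold = np.percentile(u_train, 95)
pred = (u > threshold).astype(int)

print(f"threshold = {threshold:.3f}, F1 = {f1_score(labels, pred):.3f}")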
Our results match recent findings (Beluch et al., 2018), where ensemble-based uncertainty estimators were compared against Monte-Carlo Dropout based ones for the case of active learning in image classification. There, too, ensembles performed better and led to better calibrated uncertainty estimates. The authors argue that the difference in performance could result from a combination of decreased model capacity and lower diversity of the Monte-Carlo Dropout methods when compared to ensemble approaches.
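The two estimator families compared here can be contrasted in a minimal sketch: epistemic uncertainty taken as the spread over an ensemble of independently trained networks versus the spread over repeated stochastic forward passes of a single dropout network. The architecture, dropout rate and sample counts below are illustrative assumptions and not the configuration used in the experiments above.

import torch
import torch.nn as nn

def make_net(p_drop=0.1):
    # Small value network; sizes and dropout rate are placeholders.
    return nn.Sequential(
        nn.Linear(4, 64), nn.ReLU(), nn.Dropout(p_drop),
        nn.Linear(64, 2),            # e.g. Q-values for two actions
    )

def ensemble_uncertainty(nets, x):
    # Spread of predictions across independently trained ensemble members.
    with torch.no_grad():
        preds = torch.stack([net(x) for net in nets])   # (K, batch, actions)
    return preds.std(dim=0).mean(dim=-1)                # per-state uncertainty

def mc_dropout_uncertainty(net, x, n_samples=50):
    # Spread across stochastic forward passes with dropout kept active.
    net.train()                      # keep dropout enabled at "test" time
    with torch.no_grad():
        preds = torch.stack([net(x) for _ in range(n_samples)])
    net.eval()
    return preds.std(dim=0).mean(dim=-1)

if __name__ == "__main__":
    states = torch.randn(8, 4)                       # dummy batch of states
    ensemble = [make_net() for _ in range(5)]        # K = 5 ensemble heads
    single = make_net()
    print("ensemble u:", ensemble_uncertainty(ensemble, states))
    print("MC-dropout u:", mc_dropout_uncertainty(single, states))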
This effect would also explain the behaviour we observed when comparing uncertainty and achieved return. While there is a strong inverse relation when using Bootstrap-based UBOOD versions, no clear pattern emerged for the evaluated MCCD-based versions. We think that further research into the relation between epistemic uncertainty and achieved return when training and test environments differ could provide interesting insights into generalization performance in deep RL. Being able to differentiate between an agent having encountered a situation in training and the agent generalizing its experience to new situations could provide a substantial benefit in safety-critical applications.
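As a purely illustrative example of how such a relation could be quantified, the following sketch computes the Pearson correlation between per-episode mean uncertainty and achieved return on hypothetical data; the values are placeholders, not measurements from the evaluation above.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-episode statistics: achieved return and mean reported
# uncertainty (constructed here to be inversely related, plus noise).
returns = rng.uniform(0.0, 200.0, size=100)
uncertainty = 1.0 / (1.0 + returns) + rng.normal(0.0, 0.01, size=100)

# A strongly negative coefficient would indicate the inverse relation
# observed for the Bootstrap-based UBOOD versions.
corr = np.corrcoef(uncertainty, returns)[0, 1]
print(f"Pearson correlation(uncertainty, return) = {corr:.2f}")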
REFERENCES
Bellemare, M. G., Dabney, W., and Munos, R. (2017). A distributional perspective on reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 449–458.
Beluch, W. H., Genewein, T., Nürnberger, A., and Köhler, J. M. (2018). The power of ensembles for active learning in image classification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Bishop, C. M. (1994). Novelty detection and neural network validation. IEE Proceedings - Vision, Image and Signal Processing, 141(4):217–222.
Gal, Y., Hron, J., and Kendall, A. (2017). Concrete dropout. In Advances in Neural Information Processing Systems 30, pages 3581–3590.
Graves, A. (2011). Practical variational inference for neural networks. In Advances in Neural Information Processing Systems 24, pages 2348–2356.
Hendrycks, D. and Gimpel, K. (2016). A baseline for detecting misclassified and out-of-distribution examples in neural networks. ArXiv e-prints.
Hernández-Lobato, J., Li, Y., Rowland, M., Hernández-Lobato, D., Bui, T., and Turner, R. (2016). Black-box α-divergence minimization. In 33rd International Conference on Machine Learning, ICML 2016, volume 4, pages 2256–2273.
Kahn, G., Villaflor, A., Pong, V., Abbeel, P., and Levine, S. (2017). Uncertainty-aware reinforcement learning for collision avoidance. arXiv preprint arXiv:1702.01182.
Kendall, A. and Gal, Y. (2017). What uncertainties do we need in Bayesian deep learning for computer vision? In Advances in Neural Information Processing Systems 30, pages 5574–5584.
Lakshminarayanan, B., Pritzel, A., and Blundell, C. (2017). Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems 30, pages 6402–6413.
Li, Y. and Gal, Y. (2017). Dropout inference in Bayesian neural networks with alpha-divergences. ArXiv e-prints.
Liang, S., Li, Y., and Srikant, R. (2017). Enhancing the reliability of out-of-distribution image detection in neural networks. ArXiv e-prints.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529.
Osband, I., Aslanides, J., and Cassirer, A. (2018). Randomized prior functions for deep reinforcement learning. ArXiv e-prints.
Osband, I., Blundell, C., Pritzel, A., and Van Roy, B. (2016). Deep exploration via bootstrapped DQN. In Advances in Neural Information Processing Systems 29, pages 4026–4034.
Pimentel, M. A., Clifton, D. A., Clifton, L., and Tarassenko, L. (2014). A review of novelty detection. Signal Processing, 99:215–249.
Qazaz, C. S. (1996). Bayesian error bars for regression. PhD thesis, Aston University.
Schlegl, T., Seeböck, P., Waldstein, S. M., Schmidt-Erfurth, U., and Langs, G. (2017). Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In IPMI.
Sedlmeier, A., Gabor, T., Phan, T., Belzner, L., and Linnhoff-Popien, C. (2019). Uncertainty-based out-of-distribution detection in deep reinforcement learning. arXiv preprint arXiv:1901.02219.
Watkins, C. J. C. H. (1989). Learning from delayed rewards. PhD thesis, King's College, Cambridge.