In this work, we proposed a domain generalization method based on monocular depth estimation that is applicable to any semantic segmentation network and is aimed in particular at reducing non-detected segments. We inferred a depth heatmap via a modified segmentation network that predicts foreground-background masks in parallel to a semantic segmentation network. By aggregating both predictions in an uncertainty-aware manner with a focus on important classes, we successfully reduced false negative segments. Our experiments suggest that, even in a single-sensor setup, the information about spatial structure from pre-trained monocular depth estimators can be used to improve the robustness of off-the-shelf segmentation networks under domain shift in various settings.
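
To make the aggregation idea summarized above concrete, the following is a minimal pixel-wise sketch and not the implementation used in this work. It assumes a softmax output `seg_probs` from the segmentation network and a foreground probability map `fg_prob` from the depth-based branch; the function name `fuse_predictions`, the parameter `fg_threshold`, and the simple threshold rule are hypothetical stand-ins for the uncertainty-aware aggregation.

```python
import numpy as np

def fuse_predictions(seg_probs, fg_prob, foreground_ids, fg_threshold=0.5):
    """Illustrative fusion rule (an assumption, not the paper's method).

    seg_probs:      (C, H, W) softmax output of the segmentation network.
    fg_prob:        (H, W) foreground probability from the depth-based branch.
    foreground_ids: class indices regarded as important foreground classes.
    Returns an (H, W) label map.
    """
    seg_probs = np.asarray(seg_probs, dtype=float)
    fg_prob = np.asarray(fg_prob, dtype=float)
    fg_ids = np.asarray(foreground_ids, dtype=int)

    # Standard decision of the segmentation network.
    labels = seg_probs.argmax(axis=0)

    # Most probable foreground class per pixel according to the segmentation network.
    best_fg = fg_ids[seg_probs[fg_ids].argmax(axis=0)]

    # Candidate false negatives: labelled background by the segmentation network,
    # but flagged as foreground by the depth-based branch.
    is_background = ~np.isin(labels, fg_ids)
    recover = is_background & (fg_prob > fg_threshold)

    fused = labels.copy()
    fused[recover] = best_fg[recover]
    return fused

# Toy example: class 0 is background, classes 1 and 2 are foreground.
seg_probs = np.array([[[0.6, 0.7]], [[0.3, 0.1]], [[0.1, 0.2]]])
fg_prob = np.array([[0.9, 0.2]])
print(fuse_predictions(seg_probs, fg_prob, foreground_ids=[1, 2]))
```

In this toy example only the first pixel is relabelled, since it is the only one where the depth-based branch is confident about foreground. A segment-level decision rule could replace the per-pixel threshold without changing the overall structure.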

We thank M. K. Neugebauer for support in data handling and programming. This work is supported by the Ministry of Culture and Science of the German state of North Rhine-Westphalia as part of the KI-Starter research funding program.
False Negative Reduction in Semantic Segmentation Under Domain Shift Using Depth Estimation