Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and
Adam, H. (2018). Encoder-Decoder with Atrous Sep-
arable Convolution for Semantic Image Segmentation.
In Proceedings of the European Conference on Com-
puter Vision (ECCV).
Chollet, F. (2017). Xception: Deep Learning with Depth-
wise Separable Convolutions. In Proceedings of the
IEEE Conference on Computer Vision and Pattern
Recognition.
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler,
M., Benenson, R., Franke, U., Roth, S., and Schiele,
B. (2016). The Cityscapes Dataset for Semantic Ur-
ban Scene Understanding. In Proc. of the IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR).
Everingham, M., Eslami, S. A., Van Gool, L., Williams,
C. K., Winn, J., and Zisserman, A. (2015). The Pas-
cal Visual Object Classes Challenge: A Retrospective.
International Journal of Computer Vision.
Geiger, A., Lenz, P., and Urtasun, R. (2012). Are we ready
for Autonomous Driving? The KITTI Vision Bench-
mark Suite. In Conference on Computer Vision and
Pattern Recognition (CVPR).
Girshick, R. (2015). Fast R-CNN. In Proceedings of the
IEEE Conference on Computer Vision and Pattern
Recognition.
Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014).
Rich feature hierarchies for accurate object detection
and semantic segmentation. In Proceedings of the
IEEE Conference on Computer Vision and Pattern
Recognition.
Hadsell, R., Chopra, S., and LeCun, Y. (2006). Dimension-
ality Reduction by Learning an Invariant Mapping. In
Proceedings of the IEEE Conference on Computer Vi-
sion and Pattern Recognition.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Identity
Mappings in Deep Residual Networks. In European
Conference on Computer Vision.
Kendall, A., Gal, Y., and Cipolla, R. (2017). Multi-
Task Learning Using Uncertainty to Weigh Losses for
Scene Geometry and Semantics. CoRR.
Kingma, D. P. and Ba, J. (2014). Adam: A Method for
Stochastic Optimization.
Lin, T.-Y., Goyal, P., Girshick, R. B., He, K., and Dollár,
P. (2017). Focal Loss for Dense Object Detection. In
ICCV. IEEE Computer Society.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.,
Fu, C.-Y., and Berg, A. C. (2016). SSD: Single Shot
MultiBox Detector. In European Conference on Computer
Vision.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully Con-
volutional Networks for Semantic Segmentation. In
Proceedings of the IEEE Conference on Computer Vi-
sion and Pattern Recognition.
Meyer, A., Salscheider, N. O., Orzechowski, P., and Stiller,
C. (2018). Deep Semantic Lane Segmentation for
Mapless Driving.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.
(2016). You Only Look Once: Unified, Real-Time
Object Detection. In Proceedings of the IEEE Confer-
ence on Computer Vision and Pattern Recognition.
Redmon, J. and Farhadi, A. (2017). YOLO9000: Better,
Faster, Stronger. In Proceedings of the IEEE Confer-
ence on Computer Vision and Pattern Recognition.
Redmon, J. and Farhadi, A. (2018). YOLOv3: An Incre-
mental Improvement.
Ren, S., He, K., Girshick, R. B., and Sun, J. (2015). Faster
R-CNN: Towards Real-Time Object Detection with
Region Proposal Networks. CoRR.
Sistu, G., Leang, I., and Yogamani, S. (2019). Real-time
Joint Object Detection and Semantic Segmentation
Network for Automated Driving.
Teichmann, M., Weber, M., Zoellner, M., Cipolla, R., and
Urtasun, R. (2018). MultiNet: Real-time Joint Seman-
tic Reasoning for Autonomous Driving. In 2018 IEEE
Intelligent Vehicles Symposium (IV).
Uhrig, J., Cordts, M., Franke, U., and Brox, T.
(2016). Pixel-Level Encoding and Depth Layering for
Instance-Level Semantic Labeling. In GCPR.
van de Sande, K. E. A., Uijlings, J. R. R., Gevers, T., and
Smeulders, A. W. M. (2011). Segmentation as Selec-
tive Search for Object Recognition. In ICCV.
Wu, Z., Shen, C., and van den Hengel, A. (2016). Wider
or Deeper: Revisiting the ResNet Model for Visual
Recognition.
Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017). Pyra-
mid Scene Parsing Network. In CVPR. IEEE Com-
puter Society.