Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., and
Yuille, A. L. (2018a). DeepLab: Semantic image segmentation
with deep convolutional nets, atrous convolution, and fully
connected CRFs. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 40(4):834–848.
Chen, L., Yang, Z., Ma, J., and Luo, Z. (2018b). Driving
scene perception network: Real-time joint detection,
depth estimation and semantic segmentation. 2018
IEEE Winter Conference on Applications of Compu-
ter Vision (WACV).
Chen, Z., Badrinarayanan, V., Lee, C.-Y., and Rabinovich,
A. (2018c). GradNorm: Gradient normalization for
adaptive loss balancing in deep multitask networks. In
ICML.
Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., and
Ronneberger, O. (2016). 3D U-Net: Learning dense volumetric
segmentation from sparse annotation. In International
Conference on Medical Image Computing
and Computer-Assisted Intervention, pages 424–432.
Springer.
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler,
M., Benenson, R., Franke, U., Roth, S., and Schiele,
B. (2016). The Cityscapes dataset for semantic urban
scene understanding. In Proc. of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR).
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). ImageNet: A Large-Scale Hierarchical
Image Database. In CVPR09.
Désidéri, J.-A. (2009). Multiple-gradient descent algorithm
(MGDA).
Eigen, D. and Fergus, R. (2015). Predicting depth, surface
normals and semantic labels with a common multi-
scale convolutional architecture. 2015 IEEE Interna-
tional Conference on Computer Vision (ICCV).
Freeman, I., Roese-Koerner, L., and Kummert, A. (2018).
EffNet: An efficient structure for convolutional neural
networks. In 2018 25th IEEE International Confe-
rence on Image Processing (ICIP), pages 6–10.
Gaidon, A., Wang, Q., Cabon, Y., and Vig, E. (2016). Vir-
tual worlds as proxy for multi-object tracking analy-
sis. In CVPR.
Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013).
Vision meets robotics: The KITTI dataset. International
Journal of Robotics Research (IJRR).
Guo, M., Haque, A., Huang, D.-A., Yeung, S., and Fei-Fei,
L. (2018). Dynamic task prioritization for multitask
learning. In European Conference on Computer Vi-
sion, pages 282–299. Springer.
Gurram, A., Urfalioglu, O., Halfaoui, I., Bouzaraa, F., and
Lopez, A. M. (2018). Monocular depth estimation by
learning from heterogeneous datasets. 2018 IEEE In-
telligent Vehicles Symposium (IV).
Hazirbas, C., Ma, L., Domokos, C., and Cremers, D.
(2016). FuseNet: Incorporating depth into semantic
segmentation via fusion-based CNN architecture. In
Asian Conference on Computer Vision, pages 213–
228. Springer.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep re-
sidual learning for image recognition. In 2016 IEEE
Conference on Computer Vision and Pattern Recogni-
tion (CVPR), pages 770–778.
Jafari, O. H., Groth, O., Kirillov, A., Yang, M. Y., and Rother,
C. (2017). Analyzing modular CNN architectures
for joint depth prediction and semantic segmentation.
2017 IEEE International Conference on Robotics and
Automation (ICRA).
Kendall, A., Gal, Y., and Cipolla, R. (2018). Multi-task
learning using uncertainty to weigh losses for scene
geometry and semantics. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recogni-
tion (CVPR).
Kingma, D. P. and Ba, J. (2014). Adam: A method for
stochastic optimization.
Kokkinos, I. (2017). UberNet: Training a universal convolutional
neural network for low-, mid-, and high-level
vision using diverse datasets and limited memory. In
2017 IEEE Conference on Computer Vision and Pat-
tern Recognition (CVPR), pages 5454–5463.
Kumar, V. R., Milz, S., Witt, C., Simon, M., Amende, K.,
Petzold, J., Yogamani, S., and Pech, T. (2018). Mo-
nocular fisheye camera depth estimation using sparse
lidar supervision. In 2018 21st International Confe-
rence on Intelligent Transportation Systems (ITSC),
pages 2853–2858. IEEE.
Liebel, L. and Körner, M. (2018). Auxiliary tasks in multi-task
learning. arXiv preprint arXiv:1805.06334.
Liu, S. (2018). Exploration on deep drug discovery:
Representation and learning. PhD thesis, University of
Wisconsin-Madison.
Liu, S., Johns, E., and Davison, A. J. (2018). End-to-end
multi-task learning with attention.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully con-
volutional networks for semantic segmentation. In
Proceedings of the IEEE Conference on Computer Vi-
sion and Pattern Recognition, pages 3431–3440.
Mousavian, A., Pirsiavash, H., and Kosecka, J. (2016).
Joint semantic segmentation and depth estimation
with deep convolutional networks. 2016 Fourth In-
ternational Conference on 3D Vision (3DV).
Neuhold, G., Ollmann, T., Bulo, S. R., and Kontschieder,
P. (2017). The Mapillary Vistas dataset for semantic
understanding of street scenes. In ICCV, pages 5000–
5009.
Neven, D., Brabandere, B. D., Georgoulis, S., Proesmans,
M., and Gool, L. V. (2017). Fast scene understanding
for autonomous driving.
Parthasarathy, S. and Busso, C. (2018). Ladder networks
for emotion recognition: Using unsupervised auxili-
ary tasks to improve predictions of emotional attribu-
tes. In Interspeech.
Ros, G., Sellart, L., Materzynska, J., Vazquez, D., and Lopez,
A. M. (2016). The SYNTHIA dataset: A large collection
of synthetic images for semantic segmentation
of urban scenes. In Proceedings of the IEEE Confe-
rence on Computer Vision and Pattern Recognition,
pages 3234–3243.
Ruder, S. (2017). An overview of multi-task learning in
deep neural networks.
AuxNet: Auxiliary Tasks Enhanced Semantic Segmentation for Automated Driving