Brostow, G. J., Fauqueur, J., and Cipolla, R. (2008a). Semantic object classes in video: A high-definition ground truth database. Pattern Recognition Letters, xx(x):xx–xx.
Brostow, G. J., Shotton, J., Fauqueur, J., and Cipolla, R. (2008b). Segmentation and recognition using structure from motion point clouds. In ECCV (1), pages 44–57.
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv preprint arXiv:1412.7062.
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2018a). DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848.
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018b). Encoder-decoder with atrous separable convolution for semantic image segmentation. arXiv preprint arXiv:1802.02611.
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016). The Cityscapes dataset for semantic urban scene understanding. In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Eitel, A., Springenberg, J. T., Spinello, L., Riedmiller, M., and Burgard, W. (2015). Multimodal deep learning for robust RGB-D object recognition. In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on, pages 681–687. IEEE.
Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., and Garcia-Rodriguez, J. (2017). A review on deep learning techniques applied to semantic segmentation. arXiv preprint arXiv:1704.06857.
Geiger, A., Lenz, P., Stiller, C., and Urtasun, R. (2013). Vision meets robotics: The KITTI dataset. International Journal of Robotics Research (IJRR).
Harchanko, J. S. and Chenault, D. B. (2005). Water-surface object detection and classification using imaging polarimetry. In Polarization Science and Remote Sensing II, volume 5888, page 588815. International Society for Optics and Photonics.
Hazirbas, C., Ma, L., Domokos, C., and Cremers, D. (2016). FuseNet: Incorporating depth into semantic segmentation via fusion-based CNN architecture. In Asian Conference on Computer Vision, pages 213–228. Springer.
Hwang, S., Park, J., Kim, N., Choi, Y., and So Kweon, I. (2015). Multispectral pedestrian detection: Benchmark dataset and baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1037–1045.
Jaritz, M., De Charette, R., Wirbel, E., Perrotton, X., and Nashashibi, F. (2018). Sparse and dense data with CNNs: Depth completion and semantic segmentation. arXiv preprint arXiv:1808.00769.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105.
Li, Z., Gan, Y., Liang, X., Yu, Y., Cheng, H., and Lin, L. (2016). LSTM-CF: Unifying context modeling and fusion with LSTMs for RGB-D scene labeling. In European Conference on Computer Vision, pages 541–557. Springer.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440.
Ma, L., Stückler, J., Kerl, C., and Cremers, D. (2017). Multi-view deep learning for consistent semantic mapping with RGB-D cameras. In Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on, pages 598–605. IEEE.
Mohri, M., Rostamizadeh, A., and Talwalkar, A. (2012). Foundations of Machine Learning. The MIT Press.
Moisan, L., Moulon, P., and Monasse, P. (2012). Automatic homographic registration of a pair of images, with a contrario elimination of outliers. Image Processing On Line, 2:56–73.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer.
Valada, A., Dhall, A., and Burgard, W. (2016a). Convoluted mixture of deep experts for robust semantic segmentation. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) Workshop, State Estimation and Terrain Perception for All Terrain Mobile Robots.
Valada, A., Oliveira, G., Brox, T., and Burgard, W. (2016b). Deep multispectral semantic scene understanding of forested environments using multimodal fusion. In The 2016 International Symposium on Experimental Robotics (ISER 2016), Tokyo, Japan.
Vapnik, V. (1998). Statistical learning theory, volume 3. Wiley, New York.
Walraven, R. (1977). Polarization imagery. In Optical Polarimetry: Instrumentation and Applications, volume 112, pages 164–168. International Society for Optics and Photonics.
Wolff, L. B. (1997). Polarization vision: a new sensory approach to image understanding. Image and Vision Computing, 15(2):81–93.
Yu, F. and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.
Exploration of Deep Learning-based Multimodal Fusion for Semantic Road Scene Segmentation