Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and
Adam, H. (2018b). Encoder-decoder with atrous sepa-
rable convolution for semantic image segmentation. In
Ferrari, V., Hebert, M., Sminchisescu, C., and Weiss,
Y., editors, ECCV (7), volume 11211 of Lecture Notes
in Computer Science, pages 833–851. Springer.
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler,
M., Benenson, R., Franke, U., Roth, S., and Schiele,
B. (2016). The Cityscapes dataset for semantic urban
scene understanding. In Proc. of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR).
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). ImageNet: A Large-Scale Hierarchical
Image Database. In CVPR09.
Duong, L., Cohn, T., Bird, S., and Cook, P. (2015). Low
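resource dependency parsing: Cross-lingual parameter
sharing in a neural network parser. In Proc. of the
53rd Annual Meeting of the Association for Computational
Linguistics and the 7th International Joint Conference
on Natural Language Processing (Short Papers), pages
845–850, Beijing, China. Association for Computational
Linguistics.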
Farabet, C., Couprie, C., Najman, L., and LeCun, Y. (2012).
Scene parsing with multiscale feature learning, purity
trees, and optimal covers. In ICML. icml.cc / Omni-
press.
Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu,
H. (2019). Dual attention network for scene segmen-
tation. In CVPR, pages 3146–3154. Computer Vision
Foundation / IEEE.
Girshick, R. (2015). Fast R-CNN. In The IEEE International
Conference on Computer Vision (ICCV).
Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014).
Rich feature hierarchies for accurate object detection
and semantic segmentation. In 2014 IEEE Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 580–587.
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Delving
deep into rectifiers: Surpassing human-level performance
on ImageNet classification. In IEEE International
Conference on Computer Vision (ICCV 2015).
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Spatial pyra-
mid pooling in deep convolutional networks for visual
recognition. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 37(9):1904–1916.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Identity
Mappings in Deep Residual Networks. arXiv e-prints,
page arXiv:1603.05027.
Lee, N., Ajanthan, T., and Torr, P. H. S. (2018). SNIP:
Single-shot Network Pruning based on Connection
Sensitivity. arXiv e-prints, page arXiv:1810.02340.
Li, X., Zhao, H., Han, L., Tong, Y., and Yang, K. (2019a).
GFF: Gated Fully Fusion for Semantic Segmentation.
arXiv e-prints, page arXiv:1904.01803.
Li, Y., Chen, Y., Wang, N., and Zhang, Z. (2019b). Scale-
Aware Trident Networks for Object Detection. arXiv
e-prints, page arXiv:1901.01892.
Lin, T.-Y., Dollár, P., Girshick, R. B., He, K., Hariharan, B.,
and Belongie, S. J. (2017). Feature pyramid networks
for object detection. In CVPR, pages 936–944. IEEE
Computer Society.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P.
(2018). Focal loss for dense object detection. IEEE
Transactions on Pattern Analysis and Machine Intel-
ligence, PP:1–1.
Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018). Path
aggregation network for instance segmentation. In
CVPR, pages 8759–8768. IEEE Computer Society.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu,
C.-Y., and Berg, A. (2016). SSD: Single shot multibox
detector. In ECCV, volume 9905 of Lecture Notes in
Computer Science, pages 21–37.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully con-
volutional networks for semantic segmentation. In
The IEEE Conference on Computer Vision and Pat-
tern Recognition (CVPR).
Misra, I., Shrivastava, A., Gupta, A., and Hebert, M. (2016).
Cross-stitch networks for multi-task learning. CoRR,
abs/1604.03539.
Redmon, J. and Farhadi, A. (2017). YOLO9000: Better,
faster, stronger. In The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR).
Ren, S., He, K., Girshick, R. B., and Sun, J. (2015). Faster
R-CNN: Towards real-time object detection with region
proposal networks. In Cortes, C., Lawrence, N. D.,
Lee, D. D., Sugiyama, M., and Garnett, R., editors,
NIPS, pages 91–99.
Shrivastava, A., Sukthankar, R., Malik, J., and Gupta, A.
(2016). Beyond Skip Connections: Top-Down Mod-
ulation for Object Detection. arXiv e-prints, page
arXiv:1612.06851.
Simonyan, K. and Zisserman, A. (2014). Very Deep Con-
volutional Networks for Large-Scale Image Recogni-
tion. arXiv e-prints, page arXiv:1409.1556.
Sistu, G., Leang, I., and Yogamani, S. (2019). Real-time
Joint Object Detection and Semantic Segmentation
Network for Automated Driving. arXiv e-prints, page
arXiv:1901.03912.
Teichmann, M., Weber, M., Zoellner, M., Cipolla, R., and
Urtasun, R. (2016). MultiNet: Real-time Joint Se-
mantic Reasoning for Autonomous Driving. arXiv e-
prints, page arXiv:1612.07695.
Uijlings, J. R. R., van de Sande, K. E. A., Gevers, T., and
Smeulders, A. W. M. (2013). Selective search for ob-
ject recognition. International Journal of Computer
Vision, 104(2):154–171.
Valada, A., Mohan, R., and Burgard, W. (2019). Self-
supervised model adaptation for multimodal seman-
tic segmentation. International Journal of Computer
Vision.
Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao,
Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., and
Xiao, B. (2019). Deep High-Resolution Representa-
tion Learning for Visual Recognition. arXiv e-prints,
page arXiv:1908.07919.
Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017). Pyra-
mid scene parsing network. In The IEEE Conference
on Computer Vision and Pattern Recognition (CVPR).
Zhao, Q., Sheng, T., Wang, Y., Tang, Z., Chen, Y., Cai, L.,
and Ling, H. (2019). M2Det: A single-shot object detector
based on multi-level feature pyramid network.
Proceedings of the AAAI Conference on Artificial In-
telligence, 33:9259–9266.