REFERENCES
Badrinarayanan, V., Kendall, A., and Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12):2481–2495.
Chen, L.-C., Papandreou, G., Schroff, F., and Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587.
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 801–818.
Chen, Y., Dapogny, A., and Cord, M. (2020). SEMEDA: Enhancing segmentation precision with semantic edge aware loss. Pattern Recognition, 108:107557.
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., and Schiele, B. (2016). The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3213–3223.
Fu, J., Liu, J., Tian, H., Li, Y., Bao, Y., Fang, Z., and Lu, H. (2019). Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3146–3154.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.
Hu, J., Shen, L., and Sun, G. (2018). Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7132–7141.
Huang, Q., Xia, C., Wu, C., Li, S., Wang, Y., Song, Y., and Kuo, C.-C. J. (2017). Semantic segmentation with reverse attention. arXiv preprint arXiv:1707.06426.
Li, H., Xiong, P., An, J., and Wang, L. (2018). Pyramid attention network for semantic segmentation. arXiv preprint arXiv:1805.10180.
Li, X., Zhong, Z., Wu, J., Yang, Y., Lin, Z., and Liu, H. (2019). Expectation-maximization attention networks for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9167–9176.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 234–241. Springer.
Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Smith, L. N. (2017). Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 464–472. IEEE.
Takikawa, T., Acuna, D., Jampani, V., and Fidler, S. (2019). Gated-SCNN: Gated shape CNNs for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5229–5238.
Tao, A., Sapra, K., and Catanzaro, B. (2020). Hierarchical multi-scale attention for semantic segmentation. arXiv preprint arXiv:2005.10821.
Valada, A., Mohan, R., and Burgard, W. (2019). Self-supervised model adaptation for multimodal semantic segmentation. International Journal of Computer Vision, pages 1–47.
Valada, A., Oliveira, G., Brox, T., and Burgard, W. (2016). Deep multispectral semantic scene understanding of forested environments using multimodal fusion. In International Symposium on Experimental Robotics (ISER).
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., et al. (2020). Deep high-resolution representation learning for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Woo, S., Park, J., Lee, J.-Y., and Kweon, I. S. (2018). CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), pages 3–19.
Yuan, Y., Xie, J., Chen, X., and Wang, J. (2020). SegFix: Model-agnostic boundary refinement for segmentation. In European Conference on Computer Vision, pages 489–506. Springer.
Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017). Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2881–2890.
Zheng, S., Lu, J., Zhao, H., Zhu, X., Luo, Z., Wang, Y., Fu, Y., Feng, J., Xiang, T., Torr, P. H., et al. (2020). Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. arXiv preprint arXiv:2012.15840.
Zhong, Z., Lin, Z. Q., Bidart, R., Hu, X., Daya, I. B., Li, Z., Zheng, W.-S., Li, J., and Wong, A. (2020). Squeeze-and-attention networks for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13065–13074.