Conference on Computer Vision and Pattern Recognition (CVPR).
Lin, J., Gan, C., and Han, S. (2019). TSM: Temporal shift module for efficient video understanding. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
Miao, J., Wei, Y., Wu, Y., Liang, C., Li, G., and Yang, Y. (2021). VSPW: A large-scale dataset for video scene parsing in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Miksik, O., Vineet, V., Lidegaard, M., Prasaath, R., Nießner, M., Golodetz, S., Hicks, S. L., Pérez, P., Izadi, S., and Torr, P. H. (2015). The semantic paintbrush: Interactive 3D mapping and recognition in large outdoor spaces. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI ’15, pages 3317–3326, New York, NY, USA. Association for Computing Machinery.
Nilsson, D. and Sminchisescu, C. (2018). Semantic video segmentation by gated recurrent flow propagation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Oh, S. W., Lee, J.-Y., Xu, N., and Kim, S. J. (2019). Video object segmentation using space-time memory networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
Qiu, Z., Yao, T., and Mei, T. (2018). Learning deep spatio-temporal dependence for semantic video segmentation. IEEE Transactions on Multimedia, 20(4):939–949.
Siam, M., Valipour, S., Jägersand, M., and Ray, N. (2016). Convolutional gated recurrent networks for video segmentation. CoRR, abs/1611.05435.
Siam, M., Valipour, S., Jägersand, M., and Ray, N. (2017). Convolutional gated recurrent networks for video segmentation. In 2017 IEEE International Conference on Image Processing (ICIP), pages 3090–3094.
Teichmann, M., Weber, M., Zoellner, M., Cipolla, R., and Urtasun, R. (2018). MultiNet: Real-time joint semantic reasoning for autonomous driving.
Vineet, V., Miksik, O., Lidegaard, M., Nießner, M., Golodetz, S., Prisacariu, V. A., Kähler, O., Murray, D. W., Izadi, S., Pérez, P., and Torr, P. H. S. (2015). Incremental dense semantic stereo fusion for large-scale semantic scene reconstruction. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 75–82.
Wang, H., Jiang, X., Ren, H., Hu, Y., and Bai, S. (2021a). SwiftNet: Real-time video object segmentation. CoRR, abs/2102.04604.
Wang, H., Wang, W., and Liu, J. (2021b). Temporal memory attention for video semantic segmentation. CoRR, abs/2102.08643.
Woo, S., Park, J., Lee, J.-Y., and Kweon, I. S. (2018). CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV).
Xu, Y.-S., Fu, T.-J., Yang, H.-K., and Lee, C.-Y. (2018). Dynamic video segmentation network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Yin, M., Yao, Z., Cao, Y., Li, X., Zhang, Z., Lin, S., and Hu, H. (2020). Disentangled non-local neural networks. In Vedaldi, A., Bischof, H., Brox, T., and Frahm, J.-M., editors, Computer Vision – ECCV 2020, pages 191–207, Cham. Springer International Publishing.
Zhang, H., Geiger, A., and Urtasun, R. (2013). Understanding high-level semantics by modeling traffic patterns. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017). Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Zhu, X., Xiong, Y., Dai, J., Yuan, L., and Wei, Y. (2017). Deep feature flow for video recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
VISAPP 2022 - 17th International Conference on Computer Vision Theory and Applications