
Lin, H., Wu, R., Liu, S., Lu, J., and Jia, J. (2021). Video in-
stance segmentation with a propose-reduce paradigm.
In Proceedings of the IEEE/CVF International Con-
ference on Computer Vision, pages 1739–1748.
Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick,
R., Hays, J., Perona, P., Ramanan, D., Zitnick, C. L.,
and Dollár, P. (2014). Microsoft coco: Common objects
in context. In Proceedings of the CVF European
Conference on Computer Vision.
Liu, W., Shen, X., Li, H., Bi, X., Liu, B., Pun, C.-M., and
Cun, X. (2024). Depth-aware test-time training for
zero-shot video object segmentation. In Proceedings
of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pages 19218–19227.
Liu, Z., Liu, J., Chen, W., Wu, X., and Li, Z. (2021).
Faminet: Learning real-time semisupervised video
object segmentation with steepest optimized optical
flow. IEEE Transactions on Instrumentation and Mea-
surement, 71:1–16.
Maninis, K.-K., Caelles, S., Chen, Y., Pont-Tuset, J.,
Leal-Taixé, L., Cremers, D., and Gool, L. V. (2018). Video
object segmentation without temporal information.
IEEE Transactions on Pattern Analysis and Machine
Intelligence.
Oh, S. W., Lee, J.-Y., Sunkavalli, K., and Kim, S. J. (2018).
Fast video object segmentation by reference-guided
mask propagation. In Proceedings of the IEEE/CVF
International Conference on Computer Vision.
Oh, S. W., Lee, J.-Y., Xu, N., and Kim, S. J. (2019). Video
object segmentation using space-time memory net-
works. In Proceedings of the IEEE/CVF International
Conference on Computer Vision.
Paul, V. and Leibe, B. (2017). Online adaptation of convolu-
tional neural networks for video object segmentation.
In Proceedings of the British Machine Vision Confer-
ence.
Perazzi, F., Pont-Tuset, J., McWilliams, B., Van Gool, L.,
Gross, M., and Sorkine-Hornung, A. (2016). A bench-
mark dataset and evaluation methodology for video
object segmentation. In Computer Vision and Pattern
Recognition.
Pont-Tuset, J., Perazzi, F., Caelles, S., Arbeláez, P.,
Sorkine-Hornung, A., and Gool, L. V. (2017). The
2017 davis challenge on video object segmentation.
In arXiv preprint arXiv:1704.00675.
Ravi, N., Gabeur, V., Hu, Y.-T., Hu, R., Ryali, C., Ma,
T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L.,
et al. (2024). Sam 2: Segment anything in images and
videos. arXiv preprint arXiv:2408.00714.
Robinson, A., Lawin, F. J., Danelljan, M., Khan, F. S., and
Felsberg, M. (2020). Learning fast and robust target
models for video object segmentation. In Proceedings
of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition.
Su, T., Song, H., Liu, D., Liu, B., and Liu, Q. (2023). Un-
supervised video object segmentation with online ad-
versarial self-tuning. In Proceedings of the IEEE/CVF
International Conference on Computer Vision, pages
688–698.
Tim, M. and Leal-Taixé, L. (2020). Make one-shot video
object segmentation efficient again. In Proceedings of
Advances in Neural Information Processing Systems.
Wang, H., Jiang, X., Ren, H., Hu, Y., and Bai, S. (2021).
Swiftnet: Real-time video object segmentation. In
Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition.
Wang, H., Yan, C., Wang, S., Jiang, X., Tang, X., Hu,
Y., Xie, W., and Gavves, E. (2023a). Towards open-
vocabulary video instance segmentation. In Proceed-
ings of the IEEE/CVF International Conference on
Computer Vision, pages 4057–4066.
Wang, J., Chen, D., Wu, Z., Luo, C., Tang, C., Dai, X.,
Zhao, Y., Xie, Y., Yuan, L., and Jiang, Y.-G. (2023b).
Look before you match: Instance understanding matters
in video object segmentation. In Proceedings
of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pages 2268–2278.
Xu, N., Yang, L., Fan, Y., Yang, J., Yue, D., Liang, Y.,
Price, B., Cohen, S., and Huang, T. (2018). Youtube-
vos: Sequence-to-sequence video object segmenta-
tion. In Proceedings of the CVF European Conference
on Computer Vision.
Yang, L., Fan, Y., and Xu, N. (2019). Video instance
segmentation. In Proceedings of the IEEE/CVF International
Conference on Computer Vision, pages 5188–5197.
Yang, Z. and Yang, Y. (2022). Decoupling features in hi-
erarchical propagation for video object segmentation.
Advances in Neural Information Processing Systems,
35:36324–36336.
Yuxi, L., Ning, X., Jinlong, P., John, S., and Weiyao, L.
(2020). Delving into the cyclic mechanism in semi-
supervised video object segmentation. In Proceedings
of Advances in Neural Information Processing Sys-
tems.
Zhou, T., Li, J., Li, X., and Shao, L. (2021). Target-
aware object discovery and association for unsuper-
vised video multi-object segmentation. In Proceed-
ings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition.
Zhou, T., Porikli, F., Crandall, D. J., Van Gool, L.,
and Wang, W. (2022). A survey on deep learning
technique for video segmentation. IEEE Transactions
on Pattern Analysis and Machine Intelligence,
45(6):7099–7122.
VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications