
Dendorfer, P., Rezatofighi, H., Milan, A., Shi, J., Cremers, D., Reid, I., Roth, S., Schindler, K., and Leal-Taixé, L. (2020). MOT20: A benchmark for multi object tracking in crowded scenes. arXiv preprint arXiv:2003.09003.
Dolokov, A., Andresen, N., Hohlbaum, K., Thöne-Reineke, C., Lewejohann, L., and Hellwich, O. (2023). Upper bound tracker: A multi-animal tracking solution for closed laboratory settings. In VISIGRAPP (5: VISAPP), pages 945–952.
Du, Y., Zhao, Z., Song, Y., Zhao, Y., Su, F., Gong, T., and Meng, H. (2023). StrongSORT: Make DeepSORT great again. IEEE Transactions on Multimedia.
Fan, H., Zhao, T., Wang, Q., Fan, B., Tang, Y., and Liu, L. (2024). GMT: A robust global association model for multi-target multi-camera tracking. arXiv preprint arXiv:2407.01007.
Guggenberger, M. (2023). Multimodal Alignment of Videos. Doctoral dissertation, Alpen-Adria-Universität Klagenfurt, Klagenfurt am Wörthersee. Toward Multimodal Synchronization of User-Generated Event Recordings.
Huang, H.-W., Yang, C.-Y., Ramkumar, S., Huang, C.-I., Hwang, J.-N., Kim, P.-K., Lee, K., and Kim, K. (2023). Observation centric and central distance recovery for athlete tracking. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 454–460.
Huang, H.-W., Yang, C.-Y., Sun, J., Kim, P.-K., Kim, K.-J., Lee, K., Huang, C.-I., and Hwang, J.-N. (2024). Iterative scale-up ExpansionIoU and deep features association for multi-object tracking in sports. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 163–172.
Huang, Y.-C., Liao, I.-N., Chen, C.-H., İk, T.-U., and Peng, W.-C. (2019). TrackNet: A deep learning network for tracking high-speed and tiny objects in sports applications. In 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 1–8. IEEE.
Hussain, M. (2024). YOLOv1 to v8: Unveiling each variant – a comprehensive review of YOLO. IEEE Access, 12:42816–42833.
Ishikawa, H., Hayashi, M., Phan, T. H., Yamamoto, K., Masuda, M., and Aoki, Y. (2021). Analysis of recent re-identification architectures for tracking-by-detection paradigm in multi-object tracking. In VISIGRAPP (5: VISAPP), pages 234–244.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer.
Meinhardt, T., Kirillov, A., Leal-Taixé, L., and Feichtenhofer, C. (2022). TrackFormer: Multi-object tracking with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8844–8854.
Pareek, N. (2022). Using DeepSORT object tracker with YOLOv5. Kaggle. [Online]. Available: https://www.kaggle.com/code/nityampareek/using-deepsort-object-tracker-with-yolov5. Accessed: Dec. 10, 2024.
Pessoa, L., Alencar, E., Costa, F., Souza, G., and Freitas, R. (2024). Exploring multi-camera views from user-generated sports videos. In Anais do XII Symposium on Knowledge Discovery, Mining and Learning, pages 105–112, Porto Alegre, RS, Brasil. SBC.
Rangasamy, K., As'ari, M. A., Rahmad, N. A., Ghazali, N. F., and Ismail, S. (2020). Deep learning in sport video analysis: a review. TELKOMNIKA (Telecommunication Computing Electronics and Control), 18(4):1926–1933.
Russell, S. and Norvig, P. (2020). Artificial Intelligence: A Modern Approach. 4th edition.
Tang, Y., Bi, J., Xu, S., Song, L., Liang, S., Wang, T., Zhang, D., An, J., Lin, J., Zhu, R., et al. (2023). Video understanding with large language models: A survey. arXiv preprint arXiv:2312.17432.
Wang, C.-Y., Bochkovskiy, A., and Liao, H.-Y. M. (2022). YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696.
Wang, J., Chen, D., Luo, C., He, B., Yuan, L., Wu, Z., and Jiang, Y.-G. (2024). OmniVid: A generative framework for universal video understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18209–18220.
Wang, Y., Huang, Q., Jiang, C., Liu, J., Shang, M., and Miao, Z. (2023). Video stabilization: A comprehensive survey. Neurocomputing, 516:205–230.
Whitehead, A., Laganiere, R., and Bose, P. (2005). Temporal synchronization of video sequences in theory and in practice. In 2005 Seventh IEEE Workshops on Applications of Computer Vision (WACV/MOTION'05) - Volume 1, volume 2, pages 132–137. IEEE.
Wojke, N., Bewley, A., and Paulus, D. (2017). Simple online and realtime tracking with a deep association metric. In 2017 IEEE International Conference on Image Processing (ICIP), pages 3645–3649. IEEE.
Zhao, Z., Chai, W., Hao, S., Hu, W., Wang, G., Cao, S., Song, M., Hwang, J.-N., and Wang, G. (2023). A survey of deep learning in sports applications: Perception, comprehension, and decision. arXiv preprint arXiv:2307.03353.