Comparative Analysis of Deep Learning-Based Multi-Object Tracking Approaches Applied to Sports User-Generated Videos
Elton Alencar, Larissa Pessoa, Fernanda Costa, Guilherme Souza, Rosiane de Freitas
2025
Abstract
The growth of video-sharing platforms has led to a significant increase in audiovisual content production, especially from mobile devices such as smartphones. Sports user-generated videos (UGVs) pose unique challenges for automated analysis because of variations in image quality, diverse camera angles, and fast-moving objects. This paper presents a comparative qualitative analysis of multiple object tracking (MOT) techniques applied to sports UGVs. We evaluated three approaches, DeepSORT, StrongSORT, and TrackFormer, which represent the tracking-by-detection and attention-based tracking paradigms. Additionally, we propose integrating StrongSORT with YOLO-World, an open-vocabulary detector, to improve tracking by reducing the detection of irrelevant objects and focusing on key elements such as players and balls. To assess the techniques, we developed UVY, a custom sports UGV database built from YouTube videos. A qualitative analysis of the results from applying the different tracking methods to UVY-Track videos revealed that the tracking-by-detection techniques, DeepSORT and StrongSORT, performed better at tracking relevant sports objects than TrackFormer, which focuses on pedestrians. The new StrongSORT version with YOLO-World showed promise by detecting fewer irrelevant objects. These findings suggest that integrating open-vocabulary detectors into MOT models can significantly improve sports UGV analysis. This work contributes to developing more effective and scalable solutions for object tracking in sports videos.
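The idea of feeding a tracker only sports-relevant detections can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detection` type, the `RELEVANT_LABELS` vocabulary, the `min_confidence` threshold, and the sample values are all hypothetical, and the sketch filters a generic detection list after the fact rather than calling YOLO-World or StrongSORT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detector output for a single frame (hypothetical structure)."""
    label: str         # class name predicted by the detector
    confidence: float  # detector score in [0, 1]
    box: tuple         # (x1, y1, x2, y2) in pixels


# Assumed vocabulary of relevant classes; the abstract names players and balls.
RELEVANT_LABELS = frozenset({"player", "ball"})


def filter_detections(detections, relevant=RELEVANT_LABELS, min_confidence=0.3):
    """Keep confident detections whose label is in the relevant vocabulary."""
    return [
        d for d in detections
        if d.label in relevant and d.confidence >= min_confidence
    ]


# Illustrative detections for one frame (made-up values, not real results).
frame_detections = [
    Detection("player", 0.91, (120, 80, 180, 260)),
    Detection("ball", 0.55, (300, 210, 318, 228)),
    Detection("car", 0.88, (0, 0, 90, 60)),          # irrelevant class
    Detection("player", 0.12, (400, 90, 450, 250)),  # below the threshold
]

kept = filter_detections(frame_detections)
print([d.label for d in kept])  # → ['player', 'ball']
```

Only the filtered list would then be handed to the tracker's association step, so irrelevant objects never receive track identities.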
Paper Citation
in Harvard Style
Alencar E., Pessoa L., Costa F., Souza G. and de Freitas R. (2025). Comparative Analysis of Deep Learning-Based Multi-Object Tracking Approaches Applied to Sports User-Generated Videos. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP; ISBN 978-989-758-728-3, SciTePress, pages 691-698. DOI: 10.5220/0013185700003912
in BibTeX Style
@conference{visapp25,
author={Elton Alencar and Larissa Pessoa and Fernanda Costa and Guilherme Souza and Rosiane de Freitas},
title={Comparative Analysis of Deep Learning-Based Multi-Object Tracking Approaches Applied to Sports User-Generated Videos},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
year={2025},
pages={691-698},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013185700003912},
isbn={978-989-758-728-3},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP
TI - Comparative Analysis of Deep Learning-Based Multi-Object Tracking Approaches Applied to Sports User-Generated Videos
SN - 978-989-758-728-3
AU - Alencar E.
AU - Pessoa L.
AU - Costa F.
AU - Souza G.
AU - de Freitas R.
PY - 2025
SP - 691
EP - 698
DO - 10.5220/0013185700003912
PB - SciTePress