
jbom, O. (2020). nuScenes: A multimodal dataset for autonomous driving. In CVPR.
Cioppa, A., Deliège, A., Giancola, S., Ghanem, B., and Van Droogenbroeck, M. (2022a). Scaling up SoccerNet with multi-view spatial localization and re-identification. 9(1):1–9.
Cioppa, A., Giancola, S., Deliège, A., Kang, L., Zhou, X., Cheng, Z., Ghanem, B., and Van Droogenbroeck, M. (2022b). SoccerNet-tracking: Multiple object tracking dataset and benchmark in soccer videos. pages 3490–3501.
Cioppa, A., Giancola, S., Somers, V., et al. SoccerNet 2023 challenges results.
Cioppa, A., Giancola, S., Somers, V., et al. SoccerNet 2024 challenges results.
MMPose Contributors (2020). OpenMMLab pose estimation toolbox and benchmark. https://github.com/open-mmlab/mmpose.
Deliège, A., Cioppa, A., Giancola, S., Seikavandi, M. J., Dueholm, J. V., Nasrollahi, K., Ghanem, B., Moeslund, T. B., and Van Droogenbroeck, M. (2021). SoccerNet-v2: A dataset and benchmarks for holistic understanding of broadcast soccer videos. pages 4503–4514.
Fabbri, M., Lanzi, F., Calderara, S., Palazzi, A., Vezzani, R., and Cucchiara, R. (2018). Learning to detect and track visible and occluded body joints in a virtual world. In European Conference on Computer Vision (ECCV).
Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). YOLOX: Exceeding YOLO series in 2021.
Geiger, A., Lenz, P., and Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In Conference on Computer Vision and Pattern Recognition (CVPR).
Giancola, S., Amine, M., Dghaily, T., and Ghanem, B. (2018). SoccerNet: A scalable dataset for action spotting in soccer videos. pages 1792–179210.
Giancola, S., Cioppa, A., Deliège, A., et al. (2022). SoccerNet 2022 challenges results. pages 75–86. ACM.
Held, J., Cioppa, A., Giancola, S., Hamdi, A., Ghanem, B., and Van Droogenbroeck, M. (2023). VARS: Video assistant referee system for automated soccer decision making from multiple views. pages 5086–5097.
Ionescu, C., Papava, D., Olaru, V., and Sminchisescu, C. (2014). Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(7):1325–1339.
Leduc, A., Cioppa, A., Giancola, S., Ghanem, B., and Van Droogenbroeck, M. (2024). SoccerNet-Depth: A scalable dataset for monocular depth estimation in sports videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 3280–3292.
Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C. L., and Dollár, P. (2015a). Microsoft COCO: Common objects in context.
Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C. L., and Dollár, P. (2015b). Microsoft COCO: Common objects in context.
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., and Black, M. J. (2015). SMPL: A skinned multi-person linear model. ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1–248:16.
Luiten, J., Osep, A., Dendorfer, P., Torr, P., Geiger, A., Leal-Taixé, L., and Leibe, B. (2020). HOTA: A higher order metric for evaluating multi-object tracking. 129(2):548–578.
Maji, D., Nagori, S., Mathew, M., and Poddar, D. (2022). YOLO-Pose: Enhancing YOLO for multi person pose estimation using object keypoint similarity loss. pages 2636–2645.
Mehta, D., Rhodin, H., Casas, D., Fua, P., Sotnychenko, O., Xu, W., and Theobalt, C. (2017). Monocular 3D human pose estimation in the wild using improved CNN supervision. In 3D Vision (3DV), 2017 Fifth International Conference on. IEEE.
Menze, M. and Geiger, A. (2015). Object scene flow for autonomous vehicles. In Conference on Computer Vision and Pattern Recognition (CVPR).
Mkhallati, H., Cioppa, A., Giancola, S., Ghanem, B., and Van Droogenbroeck, M. (2023). SoccerNet-caption: Dense video captioning for soccer broadcasts commentaries. pages 5074–5085.
Rematas, K., Kemelmacher-Shlizerman, I., Curless, B., and Seitz, S. (2018). Soccer on your tabletop. In CVPR.
Somers, V., Joos, V., Giancola, S., Cioppa, A., Ghasemzadeh, S. A., Magera, F., Standaert, B., Mansourian, A. M., Zhou, X., Kasaei, S., Ghanem, B., Alahi, A., Van Droogenbroeck, M., and De Vleeschouwer, C. (2024). SoccerNet game state reconstruction: End-to-end athlete tracking and identification on a minimap.
Trioptics. ImageMaster. https://www.trioptics.com/products/imagemaster-hr-tempcontrol-universal-image-quality-mtf-testing/.
Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., Han, J., and Ding, G. (2024). YOLOv10: Real-time end-to-end object detection. arXiv preprint arXiv:2405.14458.
Yang, Z., Cai, Z., Mei, H., Liu, S., Chen, Z., Xiao, W., Wei, Y., Qing, Z., Wei, C., Dai, B., Wu, W., Qian, C., Lin, D., Liu, Z., and Yang, L. (2023). SynBody: Synthetic dataset with layered human models for 3D human perception and modeling. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 20282–20292.
Zhu, L., Rematas, K., Curless, B., Seitz, S., and Kemelmacher-Shlizerman, I. (2020). Reconstructing NBA players. In Proceedings of the European Conference on Computer Vision (ECCV).
Spiideo SoccerNet SynLoc: Single Frame World Coordinate Athlete Detection and Localization with Synthetic Data