
based architecture for motion prediction in soccer. In
ICIP, pages 2313–2319.
Capellera, G., Ferraz, L., Rubio, A., Agudo, A., and
Moreno-Noguer, F. (2024b). Transportmer: A holis-
tic approach to trajectory understanding in multi-agent
sports. In ACCV.
Cioppa, A., Giancola, S., Somers, V., Magera, F., Zhou,
X., Mkhallati, H., Deliège, A., Held, J., Hinojosa, C.,
Mansourian, A. M., et al. (2023). Soccernet 2023
challenges results. arXiv preprint arXiv:2309.06006.
Cossich, V. R., Carlgren, D., Holash, R. J., and Katz, L.
(2023). Technological breakthroughs in sport: Current
practice and future potential of artificial intelligence,
virtual reality, augmented reality, and modern data vi-
sualization in performance analysis. Applied Sciences,
13(23):12965.
Deliège, A., Cioppa, A., Giancola, S., Seikavandi, M. J.,
Dueholm, J. V., Nasrollahi, K., Ghanem, B., Moes-
lund, T. B., and Van Droogenbroeck, M. (2021).
Soccernet-v2: A dataset and benchmarks for holistic
understanding of broadcast soccer videos. In CVPR,
pages 4508–4519.
Deliège, A., Cioppa, A., Giancola, S., Seikavandi, M. J.,
Dueholm, J. V., Nasrollahi, K., Ghanem, B., Moes-
lund, T. B., and Droogenbroeck, M. V. (2023). Ball
action data and labels for soccernet ball action spotting
challenge. https://www.soccer-net.org/data#h.ykgf675j127d.
Deloitte (2023). Annual review of football finance 2023.
Deng, J., Guo, J., Xue, N., and Zafeiriou, S. (2019). Ar-
cface: Additive angular margin loss for deep face
recognition. In CVPR, pages 4690–4699.
Denize, J., Liashuha, M., Rabarisoa, J., Orcesi, A., and
Hérault, R. (2024). Comedian: Self-supervised learning
and knowledge distillation for action spotting using
transformers. In WACV, pages 530–540.
Giancola, S., Amine, M., Dghaily, T., and Ghanem, B.
(2018). Soccernet: A scalable dataset for action spot-
ting in soccer videos. In CVPRW, pages 1711–1721.
Giancola, S. and Ghanem, B. (2021). Temporally-aware
feature pooling for action spotting in soccer broad-
casts. In CVPR, pages 4490–4499.
Goes, F., Meerhoff, L., Bueno, M., Rodrigues, D., Moura,
F., Brink, M., Elferink-Gemser, M., Knobbe, A.,
Cunha, S., Torres, R., et al. (2021). Unlocking the
potential of big data to support tactical performance
analysis in professional soccer: A systematic review.
European Journal of Sport Science, 21(4):481–496.
Gutiérrez-Pérez, M. and Agudo, A. (2024a). No bells just
whistles: Sports field registration by leveraging geo-
metric properties. In CVPRW.
Gutiérrez-Pérez, M. and Agudo, A. (2024b). Pnlcalib:
Sports field registration via points and lines optimiza-
tion. SSRN.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In CVPR, pages
770–778.
Honda, Y., Kawakami, R., Yoshihashi, R., Kato, K., and
Naemura, T. (2022). Pass receiver prediction in soccer
using video and players’ trajectories. In CVPR, pages
3503–3512.
Kim, H., Choi, H.-J., Kim, C. J., Yoon, J., and Ko, S.-
K. (2023). Ball trajectory inference from multi-agent
sports contexts using set transformer and hierarchical
bi-lstm. In ACM SIGKDD, pages 4296–4307.
Liga, D. F. (2020). Positional tracking takes a big leap forward
as latest generation is installed at bundesliga and
bundesliga 2 stadiums. https://www.dfl.de/en/innovation/positional-tracking-takes-a-big-leap-forward-as-latest-generation-is-installed-at-bundesliga-and-bundesliga-2-stadiums/.
Lin, J., Gan, C., and Han, S. (2019). Tsm: Temporal shift
module for efficient video understanding. In ICCV,
pages 7083–7093.
Luo, Y. and Mesgarani, N. (2019). Conv-tasnet: Surpassing
ideal time–frequency magnitude masking for speech
separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(8):1256–1266.
Philipp Singer, Yauhen Babakhin, P. P. (2022). Winning
solution for bundesliga data shootout. https://www.kaggle.com/competitions/dfl-bundesliga-data-shootout/discussion/359932.
Sanford, R., Gorji, S., Hafemann, L. G., Pourbabaee, B.,
and Javan, M. (2020). Group activity detection from
trajectory and video data in soccer. In CVPRW, pages
898–899.
Simpson, I., Beal, R. J., Locke, D., and Norman, T. J.
(2022). Seq2event: Learning the language of soccer
using transformer-based match event prediction. In
ACM SIGKDD, pages 3898–3908.
Sorano, D., Carrara, F., Cintia, P., Falchi, F., and Pap-
palardo, L. (2021). Automatic pass annotation from
soccer video streams based on object detection and
lstm. In ECML, pages 475–490.
Sudhakaran, S., Escalera, S., and Lanz, O. (2020). Gate-
shift networks for video action recognition. In CVPR,
pages 1102–1111.
Vidal-Codina, F., Evans, N., El Fakir, B., and Billingham,
J. (2022). Automatic event detection in football using
tracking data. Sports Engineering, 25(1):18.
Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., and
Lang, K. J. (2013). Phoneme recognition using time-
delay neural networks. In Backpropagation, pages
35–61. Psychology Press.
Yamamoto, D. (2022). Third place solution for bundesliga
data shootout. https://www.kaggle.com/competitions/dfl-bundesliga-data-shootout/discussion/360236.
Yeung, C. C., Sit, T., and Fujii, K. (2023). Transformer-
based neural marked spatio temporal point process
model for football match events analysis. arXiv
preprint arXiv:2302.09276.
Yu, G. and Yuan, J. (2015). Fast action proposals for human
action detection and search. In CVPR, pages 1302–
1311.
Zhou, X., Kang, L., Cheng, Z., He, B., and Xin, J. (2021).
Feature combination meets attention: Baidu soccer
embeddings and transformer based temporal detec-
tion. arXiv preprint arXiv:2106.14447.
VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications