
Paper

Authors: Takane Kumakura; Ryohei Orihara; Yasuyuki Tahara; Akihiko Ohsuga and Yuichi Sei

Affiliation: The University of Electro-Communications, Graduate School of Informatics and Engineering Departments, Department of Informatics, 1-5-1 Chofugaoka, Chofu, Japan

Keyword(s): Action Spotting, Multimodal Learning, Transformer, Markov Chain, Soccer, Football, Live Broadcasting, Deep Learning, Machine Learning, Artificial Intelligence.

Abstract: This study proposes ASPERA (Action SPotting thrEe-modal Recognition Architecture), a multimodal football action recognition method based on the ASTRA architecture that incorporates video, audio, and commentary text information. ASPERA showed higher accuracy than models using video and audio only, excluding invisible actions in the video. This result demonstrates the advantage of this multimodal approach. Additionally, we propose three advanced models: ASPERAsrnd incorporating surrounding commentary text within a ±20-second range, ASPERAcln removing irrelevant background information, and ASPERAMC applying a Markov head to provide prior knowledge of football action flow. ASPERAsrnd and ASPERAcln, which refine the text embedding, enhanced the ability to accurately identify the timing of actions. Notably, ASPERAMC with the Markov head demonstrated the highest accuracy for invisible actions in the football video. ASPERAsrnd and ASPERAcln not only demonstrate the utility of text information in football action spotting but also highlight key factors that enhance this effect, such as incorporating surrounding commentary text and removing background information. Finally, ASPERAMC shows the effectiveness of combining Transformer models and Markov chains for recognizing actions in invisible scenes.
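
As a rough illustration of the "surrounding commentary text within a ±20-second range" idea mentioned in the abstract, the following minimal Python sketch gathers the commentary lines whose timestamps fall inside a window around a candidate action time. It is not the authors' implementation: the `Comment` type, the `surrounding_text` function, the sample commentary, and the choice to simply join the selected lines with spaces are all assumptions made for illustration. How ASPERAsrnd actually embeds the collected text is not described on this page.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    """A single commentary line (hypothetical structure, not from the paper)."""
    time_s: float  # timestamp of the commentary line, in seconds
    text: str


def surrounding_text(comments, center_s, window_s=20.0):
    """Join commentary lines whose timestamps lie within center_s +/- window_s.

    The 20-second default mirrors the range quoted in the abstract; joining
    with spaces is an assumption made for this sketch.
    """
    ordered = sorted(comments, key=lambda c: c.time_s)
    selected = [c for c in ordered if abs(c.time_s - center_s) <= window_s]
    return " ".join(c.text for c in selected)


# Invented sample data: four commentary lines around a shot at t = 115 s.
comments = [
    Comment(90.0, "Long ball forward."),
    Comment(112.0, "He shoots!"),
    Comment(118.0, "Saved by the keeper."),
    Comment(150.0, "Corner kick to follow."),
]

window = surrounding_text(comments, center_s=115.0)
print(window)
```

In this sketch only the lines at 112 s and 118 s fall inside the window around 115 s; the lines at 90 s and 150 s are more than 20 seconds away and are left out.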

CC BY-NC-ND 4.0

Paper citation in several formats:
Kumakura, T., Orihara, R., Tahara, Y., Ohsuga, A. and Sei, Y. (2025). ASPERA: Exploring Multimodal Action Recognition in Football Through Video, Audio, and Commentary. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART; ISBN 978-989-758-737-5; ISSN 2184-433X, SciTePress, pages 646-657. DOI: 10.5220/0013300700003890

@conference{icaart25,
author={Takane Kumakura and Ryohei Orihara and Yasuyuki Tahara and Akihiko Ohsuga and Yuichi Sei},
title={ASPERA: Exploring Multimodal Action Recognition in Football Through Video, Audio, and Commentary},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2025},
pages={646-657},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013300700003890},
isbn={978-989-758-737-5},
issn={2184-433X},
}

TY - CONF

JO - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - ASPERA: Exploring Multimodal Action Recognition in Football Through Video, Audio, and Commentary
SN - 978-989-758-737-5
IS - 2184-433X
AU - Kumakura, T.
AU - Orihara, R.
AU - Tahara, Y.
AU - Ohsuga, A.
AU - Sei, Y.
PY - 2025
SP - 646
EP - 657
DO - 10.5220/0013300700003890
PB - SciTePress