Authors:
Takane Kumakura; Ryohei Orihara; Yasuyuki Tahara; Akihiko Ohsuga and Yuichi Sei
Affiliation:
The University of Electro-Communications, Graduate School of Informatics and Engineering, Department of Informatics, 1-5-1 Chofugaoka, Chofu, Japan
Keyword(s):
Action Spotting, Multimodal Learning, Transformer, Markov Chain, Soccer, Football, Live Broadcasting, Deep Learning, Machine Learning, Artificial Intelligence.
Abstract:
This study proposes ASPERA (Action SPotting thrEe-modal Recognition Architecture), a multimodal football action-spotting method based on the ASTRA architecture that incorporates video, audio, and commentary-text information. When actions invisible in the video are excluded, ASPERA achieved higher accuracy than models that use only video and audio, demonstrating the advantage of this multimodal approach. In addition, we propose three extended models: ASPERAsrnd, which incorporates surrounding commentary text within a ±20-second window; ASPERAcln, which removes irrelevant background information; and ASPERAMC, which applies a Markov head to provide prior knowledge of the flow of football actions. ASPERAsrnd and ASPERAcln, which refine the text embedding, improved the ability to identify the timing of actions accurately. Notably, ASPERAMC, with its Markov head, achieved the highest accuracy for actions that are invisible in the video. ASPERAsrnd and ASPERAcln not only demonstrate the utility of text information for football action spotting but also highlight key factors that strengthen this effect, such as incorporating surrounding commentary text and removing background information. Finally, ASPERAMC shows the effectiveness of combining Transformer models with Markov chains for recognizing actions in invisible scenes.
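To make the Markov-head idea concrete, the sketch below shows one plausible way a Markov-chain prior over football action transitions could be blended with the per-frame class scores of a Transformer spotting head. The abstract does not specify the exact mechanism, so the action vocabulary, the blending weight `alpha`, and the function names here are hypothetical illustrations rather than the authors' implementation.

```python
import numpy as np

# Hypothetical action vocabulary; the real label set would follow the
# SoccerNet-style action-spotting classes used by ASTRA/ASPERA.
ACTIONS = ["kick-off", "pass", "shot", "goal", "throw-in"]

def apply_markov_head(frame_logits, prev_action_idx, transition_matrix, alpha=0.5):
    """Blend Transformer class probabilities with a Markov-chain transition prior.

    frame_logits: (num_classes,) raw scores from the spotting head.
    prev_action_idx: index of the most recently spotted action.
    alpha: weight given to the prior (hypothetical hyperparameter).
    """
    # Softmax over the Transformer scores for this frame.
    probs = np.exp(frame_logits - frame_logits.max())
    probs /= probs.sum()
    # Prior over the next action, read from the learned transition matrix.
    prior = transition_matrix[prev_action_idx]
    # Convex combination of visual/audio/text evidence and the action-flow prior.
    blended = (1 - alpha) * probs + alpha * prior
    return blended / blended.sum()

# Toy transition matrix estimated from action sequences (each row sums to 1).
T = np.array([
    [0.05, 0.70, 0.15, 0.00, 0.10],   # after kick-off
    [0.00, 0.60, 0.25, 0.05, 0.10],   # after pass
    [0.00, 0.30, 0.10, 0.40, 0.20],   # after shot
    [0.90, 0.05, 0.05, 0.00, 0.00],   # after goal, play restarts with kick-off
    [0.00, 0.75, 0.10, 0.00, 0.15],   # after throw-in
])

logits = np.array([0.1, 1.2, 2.0, 0.3, 0.2])   # Transformer scores for one frame
print(apply_markov_head(logits, prev_action_idx=2, transition_matrix=T))
```

Because the prior depends only on the previously spotted action, such a head can still favour a plausible label (e.g. a goal following a shot) even when the action itself is not visible in the frame, which is the intuition behind ASPERAMC's advantage on invisible actions.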