Online Detection of End of Take and Release Actions from Egocentric Videos
Alessandro Sebastiano Catinello, Giovanni Farinella, Antonino Furnari
2025
Abstract
In this work, we tackle the problem of detecting “take” and “release” actions from egocentric videos. We address the task following a new Online Detection of Action End (ODAE) formulation, in which algorithms have to determine the end of an action in an online fashion. We show that ODAE has advantages over previous formulations that focus on detecting actions at the contact frame or offline, because events are completely observed before a prediction is made, which reduces uncertainty. We adapt different state-of-the-art temporal online action detection models to this task and benchmark them on the EPIC-KITCHENS dataset, highlighting the specific challenges of the ODAE task, such as sparse annotations and high action density. Analysis on THUMOS14 shows that most conclusions also hold in a third-person vision scenario. We also investigate the impact of techniques such as label propagation to address annotation imbalance. Our results show that the problem is far from being solved and that Mamba-based models consistently outperform transformer-based models in all settings.
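As an illustration of why sparse annotations cause imbalance, the sketch below shows one generic way label propagation could work when each action end is marked on a single frame: the label is copied to nearby background frames. This is not necessarily the scheme used in the paper. The function name `propagate_labels`, the `radius` parameter, and the integer class encoding are all assumptions made for this example.

```python
import numpy as np

def propagate_labels(labels: np.ndarray, radius: int) -> np.ndarray:
    """Copy each annotated frame label to background frames within `radius`.

    Hypothetical helper: 0 is assumed to mean background, and any
    non-zero value is assumed to mark an annotated action end.
    """
    out = labels.copy()
    n = len(labels)
    for t in np.flatnonzero(labels):
        lo, hi = max(0, t - radius), min(n, t + radius + 1)
        segment = out[lo:hi]  # view into `out`, so edits apply in place
        # Fill only background frames, so existing labels are not overwritten.
        segment[segment == 0] = labels[t]
    return out

# Assumed encoding: 0 = background, 1 = end of "take", 2 = end of "release".
sparse = np.array([0, 0, 0, 1, 0, 0, 0, 0, 2, 0])
dense = propagate_labels(sparse, radius=1)
print(dense.tolist())
```

With `radius=1`, each annotated frame grows into a three-frame positive window, which raises the share of positive frames seen during training. The cost is less precise supervision about the exact end frame.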
Paper Citation
in Harvard Style
Catinello A., Farinella G. and Furnari A. (2025). Online Detection of End of Take and Release Actions from Egocentric Videos. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP; ISBN 978-989-758-728-3, SciTePress, pages 863-870. DOI: 10.5220/0013249700003912
in Bibtex Style
@conference{visapp25,
author={Alessandro Catinello and Giovanni Farinella and Antonino Furnari},
title={Online Detection of End of Take and Release Actions from Egocentric Videos},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
year={2025},
pages={863-870},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013249700003912},
isbn={978-989-758-728-3},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP
TI - Online Detection of End of Take and Release Actions from Egocentric Videos
SN - 978-989-758-728-3
AU - Catinello A.
AU - Farinella G.
AU - Furnari A.
PY - 2025
SP - 863
EP - 870
DO - 10.5220/0013249700003912
PB - SciTePress