Multi-Modal Multi-View Perception Feature Tracking for Handover Human Robot Interaction Applications

Chaitanya Bandi, Ulrike Thomas

2025

Abstract

Object handover is a fundamental task in human-robot interaction (HRI) that relies on robust perception features such as hand pose estimation, object pose estimation, and human pose estimation. While human pose estimation has been extensively researched, this work focuses on creating a comprehensive architecture to track and analyze hand and object poses, thereby enabling effective handover state determination. We propose an end-to-end architecture that integrates unified hand-object pose estimation with hand pose tracking, leveraging an early and efficient fusion of RGB and depth modalities. Our method incorporates existing state-of-the-art techniques for human pose estimation and introduces novel advancements for hand-object pose estimation. The architecture is evaluated on three large-scale open-source datasets, demonstrating state-of-the-art performance in unified hand-object pose estimation. Finally, we implement our approach in a human-robot interaction scenario to determine the handover state by extracting and tracking the necessary perception features. This integration highlights the potential of the proposed system for enhancing collaboration in HRI applications.


Paper Citation


in Harvard Style

Bandi C. and Thomas U. (2025). Multi-Modal Multi-View Perception Feature Tracking for Handover Human Robot Interaction Applications. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP; ISBN 978-989-758-728-3, SciTePress, pages 797-807. DOI: 10.5220/0013373800003912


in BibTeX Style

@conference{visapp25,
author={Chaitanya Bandi and Ulrike Thomas},
title={Multi-Modal Multi-View Perception Feature Tracking for Handover Human Robot Interaction Applications},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP},
year={2025},
pages={797-807},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013373800003912},
isbn={978-989-758-728-3},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP
TI - Multi-Modal Multi-View Perception Feature Tracking for Handover Human Robot Interaction Applications
SN - 978-989-758-728-3
AU - Bandi C.
AU - Thomas U.
PY - 2025
SP - 797
EP - 807
DO - 10.5220/0013373800003912
PB - SciTePress