HandMvNet: Real-Time 3D Hand Pose Estimation Using Multi-View Cross-Attention Fusion
Muhammad Asad Ali, Nadia Robertini, Didier Stricker
2025
Abstract
In this work, we present HandMvNet, one of the first real-time methods designed to estimate 3D hand motion and shape from multi-view camera images. Unlike previous monocular approaches, which suffer from scale-depth ambiguities, our method ensures consistent and accurate absolute hand poses and shapes. This is achieved through a multi-view attention-fusion mechanism that effectively integrates features from multiple viewpoints. In contrast to previous multi-view methods, our approach eliminates the need for camera parameters as input to learn 3D geometry. HandMvNet also achieves a substantial reduction in inference time while delivering results competitive with state-of-the-art methods, making it suitable for real-time applications. Evaluated on publicly available datasets, HandMvNet qualitatively and quantitatively outperforms previous methods under identical settings. Code is available at github.com/pyxploiter/handmvnet.
Paper Citation
in Harvard Style
Ali M., Robertini N. and Stricker D. (2025). HandMvNet: Real-Time 3D Hand Pose Estimation Using Multi-View Cross-Attention Fusion. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP; ISBN 978-989-758-728-3, SciTePress, pages 555-562. DOI: 10.5220/0013107300003912
in Bibtex Style
@conference{visapp25,
author={Muhammad Ali and Nadia Robertini and Didier Stricker},
title={HandMvNet: Real-Time 3D Hand Pose Estimation Using Multi-View Cross-Attention Fusion},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP},
year={2025},
pages={555-562},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013107300003912},
isbn={978-989-758-728-3},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 2: VISAPP
TI - HandMvNet: Real-Time 3D Hand Pose Estimation Using Multi-View Cross-Attention Fusion
SN - 978-989-758-728-3
AU - Ali M.
AU - Robertini N.
AU - Stricker D.
PY - 2025
SP - 555
EP - 562
DO - 10.5220/0013107300003912
PB - SciTePress