FingerSeg: Highly-Efficient Dual-Resolution Architecture for Precise Finger-Level Semantic Segmentation

Gibran Benitez-Garcia; Hiroki Takahashi; Hiroki Takahashi; Hiroki Takahashi

doi:10.5220/0012575000003660

FingerSeg: Highly-Efficient Dual-Resolution Architecture for Precise Finger-Level Semantic Segmentation

Gibran Benitez-Garcia, Hiroki Takahashi, Hiroki Takahashi, Hiroki Takahashi

2024

Abstract

Semantic segmentation at the finger level poses unique challenges, including the limited pixel representation of some classes and the complex interdependency of the hand anatomy. In this paper, we propose FingerSeg, a novel architecture inspired by Deep Dual-Resolution Networks, specifically adapted to address the nuances of finger-level hand semantic segmentation. To this end, we introduce three modules: Enhanced Bilateral Fusion (EBF), which refines low- and high-resolution feature fusion via attention mechanisms; Multi-Attention Module (MAM), designed to augment high-level features with a composite of channel, spatial, orientational, and categorical attention; and Asymmetric Dilated Up-sampling (ADU), which combines standard and asymmetric atrous convolutions to capture rich contextual information for pixel-level classification. To properly evaluate our proposal, we introduce IPN-finger, a subset of the IPN-Hand dataset, manually annotated pixel-wise for 13 finger-related classes. Our extensive empirical analysis, including evaluations of the synthetic RHD dataset against current state-of-the-art methods, demonstrates that our proposal achieves top results. FingerSeg reaches 73.8 and 71.1 mIoU on the IPN-Finger and RHD datasets, respectively, while maintaining an efficient computational cost of about 7 GFLOPs and 6 million parameters at VGA resolution. The dataset, source code, and a demo of FingerSeg will be available upon the publication of this paper.

Download

Paper Citation

in Harvard Style

Benitez-Garcia G. and Takahashi H. (2024). FingerSeg: Highly-Efficient Dual-Resolution Architecture for Precise Finger-Level Semantic Segmentation. In Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP; ISBN 978-989-758-679-8, SciTePress, pages 242-251. DOI: 10.5220/0012575000003660

in Bibtex Style

@conference{visapp24,
author={Gibran Benitez-Garcia and Hiroki Takahashi},
title={FingerSeg: Highly-Efficient Dual-Resolution Architecture for Precise Finger-Level Semantic Segmentation},
booktitle={Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP},
year={2024},
pages={242-251},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012575000003660},
isbn={978-989-758-679-8},
}

in EndNote Style

TY - CONF

JO - Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP
TI - FingerSeg: Highly-Efficient Dual-Resolution Architecture for Precise Finger-Level Semantic Segmentation
SN - 978-989-758-679-8
AU - Benitez-Garcia G.
AU - Takahashi H.
PY - 2024
SP - 242
EP - 251
DO - 10.5220/0012575000003660
PB - SciTePress