
most recent models, stands out with high mIoU scores of 71.98 and 70.91 on the IPN and RHD datasets, respectively. Yet FingerSeg surpasses PIDNet in accuracy while requiring fewer parameters. The GFLOPs of FingerSeg and PIDNet are closely matched, underscoring the architectural optimizations that allow FingerSeg to achieve high accuracy without a substantial increase in computational demand.
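For reference, the mIoU figures reported above are the mean of per-class intersection-over-union scores. The following minimal sketch shows how mIoU is typically computed from a confusion matrix; it is not the authors' evaluation code, and the three-class toy matrix (e.g., background, palm, finger) is purely illustrative:

```python
import numpy as np

def mean_iou(conf_matrix: np.ndarray) -> float:
    """Mean Intersection-over-Union from a KxK confusion matrix,
    where conf_matrix[i, j] counts pixels of true class i predicted as j."""
    intersection = np.diag(conf_matrix)          # true positives per class
    union = (conf_matrix.sum(axis=1)             # all ground-truth pixels per class
             + conf_matrix.sum(axis=0)           # all predicted pixels per class
             - intersection)                     # remove double-counted TPs
    iou = intersection / np.maximum(union, 1)    # guard against empty classes
    return float(iou.mean() * 100)               # report as a percentage

# Toy 3-class example (classes are hypothetical, not from the paper):
cm = np.array([[50, 2, 1],
               [3, 40, 4],
               [1, 5, 44]])
print(f"mIoU: {mean_iou(cm):.2f}")
```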
In summary, FingerSeg sets a new standard in segmentation accuracy and strikes a remarkable balance between computational requirements and model complexity. This balance is particularly important given the fine-grained nature of the finger segmentation task, validating FingerSeg's design.
6 CONCLUSIONS AND FUTURE WORK
In this paper, we introduced FingerSeg as an advanced solution for finger-level hand segmentation. Through meticulous design and the integration of specialized modules (EBF, MAM, and ADU), FingerSeg has demonstrated a significant leap forward in the accuracy and efficiency of semantic segmentation for nuanced hand gestures. The empirical results, bolstered by thorough ablation studies and comparisons with state-of-the-art methods, affirm FingerSeg's standing as a leading solution for the presented task. Moreover, the creation and annotation of the IPN-Finger dataset have not only facilitated the development of FingerSeg but also enriched the resources available to the research community. By offering this dataset publicly, alongside the FingerSeg model, we anticipate stimulating further innovation and exploration in the detailed segmentation of hands and fingers.
Looking ahead, the integration of FingerSeg into multimodal hand gesture recognition (HGR) systems presents a promising direction for future work. Its application as an additional modality can potentially enrich the interpretive capabilities of HGR, particularly in complex or nuanced scenarios. Exploring the synergy between FingerSeg's detailed segmentation and other modalities will be instrumental in developing more intuitive and natural user interfaces, contributing significantly to advancements in human-computer interaction.
ACKNOWLEDGEMENTS
This work was supported by a Research Grant (S) from the Tateisi Science and Technology Foundation.
REFERENCES
Baek, S., Kim, K. I., and Kim, T.-K. (2019). Pushing the envelope for RGB-based dense 3D hand pose estimation via neural rendering. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 1067–1076.

Bambach, S., Lee, S., Crandall, D. J., and Yu, C. (2015). Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions. In The IEEE International Conference on Computer Vision (ICCV), pages 1949–1957.

Bandini, A. and Zariffa, J. (2020). Analysis of the hands in egocentric vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Benitez-Garcia, G., Olivares-Mercado, J., Sanchez-Perez, G., and Yanai, K. (2021a). IPN Hand: A video dataset and benchmark for real-time continuous hand gesture recognition. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 4340–4347. IEEE.

Benitez-Garcia, G., Prudente-Tixteco, L., Castro-Madrid, L. C., Toscano-Medina, R., Olivares-Mercado, J., Sanchez-Perez, G., and Villalba, L. J. G. (2021b). Improving real-time hand gesture recognition with semantic segmentation. Sensors, 21(2):356.

Cai, M., Lu, F., and Sato, Y. (2020). Generalizing hand segmentation in egocentric videos with uncertainty-guided model adaptation. In The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 14392–14401.

Chao, P., Kao, C.-Y., Ruan, Y.-S., Huang, C.-H., and Lin, Y.-L. (2019). HarDNet: A low memory traffic network. In The IEEE International Conference on Computer Vision (ICCV).

Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. In The European Conference on Computer Vision (ECCV), pages 801–818.

Dadashzadeh, A., Targhi, A. T., Tahmasbi, M., and Mirmehdi, M. (2019). HGR-Net: A fusion network for hand gesture segmentation and recognition. IET Computer Vision, 13(8):700–707.

He, A., Li, T., Li, N., Wang, K., and Fu, H. (2020). CABNet: Category attention block for imbalanced diabetic retinopathy grading. IEEE Transactions on Medical Imaging, 40(1):143–153.

Kim, S., Chi, H.-g., Hu, X., Vegesana, A., and Ramani, K. (2020). First-person view hand segmentation of multi-modal hand activity video dataset. In British Machine Vision Conference (BMVC).

Li, G. and Kim, J. (2019). DABNet: Depth-wise asymmetric bottleneck for real-time semantic segmentation. In British Machine Vision Conference (BMVC).

Li, M., Sun, L., and Huo, Q. (2019). Flow-guided feature propagation with occlusion aware detail enhancement for hand segmentation in egocentric videos. Computer Vision and Image Understanding, 187:102785.

Likitlersuang, J., Sumitro, E. R., Cao, T., Visée, R. J., Kalsi-Ryan, S., and Zariffa, J. (2019). Egocentric video: a