further improved for real-world use. With a detection rate of 85% and a false-positive rate of 9%, the system is still not robust enough. The number of false positives should be close to zero, because every response triggered by a falsely detected gesture degrades the user experience. As shown, using two models slightly alleviates this problem, but further improvements are needed to enable smooth system usage.
There are several directions future work can take to further improve our results. Firstly, our models are trained independently. We believe they could benefit from end-to-end training: the accuracy of the second model depends on the output of the first model; however, higher accuracy of the first model alone does not necessarily lead to better overall performance.
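As a minimal sketch of what such end-to-end training could look like, assuming both models were implemented as PyTorch modules (DetectorNet and ClassifierNet below are hypothetical stand-ins, not our actual architectures), a single combined loss would let the classification error backpropagate into the detector:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two models: a binary gesture detector
# and a classifier that also consumes the detector's output.
class DetectorNet(nn.Module):
    def __init__(self, feat_dim=63):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 1)

    def forward(self, x):
        return self.fc(x)  # one logit: gesture vs. no gesture

class ClassifierNet(nn.Module):
    def __init__(self, feat_dim=63, n_classes=16):
        super().__init__()
        self.fc = nn.Linear(feat_dim + 1, n_classes)

    def forward(self, x, det_logit):
        return self.fc(torch.cat([x, det_logit], dim=-1))

detector, classifier = DetectorNet(), ClassifierNet()
# Both parameter sets share one optimizer, so gradients from the final
# classification loss also update the detector.
optimizer = torch.optim.Adam(
    list(detector.parameters()) + list(classifier.parameters()), lr=1e-3)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()

# Dummy batch: per-window features, gesture/no-gesture flag, class label.
x = torch.randn(32, 63)
is_gesture = torch.randint(0, 2, (32, 1)).float()
label = torch.randint(0, 16, (32,))

optimizer.zero_grad()
det_logits = detector(x)
cls_logits = classifier(x, det_logits)
# A single combined loss couples the two models during training instead
# of optimizing each in isolation.
loss = bce(det_logits, is_gesture) + ce(cls_logits, label)
loss.backward()
optimizer.step()
```

The key design choice is that both parameter sets sit in one optimizer, so the detector is tuned for overall system performance rather than for its own accuracy in isolation.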
Secondly, the choice of hand-crafted features derived from the hand skeleton has a large influence on model performance. We believe this choice should be explored further to fully utilize the models' capacity.
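As an illustration, a few scale-normalized distance features could be derived from a 21-joint hand skeleton as follows. The landmark ordering assumes a MediaPipe-style layout, and this concrete feature set is a hypothetical example, not the one used in our experiments:

```python
import numpy as np

def skeleton_features(joints: np.ndarray) -> np.ndarray:
    """Illustrative features from a (21, 3) hand skeleton.

    Assumes a MediaPipe-style landmark order (0 = wrist,
    4/8/12/16/20 = fingertips, 9 = middle-finger MCP); this
    feature set is a hypothetical example only.
    """
    wrist = joints[0]
    tips = joints[[4, 8, 12, 16, 20]]
    # Normalize by palm size so features are invariant to hand scale
    # and distance from the camera.
    palm = np.linalg.norm(joints[9] - wrist) + 1e-8
    # Fingertip-to-wrist distances capture finger extension.
    tip_dists = np.linalg.norm(tips - wrist, axis=1) / palm
    # Pairwise fingertip distances capture hand openness and pinches.
    pairs = np.array([np.linalg.norm(tips[i] - tips[j]) / palm
                      for i in range(5) for j in range(i + 1, 5)])
    return np.concatenate([tip_dists, pairs])

# Example: a random 21-joint skeleton yields a 15-dimensional vector.
print(skeleton_features(np.random.rand(21, 3)).shape)  # (15,)
```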
Lastly, we selected the models' parameters based on a single stratified split. Although time-consuming, a grid search combined with k-fold cross-validation could prove beneficial for parameter selection.
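A minimal sketch of this procedure with scikit-learn, where the estimator, feature matrix, and parameter grid are placeholders for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Dummy data standing in for the per-window feature vectors and labels.
X = np.random.rand(200, 30)
y = np.random.randint(0, 4, 200)

# The estimator and grid are placeholders; the point is replacing a single
# stratified split with stratified k-fold CV inside an exhaustive search.
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=cv, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```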
ACKNOWLEDGEMENTS
This work was supported by the "Development of an advanced electric bicycles charging station for a smart city" project, co-funded under the Operational Program from the European Structural and Investment Funds.