ACKNOWLEDGMENTS
We acknowledge financial support from the
PNRR MUR project PE0000013-FAIR.
REFERENCES
Abiodun, O. I., Jantan, A., Omolara, A., Dada, K. V., Mo-
hamed, N. A., and Arshad, H. (2018). State-of-the-
art in artificial neural network applications: A survey.
Heliyon, 4(11):e00938.
Bharati, P. and Pramanik, A. (2020). Deep learning
techniques—r-cnn to mask r-cnn: a survey. In Com-
putational Intelligence in Pattern Recognition: Pro-
ceedings of CIPR 2019, pages 657–668.
Caldarola, E. G., Picariello, A., and Rinaldi, A. M. (2015).
Big graph-based data visualization experiences: The
wordnet case study. In IC3K 2015 - Proceedings of the
7th International Joint Conference on Knowledge
Discovery, Knowledge Engineering and Knowledge
Management, pages 104–115.
Cook, D., Feuz, K. D., and Krishnan, N. C. (2013). Trans-
fer learning for activity recognition: A survey. Knowl-
edge and information systems, 36:537–556.
Dai, J., Li, Y., He, K., and Sun, J. (2016). R-fcn: Object
detection via region-based fully convolutional networks.
In Advances in neural information processing
systems, volume 29.
Girshick, R. (2015). Fast r-cnn. In Proceedings of the IEEE
international conference on computer vision, pages
1440–1448.
Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014).
Rich feature hierarchies for accurate object detec-
tion and semantic segmentation. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 580–587.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017).
Mask r-cnn. In Proceedings of the IEEE international
conference on computer vision, pages 2961–2969.
Khan, R. Z. and Ibraheem, N. A. (2012). Hand gesture
recognition: a literature review. International journal
of artificial Intelligence & Applications, 3(4):161.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P.,
Ramanan, D., ..., and Zitnick, C. L. (2014).
Microsoft coco: Common objects in context. In
Computer Vision–ECCV 2014: 13th European
Conference, Zurich, Switzerland, September 6-12, 2014,
Proceedings, Part V 13, pages 740–755. Springer
International Publishing.
Loshchilov, I. and Hutter, F. (2016). Sgdr: Stochastic
gradient descent with warm restarts. arXiv preprint
arXiv:1608.03983.
Madani, K., Rinaldi, A. M., Russo, C., and Tommasino,
C. (2023). A combined approach for improving
humanoid robots autonomous cognitive capabilities.
Knowledge and Information Systems, 65(8):3197–
3221.
Mitra, S. and Acharya, T. (2007). Gesture recognition:
A survey. IEEE Transactions on Systems, Man,
and Cybernetics, Part C (Applications and Reviews),
37(3):311–324.
Muscetti, M., Rinaldi, A. M., Russo, C., and Tommasino, C.
(2022). Multimedia ontology population through se-
mantic analysis and hierarchical deep features extrac-
tion techniques. Knowledge and Information Systems,
64(5):1283–1303.
Park, U. and Jain, A. K. (2007). 3d model-based face recog-
nition in video. In Advances in Biometrics: Interna-
tional Conference, ICB 2007, Seoul, Korea, August
27-29, 2007. Proceedings, pages 1085–1094. Springer
Berlin Heidelberg.
Rastgoo, R., Kiani, K., and Escalera, S. (2021). Sign lan-
guage recognition: A deep survey. Expert Systems
with Applications, 164:113794.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.
(2016). You only look once: Unified, real-time object
detection. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 779–
788.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster
r-cnn: Towards real-time object detection with region
proposal networks. In Advances in neural information
processing systems, volume 28.
Rinaldi, A. M. and Russo, C. (2020). A content based im-
age retrieval approach based on multiple multimedia
features descriptors in e-health environment. In 2020
IEEE International Symposium on Medical Measure-
ments and Applications (MeMeA), pages 1–6. IEEE.
Rinaldi, A. M., Russo, C., and Tommasino, C. (2020). A
knowledge-driven multimedia retrieval system based
on semantics and deep features. Future Internet,
12(11):183.
Rinaldi, A. M., Russo, C., and Tommasino, C. (2021). Vi-
sual query posing in multimedia web document re-
trieval. In 2021 IEEE 15th International Confer-
ence on Semantic Computing (ICSC), pages 415–420.
IEEE.
Ruder, S. (2016). An overview of gradient de-
scent optimization algorithms. arXiv preprint
arXiv:1609.04747.
Russo, C., Madani, K., and Rinaldi, A. M. (2020). An unsu-
pervised approach for knowledge construction applied
to personal robots. IEEE Transactions on Cognitive
and Developmental Systems, 13(1):6–15.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and
Chen, L.-C. (2018). Mobilenetv2: Inverted
residuals and linear bottlenecks. In Proceedings of the IEEE
conference on computer vision and pattern
recognition, pages 4510–4520.
Suarez, J. and Murphy, R. R. (2012). Hand gesture
recognition with depth images: A review. In 2012 IEEE
RO-MAN: the 21st IEEE international symposium on
robot and human interactive communication, pages
411–417. IEEE.
Tzutalin (2015). Labelimg. https://github.com/tzutalin/
labelImg.
Wadhawan, A. and Kumar, P. (2021). Sign language
recognition systems: A decade systematic literature review.
Archives of Computational Methods in Engineering,
28:785–813.
Wang, C.-Y., Bochkovskiy, A., and Liao, H.-Y. M. (2022).
Yolov7: Trainable bag-of-freebies sets new state-of-
the-art for real-time object detectors. arXiv preprint
arXiv:2207.02696.