
eras. Skeleton data represent the human pose and movement, focusing on the spatial configuration and temporal dynamics of the body joints. In particular, this work investigates action segmentation performance by estimating a projected skeleton and a "synthetic" skeleton, which can be either a combination of the skeleton information provided by both cameras or an estimate when one of the cameras fails to provide skeleton data due to occlusion or being out of range. In a multi-camera system, one view can provide more reliable data than the others, depending on factors such as the camera's orientation or the software's ability to extract particular features, such as the skeleton itself. As demonstrated by the experiments, the proposed approach addresses these issues by estimating new skeletons that take advantage of the most reliable view.
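To make the fusion-with-fallback idea concrete, the Python sketch below shows one minimal way a synthetic skeleton could be assembled joint by joint from two views. It assumes the two skeletons have already been registered into a common reference frame; the function name, the NaN convention for missing joints, and the confidence-weighted average are illustrative assumptions, not the exact estimation scheme used in this work.

import numpy as np

def fuse_skeletons(joints_a, joints_b, conf_a, conf_b, eps=1e-8):
    """Build a "synthetic" skeleton from two camera views.

    joints_a, joints_b: (J, 3) float arrays of 3D joint positions in a
        common reference frame; rows are NaN where a camera failed to
        track a joint (occlusion, out of range).
    conf_a, conf_b: (J,) per-joint confidence scores in [0, 1].
    """
    fused = np.full_like(joints_a, np.nan)
    valid_a = ~np.isnan(joints_a).any(axis=1)
    valid_b = ~np.isnan(joints_b).any(axis=1)
    both = valid_a & valid_b
    only_a = valid_a & ~valid_b
    only_b = valid_b & ~valid_a

    # Both views see the joint: confidence-weighted average, so the
    # more reliable view dominates the estimate.
    w = (conf_a[both] / (conf_a[both] + conf_b[both] + eps))[:, None]
    fused[both] = w * joints_a[both] + (1.0 - w) * joints_b[both]

    # Only one view sees the joint: fall back to that view.
    fused[only_a] = joints_a[only_a]
    fused[only_b] = joints_b[only_b]
    return fused

In practice, the registration step relies on the extrinsic calibration between the cameras, which is what allows a skeleton from one view to be projected into the other.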
ACKNOWLEDGMENT
The authors are deeply thankful to Michele Attolico and Giuseppe Bono for their technical and administrative support. Research funded by PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - "FAIR - Future Artificial Intelligence Research" - Spoke 8 "Pervasive AI", funded by the European Commission under the NextGeneration EU programme.