Authors:
Ahmed Snoun, Tahani Bouchrika and Olfa Jemai
Affiliation:
Research Team in Intelligent Machines (RTIM), National Engineering School of Gabes (ENIG), University of Gabes, Gabes, Tunisia
Keyword(s):
Human Activity Recognition, 3D Skeleton, Spatio-temporal Features, View-invariant, Transformer.
Abstract:
With the emergence of depth sensors, real-time 3D human skeleton estimation has become easier to accomplish. Thus, methods for human activity recognition (HAR) based on 3D skeletons have become increasingly accessible. In this paper, we introduce a new approach for human activity recognition using 3D skeletal data. Our approach generates a set of spatio-temporal, view-invariant features from the skeleton joints. The extracted features are then analyzed by a standard Transformer encoder in order to recognize the activity. Transformers, which are built on the self-attention mechanism, have been successful in many domains over the last few years, which makes them well suited for HAR. The proposed approach shows promising performance on several well-known datasets that provide 3D skeleton data, namely KARD, Florence 3D, UTKinect Action 3D and MSR Action 3D.
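The abstract does not specify the paper's exact feature set, but a common way to make skeleton features view-invariant is to use quantities unchanged by rigid camera motion, such as pairwise joint distances. The NumPy sketch below (function and variable names are ours, not the paper's) checks that such distances are identical before and after an arbitrary rotation and translation of the skeleton:

```python
import numpy as np

def pairwise_distances(joints):
    """Pairwise Euclidean distances between 3D joints.

    joints: (J, 3) array of joint positions for one frame.
    Returns a (J, J) symmetric distance matrix, which is invariant
    to any rotation or translation of the whole skeleton.
    """
    diff = joints[:, None, :] - joints[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# A synthetic skeleton frame with 20 joints (illustrative data only).
rng = np.random.default_rng(0)
joints = rng.standard_normal((20, 3))

# Simulate a viewpoint change: rotate about the z-axis and translate.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
transformed = joints @ R.T + np.array([1.0, -2.0, 0.5])

# The distance features are unchanged by the viewpoint change.
assert np.allclose(pairwise_distances(joints),
                   pairwise_distances(transformed))
```

In a pipeline like the one described, such per-frame invariant features would be stacked over time into a sequence and fed to the Transformer encoder, whose self-attention layers model the temporal structure of the activity.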