Authors:
Amani Elaoud 1; Walid Barhoumi 2; Hassen Drira 3 and Ezzeddine Zagrouba 1
Affiliations:
1 Université de Tunis El Manar, Institut Supérieur d’Informatique, Research Team on Intelligent Systems in Imaging and Artificial Vision (SIIVA), LR16ES06 Laboratoire de Recherche en Informatique, Modélisation et Traitement de l’Information et de la Connaissance (LIMTIC), 2 Rue Bayrouni, 2080 Ariana, Tunisia;
2 Université de Tunis El Manar, Institut Supérieur d’Informatique, Research Team on Intelligent Systems in Imaging and Artificial Vision (SIIVA), LR16ES06 Laboratoire de Recherche en Informatique, Modélisation et Traitement de l’Information et de la Connaissance (LIMTIC), 2 Rue Bayrouni, 2080 Ariana, Tunisia, and Université de Carthage, Ecole Nationale d’Ingénieurs de Carthage (ENICarthage), 45 Rue des Entrepreneurs, 2035 Tunis-Carthage, Tunisia;
3 IMT Lille Douai, Univ. Lille, CNRS, UMR 9189 – CRIStAL – Centre de Recherche en Informatique Signal et Automatique de Lille, F-59000 Lille, France
Keyword(s):
3D Human Action, Temporal Modeling, Grassmann Manifold, Special Orthogonal Group, Weighted Distance, Human Skeleton.
Related Ontology Subjects/Areas/Topics:
Applications; Computer Vision, Visualization and Computer Graphics; Geometry and Modeling; Image-Based Modeling; Motion, Tracking and Stereo Vision; Optical Flow and Motion Analyses; Pattern Recognition; Software Engineering; Tracking and Visual Navigation
Abstract:
Human action recognition from RGB-D sequences is an important research direction in computer vision. In this work, we embed the human skeleton on the Grassmann manifold in order to model each action as a trajectory. Given a pair of matched points on the Grassmann manifold, we introduce the special orthogonal group SO(3) to exploit the rotation information that the Grassmann representation ignores. Our objective is to define the best weighted linear combination of the Grassmann and SO(3) distances according to the nature of the action, while modeling human actions as temporal trajectories. The effectiveness of combining the two non-Euclidean spaces was validated on three standard, challenging 3D human action recognition datasets (G3D-Gaming, UTD-MHAD multimodal action and Florence3D-Action), and the preliminary results confirm the accuracy of the proposed method compared with relevant methods from the state of the art.