phenotypes. In each case, we apply our shape-matching method to every frame. Since the silhouette retrieved from the database belongs to a specific movement class, we simply count the number of occurrences per class. The most represented class is then taken as the detected movement.
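As an illustration only, the following sketch summarises this voting scheme; `match_silhouette` is a hypothetical stand-in for the shape-matching step described above, not part of our implementation.

```python
from collections import Counter

def classify_action(frames, match_silhouette):
    """Per-frame matching followed by a majority vote.

    `match_silhouette` (hypothetical) returns the movement class of
    the database silhouette closest to the query frame.
    """
    votes = Counter(match_silhouette(frame) for frame in frames)
    # The most represented class is taken as the detected movement.
    detected_class, _ = votes.most_common(1)[0]
    return detected_class
```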
Based on this very simple workflow, we obtained a correct action classification rate of 71.66%. The confusion matrix is shown in Fig. 13. This accuracy is, of course, lower than the rates recently reported on the same database (99.64% by Blank et al. (Blank et al., 2005) and 97.83% by Gorelick et al. (Gorelick et al., 2007)). However, both of these approaches use space-time cubes to analyse the motion, whereas we do not yet consider the temporal correlation between successive frames.
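For reference, the reported accuracy is the fraction of correctly classified items, i.e. the trace of the confusion matrix divided by its total. The generic sketch below assumes integer class labels and is not specific to our method.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows are ground-truth classes, columns are predicted classes
    (as in Fig. 13)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    # Fraction of correctly classified items.
    return np.trace(cm) / cm.sum()
```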
According to Gorelick et al., many successive frames of a first action (e.g., run) may exhibit high spatial similarity to successive frames of a second one, and ignoring the dynamics within the frames may lead to confusion between the two actions. Since our approach does not take the time dimension into account, frame-to-frame comparison leads to misclassification for actions whose individual frames are very similar: run, skip and jump.
Figure 13: Confusion matrix.
In future work, we will combine our proposed approach with multi-hypothesis tracking techniques (with N neighbours) to improve the accuracy of action classification. In this way, we will take into account the temporal information and the dynamics of the action. A first step in this direction is sketched below.
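As a rough sketch of this direction, one could keep the N best database matches per frame and let each contribute a similarity-weighted vote. This simple weighted N-nearest-neighbour scheme only approximates the multi-hypothesis tracking we intend to use; `top_n_matches` is a hypothetical helper.

```python
from collections import Counter

def classify_action_multi_hypothesis(frames, top_n_matches, n=5):
    """Each frame contributes its n nearest database silhouettes,
    weighted by similarity, before the final class vote."""
    votes = Counter()
    for frame in frames:
        # `top_n_matches` (hypothetical) returns the n closest
        # database entries as (movement_class, similarity) pairs.
        for movement_class, similarity in top_n_matches(frame, n):
            votes[movement_class] += similarity
    detected_class, _ = votes.most_common(1)[0]
    return detected_class
```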
6 CONCLUSIONS
In this paper, we presented a new approach for 3D human pose estimation and action classification in video. The learning database is easily generated with open-source software that allows the simulation of any human pose. The proposed posture recognition method is based on geometric Krawtchouk moments and gives promising results. Applications to both 3D pose estimation and action classification have been presented. In our work, we tested different moment orders and selected the most suitable one for our approach. We compared our approach with related work in action classification and concluded that it can be improved by using multi-hypothesis tracking during action identification and classification. In future work, we will use a combination of local and global shape descriptors to improve the pose estimation, and use the estimated poses to construct an action model for activity classification.
REFERENCES
Agarwal, A. and Triggs, B. (2006). Recovering 3d hu-
man pose from monocular images. Pattern Analy-
sis and Machine Intelligence, IEEE Transactions on,
28(1):44–58.
Aggarwal, J. and Cai, Q. (1999). Human motion analysis:
A review. Computer Vision and Image Understanding,
73(3):428–440.
Andriluka, M., Roth, S., and Schiele, B. (2010). Monocular
3d pose estimation and tracking by detection. In Com-
puter Vision and Pattern Recognition (CVPR), 2010
IEEE Conference on, pages 623–630. IEEE.
Baumberg, A. and Hogg, D. (1994). Learning flexible models from image sequences. In European Conference on Computer Vision (ECCV'94). Springer.
Blank, M., Gorelick, L., Shechtman, E., Irani, M., and
Basri, R. (2005). Actions as space-time shapes. In
The Tenth IEEE International Conference on Com-
puter Vision (ICCV’05), pages 1395–1402.
Bourdev, L. and Malik, J. (2009). Poselets: Body part de-
tectors trained using 3d human pose annotations. In
Computer Vision, 2009 IEEE 12th International Con-
ference on, pages 1365–1372. IEEE.
Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition, pages 886–893.
de La Gorce, M., Fleet, D., and Paragios, N. (2011).
Model-based 3d hand pose estimation from monocu-
lar video. Pattern Analysis and Machine Intelligence,
IEEE Transactions on, 33(9):1793–1805.
Gavrila, D. M. and Davis, L. S. (1996). 3-d model-based
tracking of humans in action: a multi-view approach.
In Computer Vision and Pattern Recognition, 1996.
Proceedings CVPR’96, 1996 IEEE Computer Society
Conference on, pages 73–80. IEEE.
Gorelick, L., Blank, M., Shechtman, E., Irani, M., and Basri, R. (2005). Actions as space-time shapes. In IEEE International Conference on Computer Vision (ICCV), pages 1395–1402.
Gorelick, L., Blank, M., Shechtman, E., Irani, M., and Basri, R. (2007). Actions as space-time shapes. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(12):2247–2253.
Guo, K., Ishwar, P., and Konrad, J. (2009). Action recognition in video by covariance matching of silhouette tunnels.