Chéron, G., Laptev, I., and Schmid, C. (2015). P-CNN: Pose-based CNN features for action recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 3218–3226.
Cippitelli, E., Gasparrini, S., Spinsante, S., and Gambi, E. (2015). Kinect as a tool for gait analysis: validation of a real-time joint extraction algorithm working in side view. Sensors, 15(1):1417–1434.
Comaniciu, D. and Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):603–619.
Consortium, O. et al. OpenNI, the standard framework for 3D sensing. URL, accessed on 2017-09-30.
Essmaeel, K., Migniot, C., and Dipanda, A. (2016). 3D descriptor for an oriented-human classification from complete point cloud. In VISIGRAPP (4: VISAPP), pages 353–360.
Ganapathi, V., Plagemann, C., Koller, D., and Thrun, S. (2010). Real time motion capture using a single time-of-flight camera. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 755–762.
Ganapathi, V., Plagemann, C., Koller, D., and Thrun, S. (2012). Real-time human pose tracking from range data. In European Conference on Computer Vision, pages 738–751. Springer.
Han, Y., Zhang, P., Zhuo, T., Huang, W., and Zhang, Y. (2017). Video action recognition based on deeper convolution networks with pair-wise frame motion concatenation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 8–17.
Hinterstoisser, S., Lepetit, V., Ilic, S., Holzer, S., Bradski, G. R., Konolige, K., and Navab, N. (2012). Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In ACCV (1), pages 548–562.
Jhuang, H., Gall, J., Zuffi, S., Schmid, C., and Black, M. J. (2013). Towards understanding action recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 3192–3199.
Lan, Z., Zhu, Y., Hauptmann, A. G., and Newsam, S. (2017). Deep local video feature for action recognition. In IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1219–1225.
Li, S., Liu, Z.-Q., and Chan, A. B. (2014). Heterogeneous multi-task learning for human pose estimation with deep convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 482–489.
Li, W., Zhang, Z., and Liu, Z. (2010). Action recognition based on a bag of 3D points. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 9–14.
Mentiplay, B. F., Perraton, L. G., Bower, K. J., Pua, Y.-H., McGaw, R., Heywood, S., and Clark, R. A. (2015). Gait assessment using the Microsoft Xbox One Kinect: Concurrent validity and inter-day reliability of spatiotemporal and kinematic variables. Journal of Biomechanics, 48(10):2166–2170.
Peng, B. and Luo, Z. (2016). Multi-view 3D pose estimation from single depth images. Technical report, Stanford University, USA, Course CS231n: Convolutional Neural Networks for Visual Recognition.
Pishchulin, L., Andriluka, M., Gehler, P., and Schiele, B. (2013). Poselet conditioned pictorial structures. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 588–595.
Sarafianos, N., Boteanu, B., Ionescu, B., and Kakadiaris, I. A. (2016). 3D human pose estimation: A review of the literature and analysis of covariates. Computer Vision and Image Understanding, 152:1–20.
Shafaei, A. and Little, J. J. (2016). Real-time human motion capture with multiple depth cameras. In IEEE 13th Conference on Computer and Robot Vision (CRV), pages 24–31.
Shotton, J., Sharp, T., Kipman, A., Fitzgibbon, A., Finocchio, M., Blake, A., Cook, M., and Moore, R. (2013). Real-time human pose recognition in parts from single depth images. Communications of the ACM, 56(1):116–124.
Tang, D., Chang, H. J., Tejani, A., and Kim, T.-K. (2014). Latent regression forest: Structured estimation of 3D articulated hand posture. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3786–3793.
Tompson, J. J., Jain, A., LeCun, Y., and Bregler, C. (2014). Joint training of a convolutional network and a graphical model for human pose estimation. In Advances in Neural Information Processing Systems, pages 1799–1807.
Vieira, A., Nascimento, E., Oliveira, G., Liu, Z., and Campos, M. (2012). STOP: Space-time occupancy patterns for 3D action recognition from depth map sequences. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pages 252–259.
Wang, W.-J., Chang, J.-W., Huang, S.-F., and Wang, R.-J. (2016). Human posture recognition based on images captured by the Kinect sensor. International Journal of Advanced Robotic Systems, 13(2):54.
Wohlhart, P. and Lepetit, V. (2015). Learning descriptors for object recognition and 3D pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3109–3118.
Yang, Y. and Ramanan, D. (2011). Articulated pose estimation with flexible mixtures-of-parts. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1385–1392.
Ye, M. and Yang, R. (2014). Real-time simultaneous pose and shape estimation for articulated objects using a single depth camera. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2345–2352.
Jung, H. Y., Lee, S., Heo, Y. S., and Yun, I. D. (2015). Random tree walk toward instantaneous 3D human pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2467–2474.