ACKNOWLEDGEMENTS
This work was carried out in the context of the multi-sensor
data fusion group at the research institute fortiss.
REFERENCES
Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H.
(2007). Greedy layer-wise training of deep networks.
pages 153–160.
Cheng, Z., Qin, L., Ye, Y., Huang, Q., and Tian, Q.
(2012). Human daily action analysis with multi-view
and color-depth data. In Proceedings of the 12th inter-
national conference on Computer Vision - Volume 2,
ECCV’12, pages 52–61, Berlin, Heidelberg. Springer-
Verlag.
Dalal, N. and Triggs, B. (2005). Histograms of oriented
gradients for human detection. In CVPR, pages 886–893.
Han, L., Wu, X., Liang, W., Hou, G., and Jia, Y.
(2010). Discriminative human action recognition in
the learned hierarchical manifold space. Image Vision
Comput., 28(5):836–849.
Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast
learning algorithm for deep belief nets. Neural Com-
putation, 18(7):1527–1554.
Hyvärinen, A., Hurri, J., and Hoyer, P. O. (2009). Natural
Image Statistics: A Probabilistic Approach to Early
Computational Vision. Springer Publishing Company,
Incorporated, 1st edition.
Laptev, I. (2005). On space-time interest points. Int. J.
Comput. Vision, 64(2-3):107–123.
Laptev, I., Marszałek, M., Schmid, C., and Rozenfeld,
B. (2008). Learning realistic human actions from
movies. In Conference on Computer Vision & Pattern
Recognition.
Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond
bags of features: Spatial pyramid matching for recog-
nizing natural scene categories. In Computer Vision
and Pattern Recognition, 2006 IEEE Computer Soci-
ety Conference on, volume 2, pages 2169–2178.
Le, Q., Zou, W., Yeung, S., and Ng, A. (2011). Learning
hierarchical invariant spatio-temporal features for ac-
tion recognition with independent subspace analysis.
In Computer Vision and Pattern Recognition (CVPR),
2011 IEEE Conference on, pages 3361–3368.
Li, W., Zhang, Z., and Liu, Z. (2008). Expandable data-
driven graphical modeling of human actions based on
salient postures. IEEE Trans. Cir. and Sys. for Video
Technol., 18(11):1499–1510.
Li, W., Zhang, Z., and Liu, Z. (2010). Action recognition
based on a bag of 3d points.
Lv, F. and Nevatia, R. (2006). Recognition and segmenta-
tion of 3-d human action using hmm and multi-class
adaboost. In Leonardis, A., Bischof, H., and Pinz, A.,
editors, Computer Vision – ECCV 2006, volume 3954
of Lecture Notes in Computer Science, pages 359–
372. Springer Berlin Heidelberg.
Müller, M. and Röder, T. (2006). Motion templates for
automatic classification and retrieval of motion capture
data. In Proceedings of the 2006 ACM SIGGRAPH/Eurographics
symposium on Computer animation, SCA ’06, pages 137–146,
Aire-la-Ville, Switzerland. Eurographics Association.
Oreifej, O. and Liu, Z. (2013). Hon4d: Histogram of ori-
ented 4d normals for activity recognition from depth
sequences. In Computer Vision and Pattern Recogni-
tion (CVPR).
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio,
M., Moore, R., Kipman, A., and Blake, A. (2011).
Real-time human pose recognition in parts from sin-
gle depth images. In Proceedings of the 2011 IEEE
Conference on Computer Vision and Pattern Recogni-
tion, CVPR ’11, pages 1297–1304, Washington, DC,
USA. IEEE Computer Society.
Vishwanathan, S. V. N., Sun, Z., Theera-Ampornpunt, N.,
and Varma, M. (2010). Multiple kernel learning and
the SMO algorithm. In Advances in Neural Informa-
tion Processing Systems.
Wang, J., Liu, Z., Chorowski, J., Chen, Z., and Wu, Y.
(2012a). Robust 3d action recognition with random
occupancy patterns. In Proceedings of the 12th Euro-
pean conference on Computer Vision - Volume Part
II, ECCV’12, pages 872–885, Berlin, Heidelberg.
Springer-Verlag.
Wang, J., Liu, Z., Wu, Y., and Yuan, J. (2012b). Mining
actionlet ensemble for action recognition with depth
cameras. In Computer Vision and Pattern Recogni-
tion (CVPR), 2012 IEEE Conference on, pages 1290–
1297.
Xia, L. and Aggarwal, J. (2013). Spatio-temporal depth
cuboid similarity feature for activity recognition us-
ing depth camera. In Computer Vision and Pattern
Recognition (CVPR).
Xia, L., Chen, C.-C., and Aggarwal, J. (2012). View invari-
ant human action recognition using histograms of 3d
joints. In Computer Vision and Pattern Recognition
Workshops (CVPRW), 2012 IEEE Computer Society
Conference on, pages 20–27.
Yang, X. and Tian, Y. (2012). Eigenjoints-based action
recognition using naïve-bayes-nearest-neighbor. In
CVPR Workshops, pages 14–19. IEEE.
Zhang, H. and Parker, L. (2011). 4-dimensional local
spatio-temporal features for human activity recogni-
tion. In Intelligent Robots and Systems (IROS), 2011
IEEE/RSJ International Conference on, pages 2044–
2049.
Zhao, Y., Liu, Z., Yang, L., and Cheng, H. (2012). Combing
rgb and depth map features for human activity recog-
nition. In Signal Information Processing Association
Annual Summit and Conference (APSIPA ASC), 2012
Asia-Pacific, pages 1–4.
VISAPP 2014 - International Conference on Computer Vision Theory and Applications