Human Action Description Based on Temporal Pyramid Histograms

Yingying Liu, Arcot Sowmya

Abstract

In this paper, we present an approach to action description based on temporal pyramid histograms. Bag of features is a widely used action recognition framework built on local features such as spatio-temporal feature points. Although it outperforms other approaches on several public datasets, it ignores sequencing information. In this work, instead of only counting the occurrences of code words, we also encode their temporal layout. The proposed temporal pyramid histograms descriptor is a set of histogram atoms generated from the original video clip and its subsequences. To classify actions with this descriptor, we design a function that weights each histogram atom according to the length of its corresponding sequence. We test the descriptor using nearest-neighbour classification. Experimental results show that our description approach improves action recognition accuracy in comparison to the state of the art.
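The idea can be illustrated with a minimal sketch. The abstract does not give the subsequence scheme, the weighting function, or the distance measure, so everything specific below is an assumption made for illustration: the clip is split into 2^l equal temporal segments at level l, each histogram atom is weighted by its segment length relative to the whole clip, and descriptors are compared by a weighted L1 distance. The function names (temporal_pyramid_histograms, descriptor_distance, nearest_neighbour) are hypothetical and are not taken from the paper.

```python
import numpy as np


def temporal_pyramid_histograms(words, frames, n_frames, vocab_size, levels=3):
    """Build a list of (weight, histogram) atoms for one clip.

    words  : code-word index of each local feature
    frames : frame index at which each feature was detected
    Level l splits the clip into 2**l equal segments (an assumption).
    """
    words = np.asarray(words, dtype=int)
    frames = np.asarray(frames)
    atoms = []
    for level in range(levels):
        n_seg = 2 ** level
        edges = np.linspace(0, n_frames, n_seg + 1)
        # Assumed weight: segment length relative to the full clip.
        weight = 1.0 / n_seg
        for s in range(n_seg):
            mask = (frames >= edges[s]) & (frames < edges[s + 1])
            hist = np.bincount(words[mask], minlength=vocab_size).astype(float)
            total = hist.sum()
            if total > 0:
                hist /= total  # normalise so clips of different density compare
            atoms.append((weight, hist))
    return atoms


def descriptor_distance(a, b):
    """Weighted L1 distance between two descriptors (assumed measure)."""
    return sum(w * np.abs(ha - hb).sum() for (w, ha), (_, hb) in zip(a, b))


def nearest_neighbour(query, train_descriptors, train_labels):
    """Return the label of the closest training descriptor."""
    dists = [descriptor_distance(query, d) for d in train_descriptors]
    return train_labels[int(np.argmin(dists))]


# Two toy clips with the same bag of words but opposite temporal order.
clip_a = temporal_pyramid_histograms([0, 0, 1, 1], [0, 1, 2, 3], 4, 2, levels=2)
clip_b = temporal_pyramid_histograms([1, 1, 0, 0], [0, 1, 2, 3], 4, 2, levels=2)
```

In this toy case the level-0 atoms of the two clips are identical, which is all a plain bag-of-features histogram would see, while the level-1 atoms differ, so the descriptor distance between the clips is non-zero.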


Paper Citation


in Harvard Style

Liu Y. and Sowmya A. (2014). Human Action Description Based on Temporal Pyramid Histograms. In Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM, ISBN 978-989-758-018-5, pages 629-636. DOI: 10.5220/0004825206290636


in Bibtex Style

@conference{icpram14,
author={Yingying Liu and Arcot Sowmya},
title={Human Action Description Based on Temporal Pyramid Histograms},
booktitle={Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM},
year={2014},
pages={629-636},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004825206290636},
isbn={978-989-758-018-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 3rd International Conference on Pattern Recognition Applications and Methods - Volume 1: ICPRAM
TI - Human Action Description Based on Temporal Pyramid Histograms
SN - 978-989-758-018-5
AU - Liu Y.
AU - Sowmya A.
PY - 2014
SP - 629
EP - 636
DO - 10.5220/0004825206290636