HUMAN ACTION RECOGNITION USING CONTINUOUS HMMS AND HOG/HOF SILHOUETTE REPRESENTATION

Mohamed Ibn Khedher, Mounim A. El-Yacoubi, Bernadette Dorizzi

Abstract

This paper presents an alternative to the mainstream approach of STIP-based SVM recognition for human action recognition. First, it studies whether a whole-silhouette representation using Histogram-of-Oriented-Gradients (HOG) or Histogram-of-Optical-Flow (HOF) descriptors is more discriminative than sparse spatio-temporal interest points (STIPs). Second, it investigates whether explicitly modeling the temporal order of features with continuous HMMs outperforms the standard Bag-of-Words (BoW) representation, which ignores that order. When whole-silhouette representation and temporal-order modeling are combined, the approach yields a significant improvement over STIP-based SVM recognition on the Weizmann database.
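The abstract contrasts order-free BoW pooling with continuous (Gaussian-emission) HMMs. The minimal Python sketch below is not the authors' implementation. It only illustrates that contrast on toy 2-D frame vectors, which stand in for per-frame HOG/HOF silhouette descriptors. The two-state left-to-right topology, the diagonal Gaussian emissions, the codebook, and every parameter value are illustrative assumptions. The sketch shows that a sequence and its time reversal produce the same BoW histogram but different HMM log-likelihoods.

```python
import math
from collections import Counter

NEG_INF = -math.inf


def log_gauss_diag(x, mean, var):
    """Log-density of frame vector x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(frames, log_pi, log_A, means, variances):
    """Forward algorithm in log space: log p(frames | continuous HMM).

    Each state emits frame vectors from a diagonal Gaussian; the transitions
    constrain which state may follow which, so the score depends on frame order.
    """
    n = len(log_pi)
    alpha = [log_pi[j] + log_gauss_diag(frames[0], means[j], variances[j])
             for j in range(n)]
    for x in frames[1:]:
        alpha = [
            logsumexp([alpha[i] + log_A[i][j] for i in range(n)])
            + log_gauss_diag(x, means[j], variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)


def bow_histogram(frames, codebook):
    """Bag-of-Words: count nearest-codeword assignments, discarding order."""
    def nearest(x):
        return min(range(len(codebook)),
                   key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])))
    counts = Counter(nearest(x) for x in frames)
    return [counts[k] for k in range(len(codebook))]


# Hypothetical 2-state left-to-right model (toy values, not from the paper).
log_pi = [0.0, NEG_INF]                      # always start in state 0
log_A = [[math.log(0.5), math.log(0.5)],     # state 0: stay or advance
         [NEG_INF, 0.0]]                     # state 1: absorbing
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[0.1, 0.1], [0.1, 0.1]]
codebook = means                             # reuse state means as BoW codewords

frames = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
reversed_frames = frames[::-1]

print(bow_histogram(frames, codebook), bow_histogram(reversed_frames, codebook))  # [2, 2] [2, 2]
ll_fwd = hmm_log_likelihood(frames, log_pi, log_A, means, variances)
ll_rev = hmm_log_likelihood(reversed_frames, log_pi, log_A, means, variances)
print(ll_fwd > ll_rev)  # True
```

In an HMM-based recognizer of this kind, one model would typically be trained per action class (for example with Baum-Welch) and a test sequence assigned to the class whose model gives the highest log-likelihood. Training is omitted here, and the paper's actual state counts, mixture sizes, and descriptor dimensions are not given in this abstract.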



Paper Citation


in Harvard Style

Ibn Khedher M., El-Yacoubi M. A. and Dorizzi B. (2012). HUMAN ACTION RECOGNITION USING CONTINUOUS HMMS AND HOG/HOF SILHOUETTE REPRESENTATION. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM, ISBN 978-989-8425-99-7, pages 503-508. DOI: 10.5220/0003695905030508


in Bibtex Style

@conference{icpram12,
author={Mohamed Ibn Khedher and Mounim A. El-Yacoubi and Bernadette Dorizzi},
title={HUMAN ACTION RECOGNITION USING CONTINUOUS HMMS AND HOG/HOF SILHOUETTE REPRESENTATION},
booktitle={Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM},
year={2012},
pages={503-508},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003695905030508},
isbn={978-989-8425-99-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods - Volume 2: ICPRAM
TI - HUMAN ACTION RECOGNITION USING CONTINUOUS HMMS AND HOG/HOF SILHOUETTE REPRESENTATION
SN - 978-989-8425-99-7
AU - Ibn Khedher M.
AU - El-Yacoubi M. A.
AU - Dorizzi B.
PY - 2012
SP - 503
EP - 508
DO - 10.5220/0003695905030508