Hildegard Kuehne, Dirk Gehrig, Tanja Schultz, Rainer Stiefelhagen


The fast and robust recognition of human actions is an important aspect of many video-based applications in human-computer interaction and surveillance. Although current recognition algorithms deliver increasingly advanced results, their usability for on-line applications is still limited. To bridge this gap, an on-line video-based action recognition system is presented that combines histograms of sparse feature point flow with HMM-based action recognition. Using feature point motion is computationally more efficient than the more common histograms of optical flow (HoF) while reaching a similar recognition accuracy. For recognition, we use low-level action units that are modeled by hidden Markov models (HMMs). These units are assembled by a context-free grammar to recognize complex activities. Concatenating small action units into higher-level tasks allows robust recognition of action sequences as well as continuous on-line evaluation of the ongoing activity. The average runtime is around 34 ms for processing one frame and around 20 ms for calculating one hypothesis for the current action. Assuming that one hypothesis per second is needed, the system can provide a mean capacity of 25 fps. The system's accuracy is compared with state-of-the-art recognition results on a common benchmark dataset as well as with a marker-based recognition system, showing similar results for the given evaluation scenario. The presented approach can be seen as a step towards the on-line evaluation and recognition of human motion directly from video data.
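The runtime figures above can be sanity-checked with a little arithmetic. The sketch below is only an illustration: it assumes that frame processing and hypothesis calculation run sequentially on a single thread and that the rounded averages (34 ms and 20 ms) hold exactly. The function and parameter names are hypothetical and do not come from the paper.

```python
def time_budget_ms(fps, frame_ms=34.0, hypothesis_ms=20.0, hypotheses_per_s=1):
    """Processing time (in ms) needed for one second of video.

    Assumption: frames and hypotheses are processed one after another,
    so their costs simply add up.
    """
    return fps * frame_ms + hypotheses_per_s * hypothesis_ms


# 25 frames at 34 ms each plus one hypothesis at 20 ms.
budget = time_budget_ms(25)
headroom = 1000.0 - budget

print(f"needed per second: {budget:.0f} ms, headroom: {headroom:.0f} ms")
```

Under these assumptions, 25 fps with one hypothesis per second needs 870 ms of processing per second of video, which fits within real time with some headroom.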



Paper Citation

in Harvard Style

Kuehne H., Gehrig D., Schultz T. and Stiefelhagen R. (2012). ON-LINE ACTION RECOGNITION FROM SPARSE FEATURE FLOW. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2012) ISBN 978-989-8565-03-7, pages 634-639. DOI: 10.5220/0003861506340639
