Figure 5: Runtime for optical flow compared to the feature-based system. [Bar chart: mean runtime per frame (ms/frame, scale 0 to 800) for the tasks Roll, Pour, Slice, Grind, Sweep, Grate, Stir, Saw, Cut, and Mash; series: OpenCV HoOF, JavaCV HoFF, OpenCV HoFF.]
The optical flow histogram calculation takes around 764 ms per frame, while the OpenCV-based implementation of feature flow histograms needs only 34 ms. The runtime is constant for any type of sequence, as can be seen in Figure 5. The runtime for decoding ranges between 20 and 35 ms. Decoding is performed by a beam search over all possible action units, yielding a hypothesis of the current action unit as well as the history of action units and the type of sequence that has been performed. This leads to an overall processing rate of about 25 fps, which can be considered acceptable for on-line recognition.
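The decoding step described above can be illustrated with a minimal sketch. The function names, the hypothesis representation, and the uniform transition model below are assumptions made for illustration; they are not the paper's actual decoder, which builds on a speech recognition engine. The sketch keeps a beam of hypotheses, each consisting of an action-unit history and a score, and extends every hypothesis by one frame of observation log-likelihoods:

```python
import math

def beam_search_step(beams, frame_logprobs, trans_logprob, beam_width=3):
    """Extend each hypothesis by one frame and keep the best beam_width.

    beams: list of (history, score), where history is a tuple of the
           action units hypothesised so far.
    frame_logprobs: dict mapping action unit -> log-likelihood of the
           current frame's observation under that unit's model.
    trans_logprob: function (previous_unit_or_None, unit) -> transition
           log-probability.
    """
    candidates = []
    for history, score in beams:
        prev = history[-1] if history else None
        for unit, obs_lp in frame_logprobs.items():
            new_score = score + trans_logprob(prev, unit) + obs_lp
            # Staying in the same unit keeps the history unchanged;
            # switching to a new unit appends it to the history.
            new_history = history if unit == prev else history + (unit,)
            candidates.append((new_history, new_score))
    # Prune: keep only the beam_width highest-scoring hypotheses.
    candidates.sort(key=lambda h: h[1], reverse=True)
    return candidates[:beam_width]

# Toy usage: two hypothetical action units and uniform transitions.
uniform = lambda prev, unit: math.log(0.5)
beams = [((), 0.0)]
for frame in [{"reach": -0.1, "grasp": -2.0},   # frame 1 favours "reach"
              {"reach": -1.5, "grasp": -0.2}]:  # frame 2 favours "grasp"
    beams = beam_search_step(beams, frame, uniform)
best_history, best_score = beams[0]  # -> ("reach", "grasp")
```

After each frame, the top hypothesis directly provides both the current action unit and the full history of units, which is what makes the recognition available on-line rather than only after the sequence has finished.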
7 CONCLUSIONS
In this paper, a system for the on-line recognition of human actions was presented. The video-based action recognition techniques are well suited to recognizing sequences of action units and complex activities. The combination of feature flow histograms and HMMs enables an on-line action recognition system to recognize human activities during their execution in a natural, unrestricted scenario. We see this as a valuable step towards on-line action recognition that can adapt to the user and his or her needs while still being robust and scalable enough to work in a real-life environment.
ACKNOWLEDGEMENTS
We thank the Institute for Sports and Sport Science,
Karlsruhe Institute of Technology (KIT), Germany
for recording the marker data used in this work. This
work was partially supported by the German Research
Foundation (DFG) within the Collaborative Research
Center SFB 588 on Humanoid Robots - Learning
and Cooperating Multimodal Robots and by OSEO,
French State agency for innovation, as part of the
Quaero Programme.
ON-LINE ACTION RECOGNITION FROM SPARSE FEATURE FLOW