presented in (Goudelis et al., 2013) we have used Linear Discriminant Analysis (LDA) and a vector of 31 features. This latter method requires a prior silhouette extraction stage.
Table 2 shows the comparative results for all these methods using the Weizmann sequences. In our approach we have used the R_max transform, since it was the one that gave the highest accuracy in the previous experiment.
Table 2: Recognition rate (%) on the Weizmann dataset.

Method                        %
(Scovanner, 2007)             84.2
(Kläser et al., 2008)         84.3
(Niebles et al., 2008)        90
(Jhuang et al., 2007)         98.8
(Vishwakarma et al., 2015)    96.64
(Goudelis et al., 2013)       93.4
Ours (using R_max)            98.8
The mean computational time for recognizing an action sequence of 100 frames of 160x120 pixels is 900 ms, measured on a 3.1 GHz Intel Core i3. For these preliminary tests the code is not optimized and the whole process has been implemented in Matlab.
5 CONCLUSIONS
Template-based approaches make it possible to project a whole sequence into a single image. In this paper we have presented a generalized form R_f of the Radon transform for projecting the action sequence. By choosing an appropriate projection function f, it can be adapted to a specific problem.
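As an illustration only, the following is a minimal sketch of the R_f idea, not the authors' Matlab implementation: each frame is Radon-transformed and the stack of sinograms is reduced along the temporal axis with a chosen projection function f. The helper name r_f and the use of scikit-image's radon are assumptions for this sketch.

import numpy as np
from skimage.transform import radon  # per-frame 2-D Radon transform

def r_f(frames, f=np.max, theta=np.arange(180)):
    """frames: sequence of 2-D images (e.g. optical flow components).
    f: temporal projection function (np.mean, np.std, or np.max for R_max).
    Returns a single 2-D template of shape (n_rho, n_theta)."""
    sinograms = np.stack([radon(frame, theta=theta, circle=False)
                          for frame in frames], axis=0)  # (T, rho, theta)
    return f(sinograms, axis=0)  # project the whole sequence over time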
We have tested three different projection functions f, namely the mean, the standard deviation, and the supremum, and applied the resulting transforms to the optical flow components of a video sequence. This experiment has shown that the R_max transform gives the highest recognition rate for action recognition, above both the standard R transform and the other projection functions.
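A hedged usage sketch for the optical flow case follows; the function name, the choice of OpenCV's Farneback dense flow, and its parameters are illustrative assumptions, not taken from the paper. Each pair of consecutive grey-level frames yields horizontal and vertical flow components, and each component stack is projected with the supremum to obtain its R_max template.

import cv2
import numpy as np
from skimage.transform import radon

def r_max_flow_templates(gray_frames, theta=np.arange(180)):
    """gray_frames: list of grey-level frames (2-D uint8 arrays)."""
    u_sino, v_sino = [], []
    for prev, nxt in zip(gray_frames[:-1], gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        u_sino.append(radon(flow[..., 0], theta=theta, circle=False))
        v_sino.append(radon(flow[..., 1], theta=theta, circle=False))
    # Supremum over time (R_max): one template per flow component
    return np.max(u_sino, axis=0), np.max(v_sino, axis=0)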
The results obtained in a second experiment also show that the use of such transforms is a very promising technique, since it yielded recognition rates higher than those of state-of-the-art methods on the same dataset, reaching a 98.8% recognition rate.
6 FURTHER WORK
The results presented in this paper have been obtained using the Weizmann dataset as a testbed. Future experiments will involve other popular action/gesture recognition datasets such as the KTH dataset (Schuldt et al., 2004) and the Cambridge hand-gesture dataset.
We are currently working on the extension of this technique to action segmentation. Standard action recognition datasets used by most researchers usually contain a set of single actions that start at the beginning of the sequence and stop at the end. In a real application, actions must be detected, segmented and, finally, recognized. The R_f transforms are a promising technique for sequence segmentation too.
ACKNOWLEDGEMENTS
This work was supported by the Spanish Ministry of Science and Innovation, project DPI2016-78957-R, and by the European project AEROARMS (H2020-ICT-2014-1-644271).
REFERENCES
Arodz, T. (2005). Invariant object recognition using Radon-based transform. Computers and Artificial Intelligence, 24:183–199.

Blank, M., Gorelick, L., Shechtman, E., Irani, M., and Basri, R. (2005). Actions as space-time shapes. Computer Vision, IEEE International Conference on, 2:1395–1402.

Bosch, A., Zisserman, A., and Munoz, X. (2007). Image classification using random forests and ferns. Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8.

Csurka, G., Dance, C. R., Fan, L., Willamowski, J., and Bray, C. A. (2004). Visual categorization with bags of keypoints. Pages 1–22.

Goudelis, G., Karpouzis, K., and Kollias, S. (2013). Exploring trace transform for robust human action recognition. Pattern Recognition, 46(12):3238–3248.

Jhuang, H., Serre, T., Wolf, L., and Poggio, T. (2007). A biologically inspired system for action recognition. Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8.

Karlsson, S. and Bigun, J. (2012). Lip-motion events analysis and lip segmentation using optical flow. Pages 138–145.

Kläser, A., Marszałek, M., and Schmid, C. (2008). A spatio-temporal descriptor based on 3d-gradients. In BMVC 2008.

Niebles, J., Wang, H., and Fei-Fei, L. (2008). Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision, 79(3):299–318.