Table 5: Accuracy (%) with classifiers NN and SVM (C = 10^4), RM descriptor PaD, and feature combinations Φ_3 and Φ_4 (see Table 4) on different sets of views (IXMAS dataset) for training and testing.
                                      Training Views : Testing Views
Feat. Comb.  Classifier               0:0   1:1   2:2   3:3   4:4   2:3   1,2:3  All:All
Φ_3          NN                       44.9  49.5  45.5  45.9  47.3  66.4  65.4   66.6
Φ_3          SVM                      65.8  63.9  64.5  65.1  61.1  68.9  80.6   78.4
Φ_4          NN                       66.3  66.9  69.1  66.6  60.5  79.5  81.1   71.2
Φ_4          SVM                      70.5  72.3  73.0  75.4  65.8  68.9  77.0   75.3
SSM (HOG+OF) (Junejo et al., 2011)    77.0  77.3  75.8  71.2  68.8  68.5  N/A    74.6
is required to understand why the proposed system
(descriptors, fusion strategy, or classifier) falls somewhat
behind the state-of-the-art results, so that it can be
made more discriminative while remaining as simple as possible.
Combining features of different nature (such as
shape, motion, and time-contextual information) generally
improves the performance over individual subsets
of these features. However, we observe that the choice
of frame descriptors and the way they are combined
may significantly affect the performance in
a data-dependent way. Consequently, devising an efficient
procedure to select both a proper subset of
the descriptor parts and a suitable fusion strategy is
among the most interesting research possibilities.
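As a rough illustration of what such a selection procedure could look like, the following minimal sketch scores every non-empty subset of descriptor parts by the leave-one-out accuracy of a 1-NN classifier. It is not the method evaluated in this paper: the function names, the early-fusion-by-concatenation choice, the part names `shape` and `motion`, and the toy data are all assumptions made for illustration only.

```python
from itertools import combinations
from math import dist


def fuse(sample, parts):
    """Early fusion (assumed here): concatenate the chosen descriptor parts."""
    fused = []
    for name in parts:
        fused.extend(sample[name])
    return fused


def loo_nn_accuracy(samples, labels, parts):
    """Leave-one-out accuracy of a 1-NN classifier on the fused descriptors."""
    fused = [fuse(s, parts) for s in samples]
    correct = 0
    for i, query in enumerate(fused):
        # Nearest neighbour among all other samples (Euclidean distance).
        nearest = min(
            (j for j in range(len(fused)) if j != i),
            key=lambda j: dist(query, fused[j]),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(fused)


def select_parts(samples, labels, part_names):
    """Exhaustively score every non-empty subset of descriptor parts."""
    best_parts, best_acc = None, -1.0
    for k in range(1, len(part_names) + 1):
        for parts in combinations(part_names, k):
            acc = loo_nn_accuracy(samples, labels, parts)
            if acc > best_acc:  # strict comparison: ties keep the smaller subset
                best_parts, best_acc = parts, acc
    return best_parts, best_acc


# Hypothetical toy data: "shape" separates the classes, "motion" does not.
samples = [
    {"shape": [0.0, 0.1], "motion": [5.0]},
    {"shape": [0.1, 0.0], "motion": [-5.0]},
    {"shape": [1.0, 1.1], "motion": [5.2]},
    {"shape": [1.1, 1.0], "motion": [-5.2]},
]
labels = ["walk", "walk", "wave", "wave"]

best_parts, best_acc = select_parts(samples, labels, ["shape", "motion"])
print(best_parts, best_acc)
```

On this toy data, adding the uninformative part to the fused vector lowers the accuracy, which mirrors the data-dependent behaviour noted above. Exhaustive search grows exponentially with the number of parts, so an efficient procedure would likely need a greedy or otherwise pruned search instead.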
ACKNOWLEDGEMENTS
This work is partially supported by the Spanish
research programme Consolider Ingenio-2010
CSD2007-00018, Fundació Caixa-Castelló Bancaixa
(projects P1·1A2010-11 and P1·1B2010-27), and
Generalitat Valenciana (PROMETEO/2010/028).
REFERENCES
Aggarwal, J. K. and Ryoo, M. S. (2011). Human activity
analysis: A review. ACM Comp. Surv., 43(3).
Ali, S., Basharat, A., and Shah, M. (2007). Chaotic invariants
for human action recognition. In ICCV.
BenAbdelkader, C., Cutler, R., and Davis, L. S. (2004). Gait
recognition using image self-similarity. EURASIP J.
on Applied Signal Processing, 2004(4).
Brendel, W. and Todorovic, S. (2010). Activities as time
series of human postures. In ECCV, pages 721–734.
Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: a library
for support vector machines. ACM Transactions on
Intelligent Systems and Technology, 2(3):27:1–27:27.
Cutler, R. and Davis, L. S. (2000). Robust periodic motion
and motion symmetry detection. In CVPR, pages
2615–2622.
Dalal, N. and Triggs, B. (2005). Histograms of oriented
gradients for human detection. In CVPR.
Gaidon, A., Harchaoui, Z., and Schmid, C. (2011a). Actom
sequence models for efficient action detection. In
CVPR, pages 3201–3208.
Gaidon, A., Harchaoui, Z., and Schmid, C. (2011b). A time
series kernel for action recognition. In BMVC.
Gorelick, L., Blank, M., Shechtman, E., Irani, M., and
Basri, R. (2007). Actions as space-time shapes. PAMI,
29(12):2247–2253.
Junejo, I. N., Dexter, E., Laptev, I., and Pérez, P. (2011).
View-independent action recognition from temporal
self-similarities. PAMI, 33(1):172–185.
Lan, Z.-z., Bao, L., Yu, S.-I., Liu, W., and Hauptmann,
A. G. (2012). Double fusion for multimedia event detection.
In Proc. of the 18th Intl. Conf. on Advances in
Multimedia Modeling, pages 173–185.
Lucena, M. J., de la Blanca, N. P., and Fuertes, J. M. (2012).
Human action recognition based on aggregated local
motion estimates. Mach. Vis. & Apps. (MVA),
23(1):135–150.
Marwan, N., Romano, M. C., Thiel, M., and Kurths, J.
(2007). Recurrence plots for the analysis of complex
systems. Physics Reports, 438(5–6):237–329.
Matikainen, P., Hebert, M., and Sukthankar, R. (2010).
Representing pairwise spatial and temporal relations
for action recognition. In ECCV, pages 508–521.
Niebles, J. C., Chen, C.-W., and Li, F.-F. (2010). Modeling
temporal structure of decomposable motion segments
for activity classification. In ECCV, pages 392–405.
Schindler, K. and van Gool, L. (2008). Action snippets:
How many frames does human action recognition
require? In CVPR.
Serra-Toro, C. and Traver, V. J. (2011). A new pedestrian
detection descriptor based on the use of spatial
recurrences. In CAIP, pages 97–104.
Tran, D. and Sorokin, A. (2008). Human activity recognition
with metric learning. In ECCV, pages 548–561.
Tran-Sorokin (2008). Human activity
recognition with metric learning.
http://vision.cs.uiuc.edu/projects/activity.
Weinland, D., Ronfard, R., and Boyer, E. (2006). Free
viewpoint action recognition using motion history
volumes. Comp. Vis. & Image Underst. (CVIU),
104(2–3):249–257.
VISAPP 2013 - International Conference on Computer Vision Theory and Applications