Figure 8: Confusion matrices of the Part Labels method. (a) Per-frame classification. (b) Per-video classification.
patches which are included in the global feature.
The performance of the Part Labels method is still
slightly better than that of the root model. This is
because the Part Labels method uses the part labels
in addition to the global feature. In the confusion
matrix of the per-video Part Labels classification,
"wave1" is not misclassified as "bend". Even though
the global features of these two actions are similar,
their part labels differ, as can be seen in Figure 5.
Using this information, the Part Labels model can
distinguish the two actions.
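To make this concrete, the following is a minimal sketch in Python (with hypothetical feature dimensions and a simple histogram encoding of the part labels; the paper's actual feature construction may differ) of how decoded part labels can supplement the global feature: two frames with the same global feature still receive different descriptors when their part labels disagree.

    import numpy as np

    # Hypothetical sizes: a D-dimensional global (root) feature per frame
    # and K possible part labels for the patches of a frame.
    D, K = 128, 10

    def frame_descriptor(global_feat, part_labels, num_labels=K):
        # Concatenate the global feature with a normalised histogram of
        # the part labels assigned to the frame's patches.
        hist = np.bincount(part_labels, minlength=num_labels).astype(float)
        if hist.sum() > 0:
            hist /= hist.sum()
        return np.concatenate([global_feat, hist])

    # Two frames with an identical global feature but different part labels
    # get different descriptors, so a linear classifier can still separate them.
    g = np.random.rand(D)
    desc_a = frame_descriptor(g, np.array([3, 3, 7, 7, 1]))
    desc_b = frame_descriptor(g, np.array([2, 2, 2, 5, 5]))
    print(np.linalg.norm(desc_a - desc_b) > 0.0)  # True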
5 CONCLUDING REMARKS
This paper introduces a new method for action recog-
nition called the Part Labels method which finds the
best assignment of part labels for each image using
the model parameters trained by HCRF. By analysing
the root model, HCRF, Multi-class SVM, MMHCRF
and the newly proposed Part Labels method on a
benchmark dataset for human actions, we noticed that
the performance of the simpler models (the root model
and the multi-class SVM) is comparable to that of the
more complex models (HCRF and Part Labels). This is
because both HCRF and Part Labels model only the
spatial structure and neglect the temporal structure over
frames. For challenging tasks such as action recogni-
tion, the spatial structure changes over time and be-
comes too complex to model.
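As an illustration of the label-assignment step summarised above, the sketch below assumes purely per-patch (unary) scores under HCRF-style parameters; a full HCRF would also include terms coupling the class label and neighbouring patches, which are omitted here.

    import numpy as np

    def assign_part_labels(patch_feats, w_unary):
        # patch_feats: (P, F) array, one feature vector per patch.
        # w_unary:     (K, F) array, one weight vector per candidate part label
        #              (a simplified stand-in for the trained HCRF parameters).
        # Returns the index of the best-scoring label for every patch.
        scores = patch_feats @ w_unary.T  # (P, K) score for each patch/label pair
        return scores.argmax(axis=1)

    # Toy usage with random patch features and label weights.
    labels = assign_part_labels(np.random.rand(6, 32), np.random.rand(10, 32))
    print(labels)  # e.g. [4 1 4 9 0 2]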
A natural extension of our work is to include temporal
information. This could be done by encoding it in
spatio-temporal features or by directly modelling the
temporal structure among frames.
ACKNOWLEDGEMENTS
This work was supported by the EU FP7 Marie Curie
Network iCareNet under grant number 264738 and
the Dutch national program COMMIT.
REFERENCES
Blank, M., Gorelick, L., Shechtman, E., Irani, M., and
Basri, R. (2005). Actions as space-time shapes. In
ICCV’05.
Byrd, R., Nocedal, J., and Schnabel, R. (1994). Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63:129–156.
Crammer, K. and Singer, Y. (2002). On the algorithmic
implementation of multiclass kernel-based vector ma-
chines. The Journal of Machine Learning Research,
2:265–292.
Efros, A., Berg, A., Mori, G., and Malik, J. (2003). Recog-
nizing action at a distance. In ICCV’03.
Felzenszwalb, P., McAllester, D., and Ramanan, D. (2008).
A discriminatively trained, multiscale, deformable
part model. In CVPR’08.
Jhuang, H., Serre, T., Wolf, L., and Poggio, T. (2007). A
biologically inspired system for action recognition. In
ICCV’07.
Kumar, S. and Hebert, M. (2003). Discriminative random
fields: A discriminative framework for contextual in-
teraction in classification. In ICCV’03.
Niebles, J. and Fei-Fei, L. (2007). A hierarchical model of
shape and appearance for human action classification.
In CVPR’07.
Quattoni, A., Collins, M., and Darrell, T. (2004). Con-
ditional random fields for object recognition. In
NIPS’04.
Quattoni, A., Wang, S., Morency, L.-P., Collins, M., and Darrell, T. (2007). Hidden conditional random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10):1848–1852.
Scovanner, P., Ali, S., and Shah, M. (2007). A 3-dimensional SIFT descriptor and its application to action recognition. In Proceedings of the 15th International Conference on Multimedia.
Wang, Y. and Mori, G. (2011). Hidden part models for
human action recognition: Probabilistic versus max-
margin. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 33(7):1310–1323.
Yamato, J., Ohya, J., and Ishii, K. (1992). Recognizing human action in time-sequential images using hidden Markov model. In CVPR’92.
Yedidia, J., Freeman, W., and Weiss, Y. (2003). Understanding belief propagation and its generalizations. In Exploring Artificial Intelligence in the New Millennium, pages 239–269.