transparency is important in order to establish how well algorithms work in comparison to others.
This paper has presented a method for classifying fighting situations. Our method achieves 96% correct classification on the BEHAVE dataset, compared to Datta et al. (2002), who reported 97%, and Cupillard et al. (2002), who reported 95%, for the detection of fighting situations on other (and separate) datasets. However, our method does not require the pre-segmentation of parts of individuals, foreground extraction, or pre-compiled behaviour models. It has also been demonstrated that it is possible to identify pre- and post-fight situations. Such cases are important in monitoring situations, as intervention before the act occurs is always preferable.
A hierarchical classifier is useful in many surveillance applications. Such a structure visually shows how the classification algorithm perceives the features it is given, which serves as a useful sanity check that the method groups behaviours as expected.
However, the most useful aspect of a hierarchical classifier is arguably its ability to subdivide behaviours to a finer degree of granularity. For example, in a surveillance application one may wish to identify all fighting situations (as we have done here) and then obtain further granularity so as to identify pre- and post-fight situations, as we have shown; a sketch of this two-stage structure is given below. This ability is useful as it allows fine tuning of a surveillance system.
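As a loose illustration of this idea, the following Python sketch builds a two-stage hierarchy using scikit-learn's AdaBoostClassifier; the random features, the label codes, and the choice of classifier are placeholder assumptions for illustration, not the descriptors or training pipeline used in this paper.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    # Hypothetical label codes: 0 = normal, 1 = pre-fight, 2 = fighting,
    # 3 = post-fight. X stands in for per-window motion descriptors.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 16))
    y = rng.integers(0, 4, size=400)

    # Stage 1: coarse split into normal vs. fight-related behaviour.
    coarse_y = np.isin(y, [1, 2, 3]).astype(int)
    coarse = AdaBoostClassifier(n_estimators=100).fit(X, coarse_y)

    # Stage 2: finer granularity, trained only on fight-related samples.
    mask = coarse_y == 1
    fine = AdaBoostClassifier(n_estimators=100).fit(X[mask], y[mask])

    def classify(x):
        """Route one feature vector through the two-level hierarchy."""
        x = x.reshape(1, -1)
        if coarse.predict(x)[0] == 0:
            return 0                    # normal behaviour
        return int(fine.predict(x)[0])  # pre-, during- or post-fight

The appeal of such a structure is that the second stage can be retuned or extended to finer classes without disturbing the coarse fight/non-fight decision.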
One issue raised here is that of overlapping classes. It has been shown that when all the fighting classes are combined, the accuracy increases. This raises the question of whether these classes are truly distinct or merely transitional states between normal and fighting behaviour. An unsupervised method could be used to investigate this; a sketch of one such check is given below. However, it may still be useful to be able to distinguish the point before a fight (e.g., before someone gets hurt) in order to stop physical injury occurring.
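A minimal sketch of such an unsupervised check, again on placeholder data and with the (assumed) choice of k-means clustering:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import confusion_matrix

    # Placeholder features and labels as in the previous sketch
    # (1 = pre-fight, 2 = fighting, 3 = post-fight).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 16))
    y = rng.integers(0, 4, size=400)

    fight_mask = np.isin(y, [1, 2, 3])
    Xf, yf = X[fight_mask], y[fight_mask]

    # Cluster the fight-related samples without using their labels.
    clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xf)

    # If pre-, during- and post-fight were genuinely distinct, each cluster
    # would be dominated by one label; a flat table would instead point to
    # transitional states that blend into one another.
    print(confusion_matrix(yf - 1, clusters))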
Future work should seek to improve the classification of continuous sequences, perhaps by incorporating temporal models (e.g., hidden Markov models); a sketch of this idea is given below. A further extension would be to remove the manual tracking component altogether (although some targets would be temporarily lost), or to combine individuals into group actions.
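A rough sketch of this direction, assuming the hmmlearn library and random placeholder sequences in place of real tracked features: one HMM is trained per behaviour class, and a continuous sequence is assigned to the class whose model gives it the highest likelihood.

    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    # seqs_by_class maps each class label to a list of (T, n_features)
    # feature sequences; random data stands in for real tracks here.
    rng = np.random.default_rng(0)
    seqs_by_class = {c: [rng.normal(size=(30, 16)) for _ in range(5)]
                     for c in range(4)}

    models = {}
    for c, seqs in seqs_by_class.items():
        X = np.vstack(seqs)                 # concatenated observations
        lengths = [len(s) for s in seqs]    # per-sequence lengths
        m = GaussianHMM(n_components=3, covariance_type="diag", n_iter=50)
        models[c] = m.fit(X, lengths)

    def classify_sequence(seq):
        """Label a sequence by the HMM with the highest log-likelihood."""
        return max(models, key=lambda c: models[c].score(seq))

Calling classify_sequence on a (T, 16) array then returns the best-scoring class; on real data the per-class models would of course be trained on the paper's features rather than noise.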
ACKNOWLEDGEMENTS
Thanks to Piotr Dollar for kindly making his cuboids code available. This work is funded by the EPSRC's BEHAVE project GR/S98146.
REFERENCES
Blunsden, S., Andrade, E., Laghaee, A., and Fisher, R. (2007). BEHAVE interactions test case scenarios, EPSRC project GR/S98146. http://groups.inf.ed.ac.uk/vision/behavedata/interactions/index.html. Online.
Cupillard, F., Bremond, F., and Thonnat, M. (2002). Group
behavior recognition with multiple cameras. In Sixth
IEEE Workshop on Applications of Computer Vision
(WACV).
Datta, A., Shah, M., and Lobo, N. D. V. (2002). Person-on-person violence detection in video data. In Proceedings of the 16th International Conference on Pattern Recognition (ICPR'02), Volume 1, page 10433. IEEE Computer Society.
Davis, J. W. and Bobick, A. F. (2001). The representation and recognition of action using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23:257–267.
Dee, H. and Hogg, D. C. (2004). Is it interesting? Comparing human and machine judgements on the PETS dataset. Sixth International Workshop on Performance Evaluation of Tracking and Surveillance, 33(1):49–55.
Dollar, P., Rabaud, V., Cottrell, G., and Belongie, S. (2005).
Behavior recognition via sparse spatio-temporal fea-
tures. In PETS, pages 65–72, China.
Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern Classification, Second Edition. Wiley-Interscience.
Efros, A., Berg, A., Mori, G., and Malik, J. (2003). Recognizing action at a distance. In 9th International Conference on Computer Vision, volume 2, pages 726–733.
Freund, Y. and Schapire, R. E. (1996). Game theory, on-line
prediction and boosting. In Ninth Annual Conference
on Computational Learning Theory, pages 325–332.
Niebles, J. C., Wang, H., and Fei-Fei, L. (2006). Unsupervised learning of human action categories using spatial-temporal words. In British Machine Vision Conference, Edinburgh.
EC Funded CAVIAR project/IST 2001 37540 (2004). Found at URL: http://homepages.inf.ed.ac.uk/rbf/caviar/.
Ribeiro, P. and Santos-Victor, J. (2005). Human activi-
ties recognition from video: modeling, feature selec-
tion and classification architecture. In Workshop on
Human Activity Recognition and Modelling (HAREM
2005 - in conjunction with BMVC 2005), pages 61–70,
Oxford.
Troscianko, T., Holmes, A., Stillman, J., Mirmehdi, M., and Wright, D. (2004). What happens next? The predictability of natural behaviour viewed through CCTV cameras. Perception, 33(1):87–101.
Zhu, J., Rosset, S., Zhou, H., and Hastie, T. (2006). Multi-class AdaBoost. Technical report, University of Michigan, Ann Arbor.