Fast Violence Detection in Video
Oscar Deniz, Ismael Serrano, Gloria Bueno, Tae-Kyun Kim
2014
Abstract
Whereas action recognition has become a hot topic within computer vision, the detection of fights or, more generally, aggressive behavior has been comparatively less studied. Such a capability may be extremely useful in video surveillance scenarios such as prisons, psychiatric centers, or even camera phones. Recent work has applied the well-known Bag-of-Words framework, often used in generic action recognition, to the specific problem of fight detection. Under this framework, spatio-temporal features are extracted from the video sequences and used for classification. Despite encouraging results, with accuracy rates of nearly 90%, the computational cost of extracting such features is prohibitive for practical applications, particularly in surveillance and media rating systems. The task of violence detection, however, may have specific characteristics that can be leveraged. Inspired by results suggesting that kinematic features alone are discriminant for certain actions, this work proposes a novel method that uses extreme acceleration patterns as the main feature. These extreme accelerations are efficiently estimated by applying the Radon transform to the power spectrum of consecutive frames. Experiments show accuracy improvements of up to 12% with respect to state-of-the-art action recognition methods. Most importantly, the proposed method is at least 15 times faster.
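As a rough illustration of the idea described in the abstract, the sketch below computes the Radon transform of each frame's power spectrum and scores the change between consecutive frames as an extreme-acceleration cue. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the log-power normalization, and the max-over-angles pooling are choices made here for illustration, and the per-clip pooling and classification stage is omitted.

import numpy as np
from skimage.transform import radon

def spectral_radon_signature(frame, n_angles=180):
    # frame: 2D grayscale float array.
    # Fast motion attenuates high spatial frequencies perpendicular to the
    # motion direction; this anisotropy of the power spectrum is summarized
    # per angle by the Radon transform.
    spectrum = np.fft.fftshift(np.fft.fft2(frame))
    power = np.log1p(np.abs(spectrum) ** 2)
    power /= power.max() + 1e-8                          # normalize across frames
    angles = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    return radon(power, theta=angles, circle=False)      # shape: (n_bins, n_angles)

def acceleration_score(prev_frame, frame):
    # Crude acceleration cue: how much the spectral Radon signature changes
    # between two consecutive frames (larger = more abrupt motion).
    s1 = spectral_radon_signature(prev_frame)
    s2 = spectral_radon_signature(frame)
    return np.abs(s2 - s1).sum(axis=0).max()             # worst-case angle response

In a full pipeline, such per-frame-pair scores would be pooled over a clip and fed to a classifier, in the spirit of the abstract; thresholds, pooling, and the classifier itself are left unspecified here.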
Paper Citation
in Harvard Style
Deniz O., Serrano I., Bueno G. and Kim T. (2014). Fast Violence Detection in Video. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014) ISBN 978-989-758-004-8, pages 478-485. DOI: 10.5220/0004695104780485
in BibTeX Style
@conference{visapp14,
author={Oscar Deniz and Ismael Serrano and Gloria Bueno and Tae-Kyun Kim},
title={Fast Violence Detection in Video},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)},
year={2014},
pages={478-485},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004695104780485},
isbn={978-989-758-004-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)
TI - Fast Violence Detection in Video
SN - 978-989-758-004-8
AU - Deniz O.
AU - Serrano I.
AU - Bueno G.
AU - Kim T.
PY - 2014
SP - 478
EP - 485
DO - 10.5220/0004695104780485