REFERENCES
Barlow, H. B. and Olshausen, B. A. (2004). Convergent ev-
idence for the visual analysis of optic flow through
anisotropic attenuation of high spatial frequencies.
Journal of Vision, 4(6):415–426.
Bermejo, E., Deniz, O., Bueno, G., and Sukthankar, R.
(2011). Violence detection in video using computer
vision techniques. In 14th Int. Congress on Computer
Analysis of Images and Patterns, pages 332–339.
Blake, R. and Shiffrar, M. (2007). Perception of Human
Motion. Annual Review of Psychology, 58(1):47–73.
Bobick, A. and Davis, J. (1996). An appearance-based rep-
resentation of action. In Pattern Recognition, 1996.,
Proceedings of the 13th International Conference on,
volume 1, pages 307–312 vol.1.
Castellano, G., Villalba, S., and Camurri, A. (2007). Recog-
nising human emotions from body movement and ges-
ture dynamics. In Paiva, A., Prada, R., and Picard, R.,
editors, Affective Computing and Intelligent Interac-
tion, volume 4738 of Lecture Notes in Computer Sci-
ence, pages 71–82. Springer Berlin Heidelberg.
Chen, D., Wactlar, H., Chen, M., Gao, C., Bharucha, A.,
and Hauptmann, A. (2008). Recognition of aggressive
human behavior using binary local motion descrip-
tors. In Engineering in Medicine and Biology Society,
pages 5238–5241.
Chen, L.-H., Su, C.-W., and Hsu, H.-W. (2011). Violent
scene detection in movies. IJPRAI, 25(8):1161–1172.
Chen, M.-y., Mummert, L., Pillai, P., Hauptmann, A., and
Sukthankar, R. (2010). Exploiting multi-level paral-
lelism for low-latency activity recognition in stream-
ing video. In MMSys ’10: Proceedings of the first
annual ACM SIGMM conference on Multimedia sys-
tems, pages 1–12, New York, NY, USA. ACM.
Cheng, W.-H., Chu, W.-T., and Wu, J.-L. (2003). Semantic
context detection based on hierarchical audio models.
In Proceedings of the ACM SIGMM workshop on Mul-
timedia information retrieval, pages 109–115.
Clarin, C., Dionisio, J., Echavez, M., and Naval, P. C.
(2005). DOVE: Detection of movie violence using
motion intensity analysis on skin and blood. Techni-
cal report, University of the Philippines.
Clarke, T. J., Bradshaw, M. F., Field, D. T., Hampson,
S. E., and Rose, D. (2005). The perception of emo-
tion from body movement in point-light displays of
interpersonal dialogue. Perception, 34:1171–1180.
Datta, A., Shah, M., and Lobo, N. D. V. (2002). Person-on-
person violence detection in video data. In Pattern
Recognition, 2002. Proceedings. 16th International
Conference on, volume 1, pages 433–438.
Demarty, C., Penet, C., Gravier, G., and Soleymani, M.
(2012). MediaEval 2012 affect task: Violent scenes
detection in Hollywood movies. In MediaEval 2012
Workshop Proceedings, Pisa, Italy.
Giannakopoulos, T., Kosmopoulos, D., Aristidou, A., and
Theodoridis, S. (2006). Violence content classifica-
tion using audio features. In Advances in Artificial In-
telligence, volume 3955 of Lecture Notes in Computer
Science, pages 502–507.
Giannakopoulos, T., Makris, A., Kosmopoulos, D., Peran-
tonis, S., and Theodoridis, S. (2010). Audio-visual fu-
sion for detecting violent scenes in videos. In 6th Hel-
lenic Conference on AI, SETN 2010, Athens, Greece,
May 4-7, 2010. Proceedings, pages 91–100, London,
UK. Springer-Verlag.
Gong, Y., Wang, W., Jiang, S., Huang, Q., and Gao, W.
(2008). Detecting violent scenes in movies by audi-
tory and visual cues. In Proceedings of the 9th Pa-
cific Rim Conference on Multimedia, pages 317–326,
Berlin, Heidelberg. Springer-Verlag.
Hidaka, S. (2012). Identifying kinematic cues for action
style recognition. In Proceedings of the 34th Annual
Conference of the Cognitive Science Society, pages
1679–1684.
Lin, J. and Wang, W. (2009). Weakly-supervised violence
detection in movies with audio and video based co-
training. In Proceedings of the 10th Pacific Rim Con-
ference on Multimedia, pages 930–935, Berlin, Hei-
delberg. Springer-Verlag.
Nam, J., Alghoniemy, M., and Tewfik, A. (1998). Audio-
visual content-based violent scene characterization. In
Proceedings of ICIP, pages 353–357.
Oshin, O., Gilbert, A., and Bowden, R. (2011). Capturing
the relative distribution of features for action recog-
nition. In Automatic Face Gesture Recognition and
Workshops (FG 2011), 2011 IEEE International Con-
ference on, pages 111–116.
Poppe, R. (2010). A survey on vision-based human action
recognition. Image and Vision Computing, 28(6):976
– 990.
Saerbeck, M. and Bartneck, C. (2010). Perception of affect
elicited by robot motion. In Proceedings of the 5th
ACM/IEEE international conference on Human-robot
interaction, HRI ’10, pages 53–60, Piscataway, NJ,
USA. IEEE Press.
Soomro, K., Zamir, A., and Shah, M. (2012). UCF101: A
dataset of 101 human action classes from videos in the
wild. CRCV-TR-12-01. Technical report.
Wang, D., Zhang, Z., Wang, W., Wang, L., and Tan, T.
(2012). Baseline results for violence detection in still
images. In AVSS, pages 54–57.
Zajdel, W., Krijnders, J., Andringa, T., and Gavrila, D.
(2007). CASSANDRA: audio-video sensor fusion for
aggression detection. In Advanced Video and Signal
Based Surveillance, 2007. AVSS 2007. IEEE Confer-
ence on, pages 200–205.
Zou, X., Wu, O., Wang, Q., Hu, W., and Yang, J. (2012).
Multi-modal based violent movies detection in video
sharing sites. In IScIDE, pages 347–355.
FastViolenceDetectioninVideo
485