
Gracia, I., Suarez, O., Garcia, G., and Kim, T. (2015). Fast
fight detection. PLoS ONE, 10.
Hashemi, M. (2019). Enlarging smaller images before
inputting into convolutional neural network: zero-
padding vs. interpolation. Journal of Big Data, 6:1–
13.
Hassner, T., Itcher, Y., et al. (2012). Violent flows: Real-
time detection of violent crowd behavior. 2012 IEEE
Computer Society Conference on Computer Vision
and Pattern Recognition Workshops, pages 1–6.
Heylen, J., Iven, S., Brabandere, B. d., Oramas, J., Gool,
L. v., and Tuytelaars, T. (2018). From pixels to ac-
tions: Learning to drive a car with deep neural net-
works. 2018 IEEE Winter Conference on Applications
of Computer Vision (WACV), pages 606–615.
Hoffmann, D., Tzionas, D., Black, M., and Tang, S. (2019).
Learning to train with synthetic humans. In GCPR.
Howard, A., Zhu, M., et al. (2017). MobileNets: Efficient
convolutional neural networks for mobile vision ap-
plications. ArXiv, abs/1704.04861.
Jackson, P., Abarghouei, A., Bonner, S., Breckon, T., and
Obara, B. (2018). Style augmentation: Data augmen-
tation via style randomization. In CVPR Workshops.
Karttunen, J., Kanervisto, A., Hautamäki, V., and Kyrki, V.
(2019). From video game to real robot: The transfer
between action spaces. ArXiv, abs/1905.00741.
Lee, S. and Kim, E. (2019). Multiple object tracking
via feature pyramid siamese networks. IEEE Access,
7:8181–8194.
Li, X., Zhang, C., and Zhang, D. (2010). Abandoned
objects detection using double illumination invariant
foreground masks. In 2010 20th International Confer-
ence on Pattern Recognition, pages 436–439. IEEE.
Li, Y., Yin, G., Hou, S., Cui, J., and Huang, Z. (2019).
Spatiotemporal feature extraction for pedestrian re-
identification. In Wireless Algorithms, Systems, and
Applications: 14th International Conference.
Mahmoodi, J. and Salajeghe, A. (2019). A classification
method based on optical flow for violence detection.
Expert Syst. Appl., 127:121–127.
Martinez, M., Sitawarin, C., et al. (2017). Beyond Grand
Theft Auto V for training, testing and enhancing deep
learning in self driving cars. ArXiv, abs/1712.01397.
Movshovitz-Attias, Y., Kanade, T., and Sheikh, Y. (2016).
How useful is photo-realistic rendering for visual
learning? ArXiv, abs/1603.08152.
Nadeem, M. S., Franqueira, V. N. L., Kurugollu, F., and
Zhai, X. (2019). WVD: A new synthetic dataset for
video-based violence detection. In Bramer, M. and
Petridis, M., editors, Artificial Intelligence XXXVI,
pages 158–164.
Nam, J., Alghoniemy, M., et al. (1998). Audio-visual
content-based violent scene characterization. Pro-
ceedings 1998 International Conference on Image
Processing (ICIP 98), 1:353–357 vol. 1.
Nguyen, N., Phung, D., et al. (2005). Learning and detect-
ing activities from movement trajectories using the hi-
erarchical hidden Markov model. 2005 IEEE Com-
puter Society Conference on Computer Vision and
Pattern Recognition (CVPR'05), 2:955–960 vol. 2.
Nievas, E. et al. (2011). Violence detection in video using
computer vision techniques. In CAIP.
Oliver, N., Rosario, B., and Pentland, A. (2000). A Bayesian
computer vision system for modeling human inter-
actions. IEEE Trans. Pattern Anal. Mach. Intell.,
22:831–843.
Paulin, G. and Ivasic-Kos, M. (2023). Review and analy-
sis of synthetic dataset generation methods and tech-
niques for application in computer vision. Artificial
Intelligence Review, 56(9):9221–9265.
Persson, P., Espinoza, F., Fagerberg, P., S, A., and Cöster,
R. (2002). GeoNotes: A location-based information
system for public spaces. In Kristina Höök, David
Benyon and Alan Munro (eds), Readings in Social
Navigation of Information Space, pages 151–173.
Rabiee, H. et al. (2018). Detection and localization of crowd
behavior using a novel tracklet-based model. Interna-
tional Journal of Machine Learning and Cybernetics,
9:1999–2010.
Rest, J. v., Roelofs, M., and Nunen, A. v. (2014). Afwijk-
end gedrag: maatschappelijk verantwoord waarnemen
van gedrag in context van veiligheid [Deviant be-
haviour: socially responsible observation of behaviour
in the context of security]. TNO 2014 R10987. TNO.
Sajjad, M., Khan, S., et al. (2019a). CNN-based anti-
spoofing two-tier multi-factor authentication system.
Pattern Recognit. Lett., 126:123–131.
Sajjad, M., Nasir, M., Ullah, F., Muhammad, K., Sangaiah,
A., and Baik, S. (2019b). Raspberry Pi assisted facial
expression recognition framework for smart security
in law-enforcement services. Inf. Sci., 479:416–431.
Sargana, A., Angelov, P., and Habib, Z. (2017). A com-
prehensive review on handcrafted and learning-based
action representation approaches for human activity
recognition. Applied Sciences, 7:110.
Serrano, I., Déniz, O., Espinosa-Aranda, J., and Bueno, G.
(2018). Fight recognition in video using Hough forests
and 2D convolutional neural network. IEEE Transac-
tions on Image Processing, 27:4787–4797.
Shearer, C. (2000). The CRISP-DM model: The new blueprint
for data mining. Journal of Data Warehousing, 5(4).
Tran, D., Bourdev, L., et al. (2015). Learning spatiotem-
poral features with 3D convolutional networks. 2015
IEEE International Conference on Computer Vision
(ICCV), pages 4489–4497.
Ullah, A., Ahmad, J., Muhammad, K., Sajjad, M., and Baik,
S. (2018). Action recognition in video sequences us-
ing deep bi-directional LSTM with CNN features. IEEE
Access, 6:1155–1166.
Ullah, F., Ullah, A., Muhammad, K., Haq, I., and Baik, S.
(2019). Violence detection using spatiotemporal fea-
tures with 3D convolutional neural network. Sensors
(Basel, Switzerland), 19.
Wu, Z., Wang, X., et al. (2015). Modeling spatial-temporal
clues in a hybrid deep learning framework for video
classification. In MM ’15.
Zhou, P., Ding, Q., Luo, H., and Hou, X. (2018). Vio-
lence detection in surveillance video using low-level
features. PLoS ONE, 13.
SECRYPT 2024 - 21st International Conference on Security and Cryptography