
sive and detailed annotations for actions such as making an arrest, attacks on officers, and suspects fleeing, all of which are integral to an officer's daily duties. Although fine-tuning the model separately on the two scenes produces good results, recognition performance improves further with a transfer-learning approach. In cross-scene tests, the model trained on scene 2 and tested on scene 1 generalizes well, owing to the greater variation and longer action durations in that training scene. TimeSformer outperformed traditional models such as C3D, I3D, and SlowFast in recognizing complex law enforcement-related actions. Future efforts should focus on tailoring the models to better handle variation between environments and thereby improve cross-scene action recognition. Additionally, custom architectures or domain-adaptive layers could be introduced to better capture the contextual details of complex law-enforcement scenarios, making the model more robust and better able to generalize across different environments and action dynamics. Beyond action recognition, the application of BWCs to face recognition has been explored only to a limited extent, and there remains tremendous potential for advances in this area.
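To make the transfer-learning and cross-scene setup described above concrete, the sketch below fine-tunes a Kinetics-400-pretrained TimeSformer on clips from one scene and evaluates it on the other. This is a minimal illustration under stated assumptions, not the training code used in this work: the public Hugging Face checkpoint, the random-tensor stand-ins for the scene datasets, and all hyperparameters are illustrative choices.

```python
# Minimal transfer-learning sketch for cross-scene evaluation.
# NOT the paper's training code: the checkpoint, the random-tensor
# stand-ins for the scene datasets, and all hyperparameters are
# illustrative assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import TimesformerForVideoClassification

NUM_CLASSES = 8  # assumed number of law-enforcement action classes

# Load a public Kinetics-400 TimeSformer checkpoint and swap in a
# randomly initialized head sized for the target label set.
model = TimesformerForVideoClassification.from_pretrained(
    "facebook/timesformer-base-finetuned-k400",
    num_labels=NUM_CLASSES,
    ignore_mismatched_sizes=True,
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Stand-ins for real loaders: clips shaped (num_frames, 3, 224, 224).
scene2_train = TensorDataset(torch.randn(8, 8, 3, 224, 224),
                             torch.randint(0, NUM_CLASSES, (8,)))
scene1_test = TensorDataset(torch.randn(8, 8, 3, 224, 224),
                            torch.randint(0, NUM_CLASSES, (8,)))
train_loader = DataLoader(scene2_train, batch_size=2, shuffle=True)
test_loader = DataLoader(scene1_test, batch_size=2)

# Fine-tune on scene 2.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for clips, labels in train_loader:
    out = model(pixel_values=clips.to(device), labels=labels.to(device))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Evaluate cross-scene on scene 1.
model.eval()
correct = total = 0
with torch.no_grad():
    for clips, labels in test_loader:
        preds = model(pixel_values=clips.to(device)).logits.argmax(-1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.numel()
print(f"Cross-scene accuracy: {correct / total:.3f}")
```

When the target scene has few labeled clips, a common variant of this setup is to freeze the early transformer blocks and fine-tune only the later layers and the classification head.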
ACKNOWLEDGEMENTS
This research work is a collaborative study between
UPNM-EURECOM and Naval Group.
REFERENCES
Bertasius, G., Wang, H., and Torresani, L. (2021). Is space-
time attention all you need for video understanding?
In Proceedings of the International Conference on
Machine Learning (ICML).
Bryan, J. (2020). Effects of Movement on Biometric Fa-
cial Recognition in Body-Worn Cameras. PhD thesis,
Purdue University Graduate School.
Carreira, J. and Zisserman, A. (2018). Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Chakraborty, S., Mondal, R., Singh, P., Sarkar, R., and
Bhattacharjee, D. (2021). Transfer learning with fine
tuning for human action recognition from still images.
Multimedia Tools and Applications, 80.
Chao, X., Hou, Z., and Mo, Y. (2022). CZU-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and 10 wearable inertial sensors. IEEE Sensors Journal, 22.
Chen, H., Li, H., Song, A., Haberland, M., Akar, O.,
Dhillon, A., Zhou, T., Bertozzi, A. L., and Branting-
ham, P. J. (2019). Semi-supervised first-person activ-
ity recognition in body-worn video. arXiv preprint
arXiv:1904.09062.
Choi, S., Michalski, N. D., and Snyder, J. A. (2023). The
“civilizing” effect of body-worn cameras on police-
civilian interactions: Examining the current evidence,
potential moderators, and methodological limitations.
Criminal Justice Review, 48(1):21–47.
Corso, J. J., Alahi, A., Grauman, K., Hager, G. D., Morency,
L.-P., Sawhney, H., and Sheikh, Y. (2018). Video anal-
ysis for body-worn cameras in law enforcement. arXiv
preprint arXiv:1604.03130.
Damen, D., Doughty, H., Farinella, G., Fidler, S., Furnari, A., Kazakos, E., Moltisanti, D., Munro, J., Perrett, T., Price, W., and Wray, M. (2020). The EPIC-KITCHENS dataset: Collection, challenges and baselines. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Duan, H., Zhao, Y., Chen, K., Lin, D., and Dai, B. (2022).
Revisiting skeleton-based action recognition. In 2022
IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition (CVPR), pages 2959–2968.
Feichtenhofer, C., Fan, H., Malik, J., and He, K. (2019).
Slowfast networks for video recognition. In 2019
IEEE/CVF International Conference on Computer Vi-
sion (ICCV), pages 6201–6210.
Goyal, R., Kahou, S. E., Michalski, V., Materzynska, J., Westphal, S., Kim, H., Haenel, V., Fruend, I., Yianilos, P., Mueller-Freitag, M., Hoppe, F., Thurau, C., Bax, I., and Memisevic, R. (2017). The "something something" video database for learning and evaluating visual common sense. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 5843–5851.
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., and Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., and Serre,
T. (2011). HMDB: a large video database for human
motion recognition. In Proceedings of the Interna-
tional Conference on Computer Vision (ICCV).
Liu, Z., Ning, J., Cao, Y., Wei, Y., Zhang, Z., Lin, S., and
Hu, H. (2022). Video swin transformer. In 2022
IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition (CVPR), pages 3192–3201.
Majd, M. and Safabakhsh, R. (2020). Correlational con-
volutional lstm for human action recognition. Neuro-
computing, 396:224–229.
Meng, Z., Sánchez, J., Morel, J.-M., Bertozzi, A. L., and Brantingham, P. J. (2018). Ego-motion classification for body-worn videos. In Tai, X.-C., Bae, E., and Lysaker, M., editors, Imaging, Vision and Learning Based on Optimization and PDEs, pages 221–239, Cham. Springer International Publishing.
Núñez-Marcos, A., Azkune, G., and Arganda-Carreras, I. (2022). Egocentric vision-based action recognition: A survey. Neurocomputing, 472:175–197.
Soomro, K., Zamir, A., and Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. CoRR, abs/1212.0402.
Cubukcu, S., Sahin, N., Tekin, E., and Topalli, V. (2023). The effect of body-worn cameras on the adjudication