
Kurakin, A., Goodfellow, I. J., and Bengio, S. (2018). Adversarial
examples in the physical world. In Artificial
intelligence safety and security, pages 99–112. Chap-
man and Hall/CRC.
Lee, Y. J., Ghosh, J., and Grauman, K. (2012). Discover-
ing important people and objects for egocentric video
summarization. In 2012 IEEE conference on com-
puter vision and pattern recognition, pages 1346–
1353. IEEE.
Li, M., Huang, B., and Tian, G. (2022). A comprehensive
survey on 3D face recognition methods. Engineering
Applications of Artificial Intelligence, 110:104669.
Li, S., Neupane, A., Paul, S., Song, C., Krishnamurthy,
S. V., Chowdhury, A. K. R., and Swami, A. (2018).
Adversarial perturbations against real-time video clas-
sification systems. arXiv preprint arXiv:1807.00458.
Lo, S.-Y. and Patel, V. M. (2021). MultAV: Multiplicative
adversarial videos. In 2021 17th IEEE International
Conference on Advanced Video and Signal
Based Surveillance (AVSS), pages 1–6. IEEE.
Mittal, S., Srivastava, S., and Jayanth, J. P. (2022). A sur-
vey of deep learning techniques for underwater image
classification. IEEE Transactions on Neural Networks
and Learning Systems.
Moosavi-Dezfooli, S.-M., Fawzi, A., and Frossard, P.
(2016). DeepFool: a simple and accurate method to
fool deep neural networks. In Proceedings of the IEEE
conference on computer vision and pattern recogni-
tion, pages 2574–2582.
Nasir, I. M., Raza, M., Shah, J. H., Wang, S.-H., Tariq, U.,
and Khan, M. A. (2022). HAREDNet: A deep learning
based architecture for autonomous video surveillance
by recognizing human actions. Computers and Elec-
trical Engineering, 99:107805.
Paymode, A. S. and Malode, V. B. (2022). Transfer learning
for multi-crop leaf disease image classification using
convolutional neural network VGG. Artificial
Intelligence in Agriculture, 6:23–33.
Pham, H. H., Khoudour, L., Crouzil, A., Zegers, P., and Ve-
lastin, S. A. (2022). Video-based human action recog-
nition using deep learning: a review. arXiv preprint
arXiv:2208.03775.
Pony, R., Naeh, I., and Mannor, S. (2021). Over-the-air
adversarial flickering attacks against video recogni-
tion networks. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition,
pages 515–524.
Sadrizadeh, S., Aghdam, A. D., Dolamic, L., and Frossard,
P. (2023). Targeted adversarial attacks against neural
machine translation. In ICASSP 2023-2023 IEEE In-
ternational Conference on Acoustics, Speech and Sig-
nal Processing (ICASSP), pages 1–5. IEEE.
Soomro, K., Zamir, A. R., and Shah, M. (2012). UCF101:
A dataset of 101 human actions classes from videos in
the wild. arXiv preprint arXiv:1212.0402.
Sultani, W., Chen, C., and Shah, M. (2018). Real-world
anomaly detection in surveillance videos. In Proceed-
ings of the IEEE conference on computer vision and
pattern recognition, pages 6479–6488.
Surek, G. A. S., Seman, L. O., Stefenon, S. F., Mariani,
V. C., and Coelho, L. d. S. (2023). Video-based human
activity recognition using deep learning approaches.
Sensors, 23(14):6384.
Wan, J., Fu, J., Wang, L., and Yang, Z. (2023). BounceAttack:
A query-efficient decision-based adversarial attack
by bouncing into the wild. In 2024 IEEE Symposium
on Security and Privacy (SP), pages 68–68.
IEEE Computer Society.
Wang, Y., Liu, J., Chang, X., Rodríguez, R. J., and Wang, J.
(2022). DI-AA: An interpretable white-box attack for
fooling deep neural networks. Information Sciences,
610:14–32.
Wei, X., Yan, H., and Li, B. (2022). Sparse black-box video
attack with reinforcement learning. International
Journal of Computer Vision, 130(6):1459–1473.
Wei, X., Zhu, J., Yuan, S., and Su, H. (2019). Sparse ad-
versarial perturbations for videos. In Proceedings of
the AAAI Conference on Artificial Intelligence, vol-
ume 33, pages 8973–8980.
Wei, Z., Chen, J., Wei, X., Jiang, L., Chua, T.-S., Zhou, F.,
and Jiang, Y.-G. (2020). Heuristic black-box adversar-
ial attacks on video recognition models. In Proceed-
ings of the AAAI Conference on Artificial Intelligence,
volume 34, pages 12338–12345.
Wu, W., Sun, Z., and Ouyang, W. (2023). Revisiting clas-
sifier: Transferring vision-language models for video
recognition. In Proceedings of the AAAI conference on
artificial intelligence, volume 37, pages 2847–2855.
Zaidi, S. S. A., Ansari, M. S., Aslam, A., Kanwal, N., As-
ghar, M., and Lee, B. (2022). A survey of modern
deep learning based object detection models. Digital
Signal Processing, 126:103514.
Zhang, J., Li, B., Xu, J., Wu, S., Ding, S., Zhang, L., and
Wu, C. (2022). Towards efficient data free black-box
adversarial attack. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recogni-
tion, pages 15115–15125.
Zhou, C., Wang, Y.-G., and Zhu, G. (2022). Object-
attentional untargeted adversarial attack. arXiv
preprint arXiv:2210.08472.
QEBB: A Query-Efficient Black-Box Adversarial Attack on Video Recognition Models Based on Unsupervised Key Frame Selection