
black-box attacks when generating primary adversarial images on YOLOv8x. In line with the original motivation of our work, we propose this method to expose vulnerabilities in neural networks and to facilitate building more reliable detection models under adversarial attacks; we leave the task of improving model robustness to future work. In the interest of social good, we also make our source code available to encourage others to build defense methods against this attack.
REFERENCES
Alaifari, R., Alberti, G. S., and Gauksson, T. (2018). ADef: an iterative algorithm to construct adversarial deformations. In International Conference on Learning Representations.
Bochkovskiy, A., Wang, C.-Y., and Liao, H.-Y. M. (2020). YOLOv4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.
Carlini, N. and Wagner, D. (2017). Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE.
Chen, P.-Y., Sharma, Y., Zhang, H., Yi, J., and Hsieh, C.-J. (2018). EAD: elastic-net attacks to deep neural networks via adversarial examples. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.
Dang, T., Nguyen, K., and Huber, M. (2023). Multiplanar self-calibration for mobile cobot 3D object manipulation using 2D detectors and depth estimation. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1782–1788. IEEE.
Dang, T., Nguyen, K., and Huber, M. (2024). V3D-SLAM: Robust RGB-D SLAM in dynamic environments with 3D semantic geometry voting. arXiv preprint arXiv:2410.12068.
Du, A., Chen, B., Chin, T.-J., Law, Y. W., Sasdelli, M., Rajasegaran, R., and Campbell, D. (2022). Physical adversarial attacks on an aerial imagery object detector. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 1796–1806.
Everingham, M., Eslami, S. A., Van Gool, L., Williams, C. K., Winn, J., and Zisserman, A. (2015). The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision, 111:98–136.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2015). Ex-
plaining and harnessing adversarial examples. In Ben-
gio, Y. and LeCun, Y., editors, 3rd International Con-
ference on Learning Representations, ICLR 2015, San
Diego, CA, USA, May 7-9, 2015, Conference Track
Proceedings.
Im Choi, J. and Tian, Q. (2022). Adversarial attack and defense of YOLO detectors in autonomous driving scenarios. In 2022 IEEE Intelligent Vehicles Symposium (IV), pages 1011–1017. IEEE.
Jocher, G., Chaurasia, A., and Qiu, J. (2023). YOLO by
Ultralytics.
Kurakin, A., Goodfellow, I. J., and Bengio, S. (2018). Ad-
versarial examples in the physical world. In Artificial
intelligence safety and security, pages 99–112. Chap-
man and Hall/CRC.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P. (2017). Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988.
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer.
Lindeberg, T. (2012). Scale invariant feature transform.
Liu, X., Yang, H., Liu, Z., Song, L., Chen, Y., and Li, H. (2019). DPATCH: an adversarial patch attack on object detectors. In Espinoza, H., hÉigeartaigh, S. Ó., Huang, X., Hernández-Orallo, J., and Castillo-Effen, M., editors, Workshop on Artificial Intelligence Safety 2019 co-located with the Thirty-Third AAAI Conference on Artificial Intelligence 2019 (AAAI-19), Honolulu, Hawaii, January 27, 2019, volume 2301 of CEUR Workshop Proceedings. CEUR-WS.org.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin,
S., and Guo, B. (2021). Swin transformer: Hierar-
chical vision transformer using shifted windows. In
Proceedings of the IEEE/CVF international confer-
ence on computer vision, pages 10012–10022.
Lu, J., Sibai, H., and Fabry, E. (2017). Adversarial examples that fool detectors. arXiv preprint arXiv:1712.02494.
Lu, Y. (2019). The level weighted structural similarity loss: A step away from MSE. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01):9989–9990.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and
Vladu, A. (2018). Towards deep learning models
resistant to adversarial attacks. In 6th International
Conference on Learning Representations, ICLR 2018,
Vancouver, BC, Canada, April 30 - May 3, 2018, Con-
ference Track Proceedings. OpenReview.net.
Moosavi-Dezfooli, S.-M., Fawzi, A., and Frossard, P. (2016). DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582.
Nguyen, K., Dang, T., and Huber, M. (2024a). Real-time 3D semantic scene perception for egocentric robots with binocular vision. arXiv preprint arXiv:2402.11872.
Nguyen, K., Dang, T., and Huber, M. (2024b). Volumetric
mapping with panoptic refinement using kernel den-
sity estimation for mobile robots.
Puccetti, T., Zoppi, T., and Ceccarelli, A. (2023). On the ef-
ficacy of metrics to describe adversarial attacks. arXiv
preprint arXiv:2301.13028.
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28.
Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011). ORB: An efficient alternative to SIFT or SURF. In 2011 International Conference on Computer Vision, pages 2564–2571. IEEE.