Authors:
Naman Agarwal
and
James Pope
Affiliation:
Intelligent Systems Laboratory, School of Engineering Mathematics and Technology, University of Bristol, Bristol, U.K.
Keyword(s):
Adversarial Machine Learning, Privacy-Preserving Image Classification, Genetic Algorithms, Gradient-Based Fine-Tuning, Black-Box Attack.
Abstract:
Adversarial attacks pose a critical threat to the reliability of machine learning models, potentially undermining trust in practical applications. As machine learning models find deployment in vital domains like autonomous vehicles, healthcare, and finance, they become susceptible to adversarial examples—crafted inputs that induce erroneous high-confidence predictions. These attacks fall into two main categories: white-box, with full knowledge of model architecture, and black-box, with limited or no access to internal details. This paper introduces a novel approach for targeted adversarial attacks in black-box scenarios. By combining genetic algorithms and gradient-based fine-tuning, our method efficiently explores the input space for perturbations without requiring access to internal model details. Subsequently, gradient-based fine-tuning optimizes these perturbations, aligning them with the target model’s decision boundary. This dual strategy aims to evolve perturbations that effectively mislead target models while minimizing queries, ensuring stealthy attacks. Results demonstrate the efficacy of GenGradAttack, achieving a remarkable 95.06% Adversarial Success Rate (ASR) on MNIST with a median query count of 556. In contrast, conventional GenAttack achieved 100% ASR but required significantly more queries. When applied to InceptionV3 and Ens4AdvInceptionV3 on ImageNet, GenGradAttack outperformed GenAttack with 100% and 96% ASR, respectively, and fewer median queries. These results highlight the efficiency and effectiveness of our approach in generating adversarial examples with reduced query counts, advancing our understanding of adversarial vulnerabilities in practical contexts.
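To make the dual strategy described in the abstract concrete, the following is a minimal sketch of a GenGradAttack-style loop: a genetic algorithm evolves an L-infinity-bounded perturbation using only black-box probability queries, and the best candidate is then refined with a finite-difference gradient estimate. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; the model interface `query_probs` and the hyperparameters (`pop_size`, `eps`, `generations`, `finetune_steps`, `finetune_lr`, `sigma`) are hypothetical placeholders.

```python
# Hedged sketch of a genetic-algorithm black-box attack with gradient-based
# fine-tuning. NOT the authors' code; the model interface and all
# hyperparameters below are illustrative assumptions.
import numpy as np

def query_probs(model, x):
    """Black-box query: return class probabilities for a single input x
    (assumed model interface taking a batch and returning probabilities)."""
    return model(x[None, ...])[0]

def gengrad_attack_sketch(model, x_orig, target_class,
                          pop_size=6, eps=0.05, generations=200,
                          finetune_steps=5, finetune_lr=0.01, sigma=1e-3):
    """Evolve an eps-bounded perturbation toward `target_class`, then refine
    the best member with an estimated-gradient step."""
    rng = np.random.default_rng(0)
    # Initialise a population of perturbations inside the eps ball.
    pop = rng.uniform(-eps, eps, size=(pop_size,) + x_orig.shape)

    def fitness(delta):
        p = query_probs(model, np.clip(x_orig + delta, 0.0, 1.0))
        # Log-odds of the target class: higher means closer to a targeted flip.
        return np.log(p[target_class] + 1e-12) - np.log(1.0 - p[target_class] + 1e-12)

    for _ in range(generations):
        scores = np.array([fitness(d) for d in pop])
        best = pop[np.argmax(scores)]
        # Early exit once the target label becomes the argmax prediction.
        if np.argmax(query_probs(model, np.clip(x_orig + best, 0.0, 1.0))) == target_class:
            break
        # Selection: sample parents with probability proportional to softmax(fitness).
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        children = [best]  # elitism: carry the best member forward unchanged
        while len(children) < pop_size:
            i, j = rng.choice(pop_size, size=2, p=probs)
            mask = rng.random(x_orig.shape) < 0.5                   # uniform crossover
            child = np.where(mask, pop[i], pop[j])
            child += rng.normal(0.0, eps * 0.1, size=x_orig.shape)  # mutation
            children.append(np.clip(child, -eps, eps))
        pop = np.stack(children)

    # Gradient-based fine-tuning of the best perturbation using a two-point
    # finite-difference estimate, so no true gradients are needed.
    delta = pop[np.argmax([fitness(d) for d in pop])]
    for _ in range(finetune_steps):
        u = rng.normal(size=x_orig.shape)
        g = (fitness(delta + sigma * u) - fitness(delta - sigma * u)) / (2.0 * sigma) * u
        delta = np.clip(delta + finetune_lr * np.sign(g), -eps, eps)
    return np.clip(x_orig + delta, 0.0, 1.0)
```

In this sketch, the genetic phase handles broad exploration of the perturbation space, while the finite-difference refinement plays the role of the paper's gradient-based fine-tuning, nudging the surviving perturbation toward the decision boundary with few additional queries.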