Authors:
Aristeidis Bifis
and
Emmanouil Psarakis
Affiliation:
Computer Engineering & Informatics Department, University of Patras, Patras, Greece
Keyword(s):
Adversarial Defense, Adversarial Training, Neural Network Robustness, Adversarial Robustness, Deep Learning, Convolutional Layers, Null-Space Projection, Range-Space Projection, Orthogonal Projection, PGD, White Box, Feature Manipulation.
Abstract:
Adversarial training is the standard method for improving the robustness of neural networks against adversarial attacks. However, a well-known trade-off exists: while adversarial training increases resilience to perturbations, it often results in a significant reduction in accuracy on clean (unperturbed) data. This compromise leads to models that are more resistant to adversarial attacks but less effective on natural inputs. In this paper, we introduce an extension to adversarial training that addresses this trade-off by applying novel constraints to convolutional layers. Specifically, we use orthogonal projections to decompose the learned features into clean signal and adversarial noise, projecting them onto the range and null spaces of the network's weight matrices. These constraints improve the separation of adversarial noise from useful signal during training, enhancing robustness while preserving clean-data performance on par with standard adversarial training. Our approach achieves significant improvements in robust accuracy while maintaining comparable clean accuracy, providing a balanced and effective adversarial defense strategy.
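To make the decomposition concrete, the following minimal sketch (not the paper's implementation) shows how orthogonal projectors built from a weight matrix W can split a feature vector into a component the layer propagates and a component it annihilates. It assumes "range" refers to the row space of W (the range of W^T), which is the orthogonal complement of W's null space; the helper name range_null_projections and the flattened-kernel shapes are illustrative assumptions.

import torch

def range_null_projections(W):
    # Moore-Penrose pseudo-inverse of W; shape (n, m) for W of shape (m, n)
    W_pinv = torch.linalg.pinv(W)
    # P_range = W^+ W is the orthogonal projector onto range(W^T),
    # the orthogonal complement of the null space of W
    P_range = W_pinv @ W
    # P_null = I - P_range projects onto the null space of W,
    # the subspace the layer maps to zero
    P_null = torch.eye(W.shape[1]) - P_range
    return P_range, P_null

# Illustrative use: split a (possibly perturbed) feature vector into the
# component the layer propagates ("signal") and the component it suppresses.
W = torch.randn(8, 16)        # e.g. a convolution kernel flattened to a matrix
P_range, P_null = range_null_projections(W)
x = torch.randn(16)
x_signal = P_range @ x
x_noise = P_null @ x
assert torch.allclose(x, x_signal + x_noise, atol=1e-5)

Because the two projectors are orthogonal complements, the decomposition is lossless: the feature is recovered exactly as the sum of its signal and noise components.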