OrthoCNN: Mitigating Adversarial Noise in Convolutional Neural
Networks via Orthogonal Projections
Aristeidis Bifis (https://orcid.org/0000-0003-0246-1209) and Emmanouil Psarakis (https://orcid.org/0000-0002-9627-0640)
Computer Engineering & Informatics Department, University of Patras, Patras, Greece
Keywords:
Adversarial Defense, Adversarial Training, Neural Network Robustness, Adversarial Robustness, Deep
Learning, Convolutional Layers, Null-Space Projection, Range-Space Projection, Orthogonal Projection,
PGD, White Box, Feature Manipulation.
Abstract:
Adversarial training is the standard method for improving the robustness of neural networks against adversarial
attacks. However, a well-known trade-off exists: while adversarial training increases resilience to perturba-
tions, it often results in a significant reduction in accuracy on clean (unperturbed) data. This compromise
leads to models that are more resistant to adversarial attacks but less effective on natural inputs. In this pa-
per, we introduce an extension to adversarial training by applying novel constraints on convolutional layers
that address this trade-off. Specifically, we use orthogonal projections to decompose the learned features into
clean signal and adversarial noise, projecting them onto the range and null spaces of the network’s weight
matrices. These constraints improve the separation of adversarial noise from useful signals during training,
enhancing robustness while preserving the same performance on clean data as adversarial training. Our ap-
proach achieves significant improvements in robust accuracy while maintaining comparable clean accuracy,
providing a balanced and effective adversarial defense strategy.
1 INTRODUCTION
Adversarial attacks pose a significant threat to the re-
liability and security of neural networks, particularly
in critical real-world applications such as autonomous
driving, healthcare, and finance (Wu et al., 2023; Sel-
vakkumar et al., 2022; Chen et al., 2021). These at-
tacks, which introduce small but deliberate perturba-
tions to input data, can lead to incorrect predictions
or system failures, undermining the trustworthiness
of AI systems. Although adversarial training has be-
come the standard defense mechanism, it often results
in a trade-off: models become more robust to adver-
sarial perturbations but suffer decreased performance
on clean (unperturbed) data (Tsipras et al., 2018;
Zhang et al., 2019). This trade-off limits the prac-
ticality of adversarially robust models, highlighting
the need for methods that enhance robustness without
sacrificing accuracy on natural inputs. Such methods
are crucial for improving the overall reliability and
usability of AI systems in real-world scenarios.
The standard adversarial training framework
(Goodfellow et al., 2014) seeks to improve the robust-
ness of machine learning models by explicitly training
them on adversarial examples—inputs that have been
intentionally perturbed to mislead the model. The
core idea is to augment the training data with these
adversarial examples, forcing the model to learn from
both the original clean data and the perturbations that
challenge its decision boundaries. This process typi-
cally involves generating adversarial examples using
methods like the Fast Gradient Sign Method (FGSM)
(Goodfellow et al., 2014) or Projected Gradient De-
scent (PGD) (Mądry et al., 2017) and incorporating
them into the training procedure. The goal is to en-
hance the model’s ability to classify both clean and
adversarial inputs correctly, thereby increasing its re-
silience to attacks. By exposing the model to a diverse
set of adversarial perturbations, adversarial training
helps it develop more stable decision boundaries, ul-
timately improving its generalization and robustness
against malicious attacks.
A significant limitation of existing adversarial
training methods is the trade-off between improv-
ing robustness to adversarial attacks and maintaining
clean accuracy on unperturbed data. While adversar-
ial training helps models become more resilient to ad-
versarial examples by exposing them to perturbations
during training, it often leads to a degradation in clean
accuracy—the model’s performance on natural, un-
modified inputs. This occurs because the model’s de-
cision boundaries are adjusted to become more robust
to adversarial perturbations, sometimes at the cost of
overfitting to the adversarial examples or losing its
ability to generalize well on normal data. This trade-
off represents a key challenge in adversarial training,
as improving robustness to attacks can compromise
the model’s overall performance, making it less effec-
tive in real-world, unperturbed scenarios. Researchers
are actively exploring ways to mitigate this issue, such
as through more sophisticated loss functions, regular-
ization techniques, or hybrid training strategies that
balance both clean and adversarial accuracy.
This paper introduces a novel extension to the
adversarial training (AT) framework by applying or-
thogonal projection constraints to the clean signal and
adversarial noise representations, mapping them onto
the range and null spaces of the network’s weight ma-
trices. This approach separates the adversarial per-
turbations from the clean data signal during training,
effectively enhancing the model’s robustness to ad-
versarial attacks while preserving its performance on
unperturbed data. By doing so, it addresses the com-
mon trade-off between adversarial defense and clean
accuracy, offering a more balanced and practical so-
lution.
2 RELATED WORK
Adversarial training using Projected Gradient De-
scent (PGD) has become one of the most widely
adopted techniques for improving model robustness
against adversarial attacks. In (Mądry et al., 2017), the
authors demonstrated the effectiveness of PGD-based
adversarial training, showing that iteratively applying
adversarial perturbations during training could sig-
nificantly improve model performance on adversarial
examples. PGD attacks involve multiple steps of gra-
dient updates, which are projected onto a specified
norm ball to ensure the adversarial perturbations re-
main within a predefined limit. The authors showed
that this method could help train models that are ro-
bust to a variety of adversarial attacks, particularly
white-box attacks like PGD itself, providing a foun-
dation for many subsequent works in adversarial de-
fense (Rade and Moosavi-Dezfooli, 2022; Kumari
et al., 2019; Sitawarin et al., 2021). The approach
has been widely used and adapted to various architec-
tures and datasets, becoming a standard benchmark
for evaluating adversarial robustness. However, while
PGD-based adversarial training is effective, it often
involves a trade-off between robustness and clean data
accuracy, as the process can lead to overfitting on the
adversarial perturbations.
TRADES (TRadeoff-inspired Adversarial DE-
fense via Surrogate-loss minimization) (Zhang et al.,
2019), is a prominent adversarial defense technique
designed to improve the robustness of neural net-
works against adversarial attacks while preserving
their generalization to clean data. TRADES intro-
duces a novel approach to adversarial training by min-
imizing a surrogate loss that balances between ad-
versarial robustness and clean data accuracy. Specif-
ically, it incorporates a trade-off term that penal-
izes the difference in output distributions between
clean and adversarial examples, using the Kullback-
Leibler (KL) divergence to measure this discrepancy.
The method encourages the model to behave simi-
larly on both clean and adversarially perturbed data,
which leads to improved performance against adver-
sarial attacks, especially in terms of transferability
and robustness in black-box settings. One of the key
strengths of TRADES is its ability to improve the
trade-off between adversarial robustness and clean ac-
curacy, addressing a common challenge in adversar-
ial training methods, where boosting one often results
in a decline in the other. TRADES has shown su-
perior performance over standard adversarial training
methods, such as PGD-based training, by achieving
better generalization and robustness. However, de-
spite its effectiveness, TRADES introduces additional
computational overhead and requires careful tuning of
the trade-off parameter to maintain a balance between
adversarial robustness and clean data accuracy. Sub-
sequent research has extended TRADES by explor-
ing alternative regularization techniques and improv-
ing its efficiency in large-scale models (Pang et al.,
2022; Levi and Kontorovich, 2024).
In (Bifis et al., 2023) a novel adversarial defense
strategy that leverages orthogonal constraints applied
to denoising autoencoders (DAEs) was introduced.
The proposed approach demonstrated that tied-weight
DAEs, which have half the complexity of full-weight
models, offer substantial improvements in adversarial
robustness without compromising on computational
efficiency. By enforcing orthogonality during train-
ing, the model becomes more resilient to adversarial
perturbations while maintaining low inference over-
head. Building upon this foundation, we extend that approach to more complex architectures, specifically exploring the application of that theory to convolutional layers. Furthermore, we investigate the potential of applying the orthogonal constraints outside the denoising framework, broadening their applicability to other areas of adversarial defense.
In this paper, the limitations of the approach presented in (Bifis et al., 2023) are addressed, namely:
- its focus on applying constraints exclusively to fully connected layers in smaller neural networks. This approach aimed to reduce the number of parameters while maintaining a level of robustness comparable to larger, more complex robust networks, but it restricted the scope of the technique to parameter-aware contexts; in addition,
- its reliance on noise from known distributions to minimize the impact on training time, earning it the label of attack-agnostic. However, while the method performed as intended in black- and gray-box setups, it failed to achieve the same level of robustness as adversarial training in white-box scenarios.
3 PROBLEM FORMULATION
The objective of our technique is to train a neural net-
work capable of substantially mitigating adversarial
noise embedded in input signals. This approach im-
proves classification accuracy on tampered data while
preserving performance on clean data, achieving ac-
curacy comparable to models trained exclusively on
pristine inputs. The first step in extending the above-mentioned technique (Bifis et al., 2023) is to find a way to apply constraints to layer types beyond the
fully connected ones. This shift is necessary because,
in large networks, fully connected layers are less ef-
fective at capturing localized features and are com-
putationally inefficient. A more promising approach
is to adapt the orthogonality constraints for convolu-
tional layers, which are widely used in state-of-the-art
networks, computationally efficient, and well-suited
for producing localized features. This process dif-
fers fundamentally from the matrix-vector multipli-
cation performed in a network that is constructed by
fully connected layers. Therefore, how can we estab-
lish and extend the theoretical framework presented
in (Bifis et al., 2023) to convolutional layers?
In convolutional layers, both the inputs and the building blocks are represented by tensors. A convolutional layer, depending on the kind of its input, can be seen as:
- a feature map generator, if the input is an image, or
- a feature map transform, if the input is a feature map itself.
An input, independently of its kind, can typically be considered as an image with $C_{in}$ channels, each of size $H_{in} \times W_{in}$. For example, a typical RGB image has $C_{in} = 3$ and a gray-scale one $C_{in} = 1$, and they can be stored in an input tensor of appropriate size. The convolutional layer kernels (filters) can also be stored in tensors. The number of filters defines the number of output channels $C_{out}$, and each filter can be denoted by a $C_{in} \times H_k^l \times W_k^l$ tensor $K_{c_{out}}$, with $H_k^l \leq H_{in}$ and $W_k^l \leq W_{in}$. Thus, for a given input image or feature map $X \in \mathbb{R}^{C_{in} \times H_{in} \times W_{in}}$, its convolution with the kernel $K_{c_{out}}$ yields the $c_{out}$-th output image or feature map $Y_{c_{out}}$ of size $H_{out} \times W_{out}$, each element of which is an RV given by the following relation:

$$Y_{c_{out}}(n) = \sum_{c_{in}=1}^{C_{in}} \sum_{m \in S_{kern}} X_{c_{in}}(n - m)\, K_{c_{out}}(c_{in}, m), \quad n \in S_{out},\; c_{out} = 1, \dots, C_{out} \qquad (1)$$
with

$$S_{kern} = [0,\, H_k^l - 1] \times [0,\, W_k^l - 1], \qquad S_{out} = [0,\, H_{out} - 1] \times [0,\, W_{out} - 1]$$

the supports of the $c_{out}$-th kernel and of the output respectively. Eq. (1) can be equivalently written as follows:

$$y_{c_{out}}^t = k_{c_{out}}^t X_r, \quad c_{out} = 1, \dots, C_{out} \qquad (2)$$
where $y_{c_{out}}$, $k_{c_{out}}$ are the flattened versions of $Y_{c_{out}}$ and $K_{c_{out}}$, of length $H_{out} W_{out}$ and $C_{in} H_k^l W_k^l$ respectively, and $X_r$ is an appropriate rearrangement of the input whose size depends on the setting of the convolutional parameters, i.e., stride, dilation, etc. Using Eq. (2), the linear convolution can be expressed as the product of a deterministic matrix $K$ of size $C_{out} \times C_{in} H_k^l W_k^l$ with a random matrix $X_r$ of size $C_{in} H_k^l W_k^l \times H_{out} W_{out}$ as follows:

$$Y = K X_r \qquad (3)$$
with the random matrix $Y$ of size $C_{out} \times H_{out} W_{out}$ and the matrix $K$ defined as follows:

$$Y = \begin{bmatrix} y_1^t \\ y_2^t \\ \vdots \\ y_{C_{out}}^t \end{bmatrix} \quad \text{and} \quad K = \begin{bmatrix} k_1^t \\ k_2^t \\ \vdots \\ k_{C_{out}}^t \end{bmatrix}. \qquad (4)$$
Having defined the linear convolution as a multiplication of matrices, let us make some comments about the specific form of the random matrix $Y$ defined in Eq. (4), how it depends on the form of matrix $K$, and how we can apply the desired constraints on the range and the null space of the filter coefficient matrix $K$.
3.1 The Proposed Solution
For the purposes of this paper, we model adversar-
ial attacks as the addition of a correlated perturbation
(Goodfellow et al., 2014) to the input $X$, that is:

$$X_A = X + W_A \qquad (5)$$
In our pipeline, as we can see from Fig. 1, we first
apply a non-linear transformation to the input RVs.
Figure 1: Proposed pipeline, consisting of three key components: (1) the non-linear transformation, (2) the N constrained layers, each followed by the output normalization step, where the novel defense constraints are applied, and (3) the WideResNet classifier, which performs the final classification task.
The motivation behind this initial non-linear transfor-
mation is to enhance the data representation, improve
the feature set, and address the complexity of adver-
sarial noise, which follows intricate patterns.
To perform this non-linear transformation, we use a simple convolutional layer followed by a non-linear activation function $f(\cdot)$. Consequently, using Eq. (3), the pristine data input $X$ and the adversarial input $X_A$ are non-linearly transformed to the representations $Z$ and $Z_A$ respectively, as follows:

$$Z = f_0(K X_r) \qquad (6)$$
$$Z_A = f_0(K X_r^A) \qquad (7)$$
We can still claim that the non-linear representation relationship between the above mentioned transformed RVs is:

$$Z_A = Z + R \qquad (8)$$

where the term $R$ can be viewed as a residual noise perturbation that affects the attacked classifier, shifting the representation of the pristine data towards regions that lead to misclassification and incorrect results. We must stress at this point that the noise perturbation $R$ is correlated with the representation of the pristine data $Z$. In order to quantify this dependency, let us rewrite Eq. (8) in a column-wise manner, i.e.:

$$z_l^A = z_l + r_l, \quad l = 1, \dots, H_{out} W_{out} \qquad (9)$$
Then, we have the following proposition.
Proposition 1: Let $z_l$, $z_l^A$ be the non-linear representations of the pristine and the adversarially attacked RVs respectively. Then, the following relation holds:

$$z_l^A = (1 + \mu_l) z_l + v_l, \quad l = 1, \dots, H_{out} W_{out} \qquad (10)$$

with constant $\mu_l$ bounded by unity and defined by:

$$\mu_l = \frac{\langle z_l^A - z_l,\; z_l \rangle}{\|z_l\|_2 \, \|z_l^A - z_l\|_2}$$

with $\langle \cdot, \cdot \rangle$ denoting the inner product operator, and the vectorized RV $v_l$ being orthogonal to $z_l$, that is, $\langle z_l, v_l \rangle = 0$.
Proof: The proof is easy and thus omitted.
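Since the proof is omitted, a short numerical illustration of the decomposition in Eq. (10) may be helpful. The sketch below is our own illustration, not the paper's code: it splits an adversarial column into a component parallel to $z_l$ and an exactly orthogonal residual $v_l$, using the standard projection coefficient $\langle z_l^A - z_l, z_l\rangle / \|z_l\|_2^2$ as the scalar multiplying $z_l$. This reproduces the form of Eq. (10), though not necessarily the exact normalization of $\mu_l$ stated above.

```python
import torch

torch.manual_seed(0)
z = torch.randn(64)                  # pristine column z_l (illustrative size)
z_adv = z + 0.1 * torch.randn(64)    # adversarial column z_l^A = z_l + r_l

# Projection coefficient: the scalar that makes the residual orthogonal to z.
mu = torch.dot(z_adv - z, z) / z.norm().pow(2)
v = z_adv - (1.0 + mu) * z           # residual component v_l

print(torch.allclose(z_adv, (1.0 + mu) * z + v))                      # form of Eq. (10) holds
print(torch.isclose(torch.dot(z, v), torch.tensor(0.0), atol=1e-5))   # <z_l, v_l> = 0
```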
We are going to exploit Proposition 1 in order to
properly define the activation functions of the next
convolutional layers shown in Fig. 1. The output of
these layers can be defined as follows:

$$Q_k = f_k(K_k Q_{k-1}), \quad k = 1, 2, \dots, N \qquad (11)$$

with the RV $Q_k$ being either $Z_k$, with $Q_0 = Z$, if the pristine data feed the input of the DNN, or $Z_k^A$, with $Q_0 = Z_A$, if the adversarial data do, and with $f_k(\cdot)$, $K_k$ denoting the activation function, which acts in a column-wise manner, and the kernel matrix of the $k$-th convolutional layer respectively.
In order to achieve our goal, let us adopt the framework proposed in (Bifis et al., 2023); namely, the goal is to produce pristine representations by constraining the weights $K_k$, $k = 1, 2, \dots, N$ of the kernels in each convolutional layer of our network to:
- project the adversarial residual noise perturbation representations onto the null space of the net's weights,
- while preserving all the information from the pristine data representations in the range of the weights.
To this end, we focus on finding weights $K_k$, $k = 1, 2, \dots, N$ for each convolutional layer of the pipeline shown in Fig. 1 such that the corresponding residual noise $v_{k,l}$ and signal $z_{k,l}$ are orthogonal to specific parts of the weight matrix. To satisfy these conditions, we can utilize the null space and the range of the matrix $K_k$. The next lemma gives us the solution to all the above mentioned requirements.
Lemma 1: Let us consider that the following orthogonality constraints:

$$U_{R_k}^T v_{k,l} = 0 \qquad (12)$$
$$U_{N_k}^T z_{k,l} = 0 \qquad (13)$$

are imposed on the $k$-th convolutional layer, $k = 1, 2, \dots, N$, of the pipeline shown in Fig. 1 during the training phase of the network, with $U_{R_k}$, $U_{N_k}$ denoting the range and null space of the kernel weights $K_k$ respectively, which can be obtained from the SVD of the corresponding matrix $K_k = V_k \Sigma_k U_k^T$. Let us also consider that the following activation function:

$$f_k(K_k q_{k,l}) = \frac{K_k q_{k,l}}{\|K_k q_{k,l}\|_2}, \quad l = 1, \dots, H_{out} W_{out} \qquad (14)$$

is acting on the output of the corresponding convolutional layer, with $q_{k,l}$ denoting the $l$-th column of the random matrix $Q_k$ defined in Eq. (11). Then, the pristine and adversarial representations match.
Proof: The proof of Lemma 1 is easy. Note that if we denote by $o_{k,l} = K_k z_{k-1,l}$ the output of the $k$-th convolutional layer to the $l$-th column of the pristine matrix $Z_{k-1}$, then, using Proposition 1, we easily obtain the following relation:

$$K_k z_{k,l}^A = (1 + \mu_{k,l})\, o_{k,l}. \qquad (15)$$

By applying the activation function defined in Eq. (14), we can easily prove the lemma.
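For illustration, here is a minimal sketch (our own, with illustrative shapes that we assume rather than take from the paper) of one constrained layer's forward pass, i.e., Eq. (11) with the column-wise normalizing activation of Eq. (14). For a positive scale factor, the term $(1 + \mu_{k,l})$ in Eq. (15) cancels under this normalization, which is what makes the pristine and adversarial representations match once the constraints hold.

```python
import torch

def constrained_layer(K_k: torch.Tensor, Q_prev: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Q_k = f_k(K_k Q_{k-1}) with f_k the column-wise L2 normalization of Eq. (14).

    K_k    : flattened kernel matrix, shape (C_out, C_in*Hk*Wk)
    Q_prev : column-wise representation entering the layer, shape (C_in*Hk*Wk, H_out*W_out)
    """
    O = K_k @ Q_prev                                # linear part, as in Eq. (3)
    return O / (O.norm(dim=0, keepdim=True) + eps)  # each column mapped to unit L2 norm

# A positively scaled input column yields the same normalized output column,
# which is why the factor (1 + mu_{k,l}) of Eq. (15) disappears after f_k.
K = torch.randn(8, 27)
q = torch.randn(27, 1)
print(torch.allclose(constrained_layer(K, q), constrained_layer(K, 1.7 * q)))  # True
```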
Concluding, by following our pipeline and applying the proposed constraints during training, we would ideally like to obtain a model where, for each pair of pristine and adversarial data, the corresponding representations match.
3.2 Loss Function
Since we want the whole network to be a classifier, we utilize adversarial training, and as the loss function we propose the use of a cross-entropy based one. Specifically:
$$L(W) = \mathbb{E}_{X,L}\left[\mathrm{CE}(g(X;W), L) + \mathrm{CE}(g(X_A;W), L)\right] \qquad (16)$$

where $g(\cdot\,;W)$ denotes the output of the whole net, $W = \{K, \{U_{R_k}, U_{N_k}\}_{k=1}^N, W_C\}$ collects the weights of the non-linear transformation, of the $N$ convolutional layers, and of the WideResNet-based classifier, and $L$ is the ground-truth labels' set. In addition, we would like to impose the following constraints:

$$\mathbb{E}_{V_k}\left[\|U_{R_k}^T V_k\|_F^2\right] = 0, \quad k = 1, \dots, N \qquad (17)$$
$$\mathbb{E}_{Z_k}\left[\|U_{N_k}^T Z_k\|_F^2\right] = 0, \quad k = 1, \dots, N. \qquad (18)$$

with $\|X\|_F$ denoting the Euclidean $\ell_2$ or Frobenius norm of matrix $X$. Then, we define the following Lagrangian function:
$$J(W^+) = L(W) + \sum_{k=1}^N \lambda_{R_k} \mathbb{E}_{V_k}\left[\|U_{R_k}^T V_k\|_F^2\right] + \sum_{k=1}^N \lambda_{N_k} \mathbb{E}_{Z_k}\left[\|U_{N_k}^T Z_k\|_F^2\right]$$

or equivalently:

$$J(W^+) = L(W) + \sum_{k=1}^N \lambda_{R_k} \mathrm{tr}\left\{U_{R_k}^T \mathbb{E}_{V_k}[V_k V_k^T]\, U_{R_k}\right\} + \sum_{k=1}^N \lambda_{N_k} \mathrm{tr}\left\{U_{N_k}^T \mathbb{E}_{Z_k}[Z_k Z_k^T]\, U_{N_k}\right\} \qquad (19)$$

where $W^+ = \{W, \{\lambda_{R_k}, \lambda_{N_k}\}_{k=1}^N\}$, $\mathrm{tr}\{A\}$ denotes the trace of matrix $A$, and we minimize $J(W^+)$ over the weights and the Lagrange multipliers $\lambda_{R_k}, \lambda_{N_k}$, $k = 1, \dots, N$ of the network; this concludes the section.
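To make the training objective concrete, the following is a minimal PyTorch-style sketch, written under our own assumptions about shapes and indexing (the kernel matrix $K_k$ is assumed already flattened as in Eq. (4), and $Z_k$, $Z_k^A$ to hold the column-wise pristine and adversarial representations entering the $k$-th constrained layer); it shows how the penalty terms of Eqs. (17)-(18) and the objective of Eq. (19) could be assembled. All names are illustrative and not part of the paper's released code.

```python
import torch
import torch.nn.functional as F

def range_null_bases(K_k: torch.Tensor, tol: float = 1e-6):
    """Split the input space of K_k via the SVD K_k = V_k Sigma_k U_k^T (Lemma 1)."""
    V, S, Uh = torch.linalg.svd(K_k, full_matrices=True)  # K_k = V @ diag(S) @ Uh
    r = int((S > tol).sum())                               # numerical rank
    U = Uh.T                                               # columns span the input space of K_k
    # U_R: directions mapped non-trivially by K_k ("range" part); U_N: null space of K_k.
    return U[:, :r], U[:, r:]

def orthogonality_penalties(K_k, Z_k, Z_A_k):
    """Empirical counterparts of Eqs. (17)-(18) for one constrained layer (up to scaling)."""
    U_R, U_N = range_null_bases(K_k)
    V_k = Z_A_k - Z_k                        # residual noise columns v_{k,l}
    pen_R = (U_R.T @ V_k).pow(2).mean()      # noise should be pushed into the null space
    pen_N = (U_N.T @ Z_k).pow(2).mean()      # pristine signal should stay in the range
    return pen_R, pen_N

def total_objective(logits_clean, logits_adv, labels, layer_terms, lambdas_R, lambdas_N):
    """Eq. (16) plus the weighted penalties of Eq. (19).
    layer_terms: list of (K_k, Z_k, Z_A_k) tuples, one per constrained layer."""
    loss = F.cross_entropy(logits_clean, labels) + F.cross_entropy(logits_adv, labels)
    for (K_k, Z_k, Z_A_k), lam_R, lam_N in zip(layer_terms, lambdas_R, lambdas_N):
        pen_R, pen_N = orthogonality_penalties(K_k, Z_k, Z_A_k)
        loss = loss + lam_R * pen_R + lam_N * pen_N
    return loss
```

In such a sketch the expectations of Eqs. (17)-(18) are approximated by batch means, and in practice one would decide how often to recompute the SVD during training; these are implementation choices not specified in the text above.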
4 EXPERIMENTAL SETUP
All our experiments were conducted using an NVIDIA A100 with 40 GB of VRAM. As a backbone
neural network we utilized the WideResNet architec-
ture from (Zagoruyko, 2016). This architecture has
been widely adopted by other researchers for adver-
sarial defense in classification tasks (Bartoldson et al.,
2024; Amini et al., 2024; Peng et al., 2023), has
been proven effective and is frequently used in the
literature, as demonstrated by RobustBench (Croce et al., 2021), a well-known benchmark of adversarial robustness for adversarial defenses on the CIFAR-10, CIFAR-100 (Krizhevsky and Hinton, 2009) and ImageNet (Deng et al., 2009) datasets. For
our experiments we used a small version of WideRes-
Net with 10 layers and a widen factor of 2 (namely
WideResNet-10-2). We ran our tests for two datasets,
MNIST (Deng, 2012) & Fashion-MNIST (Xiao et al.,
2017). As learning rates, we used $10^{-5}$ for the weights, and 1 and 0.01 for the Lagrange multipliers (lambdas) on each dataset respec-
tively. We also tested our theory with different at-
tack hyperparameters. In each network we also added
a non-linear transformation layer in the beginning,
which consisted of an appropriate size convolutional
layer followed by a ReLU. We then added two layers
(N = 2) on which we enforced our constraints during
training.
To perform adversarial training, we used PGD (Mądry et al., 2017). For MNIST, we used 40 PGD steps with ε = 75/255 and a step size of 2/255. For Fashion-MNIST, we used 10 PGD steps with ε = 8/255 and a step size of 2/255. We evaluated our
trained models under various classical as well as more
recent adversarial attacks, namely FGSM (Goodfel-
low et al., 2014), PGD (Mądry et al., 2017), C&W
(Carlini and Wagner, 2017), MIM (Dong et al., 2017),
APGD (Croce and Hein, 2020), APGDT (Croce and
Hein, 2020), FAB (Croce and Hein, 2019), Square
(Andriushchenko et al., 2019), SPSA (Gao et al.,
2020), Jitter (Schwinn et al., 2021), VMIFGSM &
VNIFGSM (Wang and He, 2021). For the attack implementations, we utilized the widely used torchattacks library (Kim, 2020), applying the default parameters for each attack, as well as using the same values (where applicable) for ε, step size, and number of steps as in the adversarial training.
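As a concrete illustration of this evaluation setup, the sketch below shows how two of the attacks could be instantiated with torchattacks using the MNIST hyperparameters above; the `model` and `test_loader` arguments are placeholders for a trained classifier and a dataset loader, and only FGSM and PGD are shown.

```python
import torch
import torchattacks

def evaluate_robust_accuracy(model, test_loader, device="cuda"):
    """Robust accuracy under the MNIST settings of this section
    (eps = 75/255, step size 2/255, 40 PGD steps), via the torchattacks library."""
    model = model.to(device).eval()
    attacks = {
        "FGSM": torchattacks.FGSM(model, eps=75 / 255),
        "PGD": torchattacks.PGD(model, eps=75 / 255, alpha=2 / 255, steps=40),
    }
    results = {}
    for name, atk in attacks.items():
        correct, total = 0, 0
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            adv_images = atk(images, labels)             # generate adversarial examples
            preds = model(adv_images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
        results[name] = 100.0 * correct / total
    return results
```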
5 RESULTS
In this section we compare our constraint results with
the baseline adversarial training (Mądry et al., 2017).
Our approach does not necessitate direct comparison
with state-of-the-art defenses, as its primary value
lies in its versatility and lightweight nature. Unlike
many specialized techniques, our method is not con-
fined to specific threat models or architectures; in-
stead, it seamlessly integrates with any convolutional
layer-based system, enhancing its robustness. This
generality distinguishes our approach, as it comple-
ments rather than competes with existing defenses.
Moreover, adversarial training serves as a universal
baseline in this domain due to its ubiquity and es-
tablished effectiveness across diverse models and at-
tack scenarios. By focusing on comparisons with
adversarial training, we highlight the adaptability of
our method while avoiding the pitfalls of narrow,
scenario-specific evaluations that may not reflect its
true potential. This emphasis underscores our contri-
bution as a foundational enhancement to robust learn-
ing, capable of synergizing with state-of-the-art tech-
niques to achieve even greater resilience.
5.1 MNIST Results
We begin by testing our hypothesis using the MNIST
dataset, which consists of 60,000 training images and 10,000 test images of handwritten digits, span-
ning 10 classes. While defending against adversar-
ial attacks of small magnitude on this dataset is rela-
tively straightforward, particularly through adversar-
ial training, we aim to demonstrate the robustness of
our approach under more challenging conditions. To
this end, we increase the number of iterations for PGD
and use a higher perturbation magnitude ε (compared
to 10 iterations and perturbation magnitude of 8/255
typically used in other datasets) to generate adversar-
ial examples that impose a stronger challenge, which
is common practice in adversarial defense research.
More details can be found in Section 4.
We first compare the performance of our model
on clean, unperturbed data. As shown in Table 1,
the clean accuracies for both nets, the baseline and
the proposed, are comparable. This demonstrates that
the constraint we apply to enforce weight orthogo-
nality does not adversely affect the model’s ability
to correctly classify clean examples. Our method’s
primary contribution lies in improving the robustness
of the model against adversarial examples. The en-
forced orthogonality through our proposed constraint
enhances the model’s defense capability against ad-
versarial perturbations, without negatively impacting
its performance on clean data.
Table 1: Classification accuracies of the two compared classifiers on MNIST.
Classifier Clean Accuracy
WResNet-10-2 99.30%
Constr. WResNet-10-2 (Ours) 99.33%

5.1.1 Robustness Against Adversarial Attacks

Table 2: Robust accuracies under white-box attacks for the two compared classifiers on some typical adversarial attacks on MNIST.
Attack WResNet Constr. WResNet (Ours)
FGSM 97.06 % 98.74 %
PGD 95.81 % 96.96 %
C&W 98.17 % 98.33 %
MIM 95.77 % 97.06 %
APGD 92.03 % 98.26 %
APGDT 91.97 % 98.22 %
FAB 94.42 % 98.95 %
Square 97.32 % 98.45 %
SPSA 99.20 % 99.32 %
Jitter 97.25 % 98.02 %
VMIFGSM 95.88 % 96.97 %
VNIFGSM 95.78 % 96.82 %

Next, we evaluate the performance of our trained models against a variety of adversarial attacks in a white-box context. The results, presented in Table 2, clearly demonstrate that our method outperforms the baseline by nearly 2% on average across all attack types, reaching up to 6% for multiple attacks. We must
stress at this point that this improvement is achieved
solely by imposing our orthogonality constraint dur-
ing training—without altering the network’s architec-
ture or introducing additional computational overhead
during inference.
In other words, our proposed constraint enhances
the model’s robustness without causing overfitting to
any specific attack, such as PGD, and generalizes well
to other adversarial attacks. This highlights the versa-
tility and effectiveness of our approach in strength-
ening the model’s resistance to adversarial perturba-
tions, marking a significant contribution to the field
of adversarial defense.
In summary, our method showcases a simple yet
powerful enhancement to adversarial training, im-
proving robustness across a variety of attacks while
maintaining similar performance on clean data and
avoiding the typical trade-offs associated with more
complex defense mechanisms.
5.2 Fashion-MNIST Results
We extend our evaluation to the Fashion-MNIST
dataset, which consists of 60,000 training images and 10,000 test images representing 10 classes of clothing items. The complexity of Fashion-MNIST lies in the similarity between certain classes (e.g., t-shirts vs. pullovers, trousers vs. dresses), making it a more challenging benchmark for adversarial defense techniques. Unlike the MNIST experiment, defending against adversarial perturbations through adversarial training on this dataset is relatively more difficult; thus, we aim to demonstrate that our approach is also effective in this scenario.

Table 3: Classification accuracies of the two compared classifiers on Fashion-MNIST.
Classifier Clean Accuracy
WResNet-10-2 90.16%
Constr. WResNet-10-2 (Ours) 90.4%
When testing on clean data, as shown in Table 3,
our method exhibits similar performance to the base-
line in terms of clean accuracy. This highlights that
enforcing orthogonality through our proposed con-
straints does not degrade the model’s ability to clas-
sify clean examples. Despite the increased difficulty
of Fashion-MNIST due to more complex class dis-
tributions, our method maintains its performance on
clean data while significantly enhancing robustness
against adversarial attacks. The orthogonality con-
straint does not overfit the model to adversarial pertur-
bations from the training attacks but rather provides
a generalized defense across various attack scenarios,
further confirming its effectiveness in improving over-
all adversarial robustness.
Table 4: Robust accuracies under white-box attacks for the
two compared classifiers on some typical adversarial attacks
on Fashion-MNIST.
Attack WResNet Constr. WResNet (Ours)
FGSM 82.51 % 84.25 %
PGD 81.20 % 83.62 %
C&W 85.30 % 86.61 %
MIM 81.36 % 83.68 %
APGD 82.80 % 88.12 %
APGDT 82.50 % 88.10 %
FAB 82.64 % 90.24 %
Square 87.06 % 90.38 %
SPSA 88.97 % 90.23 %
Jitter 82.58 % 86.09 %
VMIFGSM 82.75 % 85.37 %
VNIFGSM 82.88 % 85.25 %
5.2.1 Robustness Against Adversarial Attacks
We also evaluate the performance of our method
against the same attacks as in section 5.1.1 in a white-
box setting. The results, summarized in Table 4, show
that our method yields a notable improvement in de-
fense performance, achieving an approximate 3.5%
increase in accuracy on average compared to the base-
line, across all attack types, reaching up to 7.5%. This
improvement is consistent with the results on MNIST,
underscoring that our approach is not tailored to a spe-
cific dataset but generalizes well across different data
distributions and adversarial settings.
As with the MNIST experiments, the key advan-
tage of our method is that it does not require any ar-
chitectural modifications or additional computational
overhead during inference. The orthogonality con-
straint, imposed during training, provides robust ad-
versarial defense without introducing significant com-
plexity. Moreover, it helps the model maintain its
resistance to various adversarial attacks, demonstrat-
ing a consistent performance boost without sacrificing
clean data accuracy.
6 CONCLUSION
In this paper, we extend our previous work in (Bifis
et al., 2023), by introducing a novel defense tech-
nique that can be applied to convolutional layers.
We demonstrate its effectiveness compared to tradi-
tional adversarial training. Our experiments on the
MNIST and Fashion-MNIST datasets show consis-
tent improvements of approximately 2% to 7.5% in
adversarial robustness across various attacks, com-
pared to classical adversarial training, without sacri-
ficing accuracy on pristine data. These results sug-
gest that incorporating the defense strategy directly
into the convolutional layers significantly enhances
robustness, providing an efficient and effective im-
provement over adversarial training in specific set-
tings. It is important to note that, while our tech-
nique was tested on an extended network compared to
the backbone WideResNet from (Zagoruyko, 2016),
the observed improvements are due to the addition
of our constraints to the loss function, while main-
taining identical network architectures and training
conditions. Furthermore, this technique can be seam-
lessly integrated into existing architectures and com-
bined with state-of-the-art systems to further improve
adversarial robustness. In future work, we will ex-
plore its application to additional networks and eval-
uate its performance against a broader range of ad-
versarial attacks. Moreover, optimizing the compu-
tational complexity of our method could increase its
practicality for deployment in resource-constrained
environments.
In conclusion, our proposed defense technique
presents a promising avenue for improving the ro-
bustness of convolutional neural networks against ad-
versarial attacks. With further refinement, we believe
it could become an integral component of future de-
fense strategies in deep learning.
REFERENCES
Amini, S., Teymoorianfard, M., Ma, S., and Houmansadr,
A. (2024). Meansparse: Post-training robustness en-
hancement through mean-centered feature sparsifica-
tion. arXiv preprint arXiv:2406.05927.
Andriushchenko, M., Croce, F., Flammarion, N., and Hein,
M. (2019). Square attack: a query-efficient black-
box adversarial attack via random search. CoRR,
abs/1912.00049.
Bartoldson, B. R., Diffenderfer, J., Parasyris, K., and
Kailkhura, B. (2024). Adversarial robustness limits
via scaling-law and human-alignment studies. arXiv
preprint arXiv:2404.09349.
Bifis, A., Psarakis, E. Z., and Kosmopoulos, D. (2023). De-
veloping robust and lightweight adversarial defenders
by enforcing orthogonality on attack-agnostic denois-
ing autoencoders. In Proceedings of the IEEE/CVF
International Conference on Computer Vision, pages
1272–1281.
Carlini, N. and Wagner, D. (2017). Towards evaluating the
robustness of neural networks. In 2017 IEEE sympo-
sium on security and privacy (sp), pages 39–57.
Chen, Y.-Y., Chen, C.-T., Sang, C.-Y., Yang, Y.-C., and
Huang, S.-H. (2021). Adversarial attacks against rein-
forcement learning-based portfolio management strat-
egy. IEEE Access, 9:50667–50685.
Croce, F., Andriushchenko, M., Sehwag, V., Debenedetti,
E., Flammarion, N., Chiang, M., Mittal, P., and Hein,
M. (2021). Robustbench: a standardized adversarial
robustness benchmark. In Thirty-fifth Conference on
Neural Information Processing Systems Datasets and
Benchmarks Track.
Croce, F. and Hein, M. (2019). Minimally distorted adver-
sarial examples with a fast adaptive boundary attack.
CoRR, abs/1907.02044.
Croce, F. and Hein, M. (2020). Reliable evaluation of
adversarial robustness with an ensemble of diverse
parameter-free attacks. CoRR, abs/2003.01690.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei,
L. (2009). Imagenet: A large-scale hierarchical image
database. In 2009 IEEE conference on computer vi-
sion and pattern recognition, pages 248–255.
Deng, L. (2012). The mnist database of handwritten digit
images for machine learning research [best of the
web]. IEEE signal processing magazine, 29(6):141–
142.
Dong, Y., Liao, F., Pang, T., Hu, X., and Zhu, J. (2017).
Discovering adversarial examples with momentum.
CoRR, abs/1710.06081.
Gao, L., Zhang, Q., Song, J., and Shen, H. T. (2020). Patch-
wise++ perturbation for adversarial targeted attacks.
CoRR, abs/2012.15503.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2014). Ex-
plaining and harnessing adversarial examples. arXiv
preprint arXiv:1412.6572.
Kim, H. (2020). Torchattacks: A pytorch repository for
adversarial attacks. arXiv preprint arXiv:2010.01950.
Krizhevsky, A. and Hinton, G. (2009). Learning multiple
layers of features from tiny images.
Kumari, N., Singh, M., Sinha, A., Machiraju, H., Krishna-
murthy, B., and Balasubramanian, V. N. (2019). Har-
nessing the vulnerability of latent layers in adversar-
ially trained models. In Proceedings of the 28th In-
ternational Joint Conference on Artificial Intelligence,
pages 2779–2785.
Levi, M. and Kontorovich, A. (2024). Splitting the differ-
ence on adversarial training. In 33rd USENIX Security
Symposium (USENIX Security 24), pages 3639–3656.
Mądry, A., Makelov, A., Schmidt, L., Tsipras, D., and
Vladu, A. (2017). Towards deep learning models re-
sistant to adversarial attacks. stat, 1050(9).
Pang, T., Lin, M., Yang, X., Zhu, J., and Yan, S. (2022).
Robustness and accuracy could be reconcilable by
(proper) definition. In International Conference on
Machine Learning, pages 17258–17277.
Peng, S., Xu, W., Cornelius, C., Hull, M., Li, K., Dug-
gal, R., Phute, M., Martin, J., and Chau, D. H.
(2023). Robust principles: Architectural design prin-
ciples for adversarially robust cnns. arXiv preprint
arXiv:2308.16258.
Rade, R. and Moosavi-Dezfooli, S.-M. (2022). Reducing
excessive margin to achieve a better accuracy vs. ro-
bustness trade-off. In International Conference on
Learning Representations.
Schwinn, L., Raab, R., Nguyen, A., Zanca, D., and Eskofier,
B. M. (2021). Exploring misclassifications of robust
neural networks to enhance adversarial attacks. CoRR,
abs/2105.10304.
Selvakkumar, A., Pal, S., and Jadidi, Z. (2022). Addressing
adversarial machine learning attacks in smart health-
care perspectives. In Sensing Technology: Proceed-
ings of ICST 2022, pages 269–282. Springer.
Sitawarin, C., Chakraborty, S., and Wagner, D. (2021).
Sat: Improving adversarial training via curriculum-
based loss smoothing. In Proceedings of the 14th
ACM Workshop on Artificial Intelligence and Security,
pages 25–36.
Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., and
Madry, A. (2018). Robustness may be at odds with
accuracy. arXiv preprint arXiv:1805.12152.
Wang, X. and He, K. (2021). Enhancing the transferability
of adversarial attacks through variance tuning. CoRR,
abs/2103.15571.
Wu, H., Yunas, S., Rowlands, S., Ruan, W., and Wahlström,
J. (2023). Adversarial driving: Attacking end-to-end
autonomous driving. In 2023 IEEE Intelligent Vehi-
cles Symposium (IV), pages 1–7.
Xiao, H., Rasul, K., and Vollgraf, R. (2017). Fashion-
mnist: a novel image dataset for benchmarking ma-
chine learning algorithms. CoRR, abs/1708.07747.
Zagoruyko, S. (2016). Wide residual networks. arXiv
preprint arXiv:1605.07146.
Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., and Jor-
dan, M. (2019). Theoretically principled trade-off be-
tween robustness and accuracy. In International con-
ference on machine learning, pages 7472–7482.