OrthoCNN: Mitigating Adversarial Noise in Convolutional Neural
Networks via Orthogonal Projections
Aristeidis Bifis (https://orcid.org/0000-0003-0246-1209) and Emmanouil Psarakis (https://orcid.org/0000-0002-9627-0640)
Computer Engineering & Informatics Department, University of Patras, Patras, Greece
Keywords:
Adversarial Defense, Adversarial Training, Neural Network Robustness, Adversarial Robustness, Deep
Learning, Convolutional Layers, Null-Space Projection, Range-Space Projection, Orthogonal Projection,
PGD, White Box, Feature Manipulation.
Abstract:
Adversarial training is the standard method for improving the robustness of neural networks against adversarial
attacks. However, a well-known trade-off exists: while adversarial training increases resilience to perturba-
tions, it often results in a significant reduction in accuracy on clean (unperturbed) data. This compromise
leads to models that are more resistant to adversarial attacks but less effective on natural inputs. In this pa-
per, we introduce an extension to adversarial training by applying novel constraints on convolutional layers
that address this trade-off. Specifically, we use orthogonal projections to decompose the learned features into
clean signal and adversarial noise, projecting them onto the range and null spaces of the network’s weight
matrices. These constraints improve the separation of adversarial noise from useful signals during training,
enhancing robustness while preserving the same performance on clean data as adversarial training. Our ap-
proach achieves significant improvements in robust accuracy while maintaining comparable clean accuracy,
providing a balanced and effective adversarial defense strategy.
1 INTRODUCTION
Adversarial attacks pose a significant threat to the re-
liability and security of neural networks, particularly
in critical real-world applications such as autonomous
driving, healthcare, and finance (Wu et al., 2023; Sel-
vakkumar et al., 2022; Chen et al., 2021). These at-
tacks, which introduce small but deliberate perturba-
tions to input data, can lead to incorrect predictions
or system failures, undermining the trustworthiness
of AI systems. Although adversarial training has be-
come the standard defense mechanism, it often results
in a trade-off: models become more robust to adver-
sarial perturbations but suffer decreased performance
on clean (unperturbed) data (Tsipras et al., 2018;
Zhang et al., 2019). This trade-off limits the prac-
ticality of adversarially robust models, highlighting
the need for methods that enhance robustness without
sacrificing accuracy on natural inputs. Such methods
are crucial for improving the overall reliability and
usability of AI systems in real-world scenarios.
The standard adversarial training framework
(Goodfellow et al., 2014) seeks to improve the robust-
ness of machine learning models by explicitly training
them on adversarial examples—inputs that have been
intentionally perturbed to mislead the model. The
core idea is to augment the training data with these
adversarial examples, forcing the model to learn from
both the original clean data and the perturbations that
challenge its decision boundaries. This process typi-
cally involves generating adversarial examples using
methods like the Fast Gradient Sign Method (FGSM)
(Goodfellow et al., 2014) or Projected Gradient De-
scent (PGD) (Mądry et al., 2017) and incorporating
them into the training procedure. The goal is to en-
hance the model’s ability to classify both clean and
adversarial inputs correctly, thereby increasing its re-
silience to attacks. By exposing the model to a diverse
set of adversarial perturbations, adversarial training
helps it develop more stable decision boundaries, ul-
timately improving its generalization and robustness
against malicious attacks.
A significant limitation of existing adversarial
training methods is the trade-off between improv-
ing robustness to adversarial attacks and maintaining
clean accuracy on unperturbed data. While adversar-
ial training helps models become more resilient to ad-
versarial examples by exposing them to perturbations
during training, it often leads to a degradation in clean
accuracy—the model’s performance on natural, un-
modified inputs. This occurs because the model’s de-
cision boundaries are adjusted to become more robust
to adversarial perturbations, sometimes at the cost of
overfitting to the adversarial examples or losing its
ability to generalize well on normal data. This trade-
off represents a key challenge in adversarial training,
as improving robustness to attacks can compromise
the model’s overall performance, making it less effec-
tive in real-world, unperturbed scenarios. Researchers
are actively exploring ways to mitigate this issue, such
as through more sophisticated loss functions, regular-
ization techniques, or hybrid training strategies that
balance both clean and adversarial accuracy.
This paper introduces a novel extension to the
adversarial training (AT) framework by applying or-
thogonal projection constraints to the clean signal and
adversarial noise representations, mapping them onto
the range and null spaces of the network’s weight ma-
trices. This approach separates the adversarial per-
turbations from the clean data signal during training,
effectively enhancing the model’s robustness to ad-
versarial attacks while preserving its performance on
unperturbed data. By doing so, it addresses the com-
mon trade-off between adversarial defense and clean
accuracy, offering a more balanced and practical so-
lution.
2 RELATED WORK
Adversarial training using Projected Gradient De-
scent (PGD) has become one of the most widely
adopted techniques for improving model robustness
against adversarial attacks. In (Mądry et al., 2017), the
authors demonstrated the effectiveness of PGD-based
adversarial training, showing that iteratively applying
adversarial perturbations during training could sig-
nificantly improve model performance on adversarial
examples. PGD attacks involve multiple steps of gra-
dient updates, which are projected onto a specified
norm ball to ensure the adversarial perturbations re-
main within a predefined limit. The authors showed
that this method could help train models that are ro-
bust to a variety of adversarial attacks, particularly
white-box attacks like PGD itself, providing a foun-
dation for many subsequent works in adversarial de-
fense (Rade and Moosavi-Dezfooli, 2022; Kumari
et al., 2019; Sitawarin et al., 2021). The approach
has been widely used and adapted to various architec-
tures and datasets, becoming a standard benchmark
for evaluating adversarial robustness. However, while
PGD-based adversarial training is effective, it often
involves a trade-off between robustness and clean data
accuracy, as the process can lead to overfitting on the
adversarial perturbations.
TRADES (TRadeoff-inspired Adversarial DE-
fense via Surrogate-loss minimization) (Zhang et al.,
2019), is a prominent adversarial defense technique
designed to improve the robustness of neural net-
works against adversarial attacks while preserving
their generalization to clean data. TRADES intro-
duces a novel approach to adversarial training by min-
imizing a surrogate loss that balances between ad-
versarial robustness and clean data accuracy. Specif-
ically, it incorporates a trade-off term that penal-
izes the difference in output distributions between
clean and adversarial examples, using the Kullback-
Leibler (KL) divergence to measure this discrepancy.
The method encourages the model to behave simi-
larly on both clean and adversarially perturbed data,
which leads to improved performance against adver-
sarial attacks, especially in terms of transferability
and robustness in black-box settings. One of the key
strengths of TRADES is its ability to improve the
trade-off between adversarial robustness and clean ac-
curacy, addressing a common challenge in adversar-
ial training methods, where boosting one often results
in a decline in the other. TRADES has shown su-
perior performance over standard adversarial training
methods, such as PGD-based training, by achieving
better generalization and robustness. However, de-
spite its effectiveness, TRADES introduces additional
computational overhead and requires careful tuning of
the trade-off parameter to maintain a balance between
adversarial robustness and clean data accuracy. Sub-
sequent research has extended TRADES by explor-
ing alternative regularization techniques and improv-
ing its efficiency in large-scale models (Pang et al.,
2022; Levi and Kontorovich, 2024).
In (Bifis et al., 2023) a novel adversarial defense
strategy that leverages orthogonal constraints applied
to denoising autoencoders (DAEs) was introduced.
The proposed approach demonstrated that tied-weight
DAEs, which have half the complexity of full-weight
models, offer substantial improvements in adversarial
robustness without compromising on computational
efficiency. By enforcing orthogonality during train-
ing, the model becomes more resilient to adversarial
perturbations while maintaining low inference over-
head. Building upon this foundation, we extend that approach to more complex architectures, specifically exploring the application of that theory to convolutional layers. Furthermore, we investigate the potential of applying the orthogonal constraints outside the denoising framework, broadening their applicability to other areas of adversarial defense.
In this paper, the limitations of the approach presented in (Bifis et al., 2023) are addressed, namely:
- its focus on applying constraints exclusively to fully connected layers in smaller neural networks. This approach aimed to reduce the number of parameters while maintaining a level of robustness comparable to larger, more complex robust networks, but it restricted the scope of the technique to parameter-aware contexts; in addition,
- its reliance on noise from known distributions to minimize the impact on training time, earning it the label of attack-agnostic. However, while the method performed as intended in black- and gray-box setups, it failed to achieve the same level of robustness as adversarial training in white-box scenarios.
3 PROBLEM FORMULATION
The objective of our technique is to train a neural net-
work capable of substantially mitigating adversarial
noise embedded in input signals. This approach im-
proves classification accuracy on tampered data while
preserving performance on clean data, achieving ac-
curacy comparable to models trained exclusively on
pristine inputs. The first step in extending the above-mentioned technique (Bifis et al., 2023) is to find a way to apply constraints to layer types beyond the
fully connected ones. This shift is necessary because,
in large networks, fully connected layers are less ef-
fective at capturing localized features and are com-
putationally inefficient. A more promising approach
is to adapt the orthogonality constraints for convolu-
tional layers, which are widely used in state-of-the-art
networks, computationally efficient, and well-suited
for producing localized features. This process dif-
fers fundamentally from the matrix-vector multipli-
cation performed in a network that is constructed by
fully connected layers. Therefore, how can we estab-
lish and extend the theoretical framework presented
in (Bifis et al., 2023) to convolutional layers?
In convolutional layers, both the inputs and the building blocks are represented by tensors. A convolutional layer, depending on the kind of its input, can be seen as:
- a feature map generator, if the input is an image, or
- a feature map transform, if the input is a feature map itself.
An input, independently of its kind, can typically be considered as an image with $C_{in}$ channels, each of size $H_{in} \times W_{in}$. For example, a typical RGB image has $C_{in} = 3$ and a gray-scale one $C_{in} = 1$, and they can be stored in an input tensor of appropriate size. The convolutional layer kernels (filters) can also be stored in tensors. The number of filters defines the number of output channels $C_{out}$, and each filter can be denoted by a $C_{in} \times H_k^l \times W_k^l$ tensor $K_{c_{out}}$, with $H_k^l \leq H_{in}$ and $W_k^l \leq W_{in}$. Thus, for a given input image or feature map $X \in \mathbb{R}^{C_{in} \times H_{in} \times W_{in}}$, its convolution with the kernel $K_{c_{out}}$ yields the $c_{out}$-th output image or feature map $Y_{c_{out}}$ of size $H_{out} \times W_{out}$, each element of which is an RV given by the following relation:

$$Y_{c_{out}}(n) = \sum_{c_{in}=1}^{C_{in}} \sum_{m \in S_{kern}} X_{c_{in}}(n - m)\, K_{c_{out}}(c_{in}, m), \quad n \in S_{out},\; c_{out} = 1, \dots, C_{out} \qquad (1)$$
with

$$S_{kern} = [0,\, H_k^l - 1] \times [0,\, W_k^l - 1], \qquad S_{out} = [0,\, H_{out} - 1] \times [0,\, W_{out} - 1]$$

the supports of the $c_{out}$-th kernel and of the output respectively. Eq. (1) can be equivalently written as follows:

$$y_{c_{out}}^t = k_{c_{out}}^t X_r, \quad c_{out} = 1, \dots, C_{out} \qquad (2)$$
where $y_{c_{out}}$, $k_{c_{out}}$ are the flattened versions of $Y_{c_{out}}$ and $K_{c_{out}}$, of length $H_{out} W_{out}$ and $C_{in} H_k^l W_k^l$ respectively, and $X_r$ is an appropriate rearrangement of the input whose size depends on the setting of the convolutional parameters, i.e., stride, dilation, etc. Using Eq. (2), the linear convolution can be expressed as the product of a deterministic matrix $K$ of size $C_{out} \times C_{in} H_k^l W_k^l$ with a random matrix $X_r$ of size $C_{in} H_k^l W_k^l \times H_{out} W_{out}$ as follows:

$$Y = K X_r \qquad (3)$$
with the random matrix $Y$ of size $C_{out} \times H_{out} W_{out}$ and the matrix $K$ defined as follows:

$$Y = \begin{bmatrix} y_1^t \\ y_2^t \\ \vdots \\ y_{C_{out}}^t \end{bmatrix} \quad \text{and} \quad K = \begin{bmatrix} k_1^t \\ k_2^t \\ \vdots \\ k_{C_{out}}^t \end{bmatrix}. \qquad (4)$$
Having defined the linear convolution as a multiplication of matrices, let us make some comments about the specific form of the random matrix $Y$ defined in Eq. (4), how it depends on the form of matrix $K$, and how we can apply the desired constraints on the range and the null space of the filter coefficient matrix $K$.
3.1 The Proposed Solution
For the purposes of this paper, we model adversar-
ial attacks as the addition of a correlated perturbation
(Goodfellow et al., 2014) to the input $X$, that is:

$$X_A = X + W_A \qquad (5)$$
In our pipeline, as we can see from Fig. 1, we first
apply a non-linear transformation to the input RVs.
Figure 1: Proposed pipeline, consisting of three key components: (1) the non-linear transformation, (2) the N constrained layers, each followed by the output normalization step, where the novel defense constraints are applied, and (3) the WideResNet classifier, which performs the final classification task.
The motivation behind this initial non-linear transfor-
mation is to enhance the data representation, improve
the feature set, and address the complexity of adver-
sarial noise, which follows intricate patterns.
To perform this non-linear transformation, we use a simple convolutional layer followed by a non-linear activation function $f(\cdot)$. Consequently, using Eq. (3), the pristine data input $X$ and the adversarial input $X_A$ are non-linearly transformed to the representations $Z$ and $Z_A$ respectively, as follows:

$$Z = f_0(K X_r) \qquad (6)$$
$$Z_A = f_0(K X_r^A) \qquad (7)$$
We can still claim that the non-linear representation relationship between the above mentioned transformed RVs is:

$$Z_A = Z + R \qquad (8)$$

where the term $R$ can be viewed as a residual noise perturbation that affects the attacked classifier, shifting the representation of the pristine data towards regions that lead to misclassification and incorrect results. We must stress at this point that the noise perturbation $R$ is correlated with the representation of the pristine data $Z$. In order to quantify this dependency, let us rewrite Eq. (8) in a column-wise manner, i.e.:

$$z_l^A = z_l + r_l, \quad l = 1, \dots, H_{out} W_{out} \qquad (9)$$
Then, we have the following proposition.
Proposition 1: Let $z_l$, $z_l^A$ be the non-linear representations of the pristine and the adversarially attacked RVs respectively. Then, the following relation holds:

$$z_l^A = (1 + \mu_l) z_l + v_l, \quad l = 1, \dots, H_{out} W_{out} \qquad (10)$$

with constant $\mu_l$ bounded by unity and defined by:

$$\mu_l = \frac{\langle z_l^A - z_l,\; z_l \rangle}{\|z_l\|_2 \, \|z_l^A - z_l\|_2}$$

with $\langle \cdot, \cdot \rangle$ denoting the inner product operator, and the vectorized RV $v_l$ being orthogonal to $z_l$, that is, $\langle z_l, v_l \rangle = 0$.
Proof: The proof is easy and thus omitted.
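Since the proof is omitted, a short numerical illustration of the decomposition in Eq. (10) may be helpful. The sketch below is our own illustration, not the paper's code: it splits an adversarial column into a component parallel to $z_l$ and an exactly orthogonal residual $v_l$, using the standard projection coefficient $\langle z_l^A - z_l, z_l\rangle / \|z_l\|_2^2$ as the scalar multiplying $z_l$. This reproduces the form of Eq. (10), though not necessarily the exact normalization of $\mu_l$ stated above.

```python
import torch

torch.manual_seed(0)
z = torch.randn(64)                  # pristine column z_l (illustrative size)
z_adv = z + 0.1 * torch.randn(64)    # adversarial column z_l^A = z_l + r_l

# Projection coefficient: the scalar that makes the residual orthogonal to z.
mu = torch.dot(z_adv - z, z) / z.norm().pow(2)
v = z_adv - (1.0 + mu) * z           # residual component v_l

print(torch.allclose(z_adv, (1.0 + mu) * z + v))                      # form of Eq. (10) holds
print(torch.isclose(torch.dot(z, v), torch.tensor(0.0), atol=1e-5))   # <z_l, v_l> = 0
```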
We are going to exploit Proposition 1 in order to
properly define the activation functions of the next
convolutional layers shown in Fig. 1. The output of
these layers can be defined as follows:

$$Q_k = f_k(K_k Q_{k-1}), \quad k = 1, 2, \dots, N \qquad (11)$$

with the RV $Q_k$ being either $Z_k$, with $Q_0 = Z$, if the pristine data feed the input of the DNN, or $Z_k^A$, with $Q_0 = Z_A$, if the adversarial data do, and with $f_k(\cdot)$, $K_k$ denoting the activation function, which acts in a column-wise manner, and the kernel matrix of the $k$-th convolutional layer respectively.
In order to achieve our goal, let us adopt the framework proposed in (Bifis et al., 2023); namely, the goal is to produce pristine representations by constraining the weights $K_k$, $k = 1, 2, \dots, N$ of the kernels in each convolutional layer of our network to:
- project the adversarial residual noise perturbation representations onto the null space of the net's weights,
- while preserving all the information from the pristine data representations in the range of the weights.
To this end, we focus on finding weights $K_k$, $k = 1, 2, \dots, N$ for each convolutional layer of the pipeline shown in Fig. 1 such that the corresponding residual noise $v_{k,l}$ and signal $z_{k,l}$ are orthogonal to specific parts of the weight matrix. To satisfy these conditions, we can utilize the null space and the range of the matrix $K_k$. The next lemma gives us the solution to all the above mentioned requirements.
Lemma 1: Let us consider that the following orthogonality constraints:

$$U_{R_k}^T v_{k,l} = 0 \qquad (12)$$
$$U_{N_k}^T z_{k,l} = 0 \qquad (13)$$

are imposed on the $k$-th convolutional layer, $k = 1, 2, \dots, N$, of the pipeline shown in Fig. 1 during the training phase of the network, with $U_{R_k}$, $U_{N_k}$ denoting the range and null space of the kernel weights $K_k$ respectively, which can be obtained from the SVD of the corresponding matrix $K_k = V_k \Sigma_k U_k^T$. Let us also consider that the following activation function:

$$f_k(K_k q_{k,l}) = \frac{K_k q_{k,l}}{\|K_k q_{k,l}\|_2}, \quad l = 1, \dots, H_{out} W_{out} \qquad (14)$$

is acting on the output of the corresponding convolutional layer, with $q_{k,l}$ denoting the $l$-th column of the random matrix $Q_k$ defined in Eq. (11). Then, the pristine and adversarial representations match.
Proof: The proof of Lemma 1 is easy. Note that if we denote by $o_{k,l} = K_k z_{k-1,l}$ the output of the $k$-th convolutional layer to the $l$-th column of the pristine matrix $Z_{k-1}$, then, using Proposition 1, we easily obtain the following relation:

$$K_k z_{k,l}^A = (1 + \mu_{k,l})\, o_{k,l}. \qquad (15)$$

By applying the activation function defined in Eq. (14), we can easily prove the lemma.
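For illustration, here is a minimal sketch (our own, with illustrative shapes that we assume rather than take from the paper) of one constrained layer's forward pass, i.e., Eq. (11) with the column-wise normalizing activation of Eq. (14). For a positive scale factor, the term $(1 + \mu_{k,l})$ in Eq. (15) cancels under this normalization, which is what makes the pristine and adversarial representations match once the constraints hold.

```python
import torch

def constrained_layer(K_k: torch.Tensor, Q_prev: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Q_k = f_k(K_k Q_{k-1}) with f_k the column-wise L2 normalization of Eq. (14).

    K_k    : flattened kernel matrix, shape (C_out, C_in*Hk*Wk)
    Q_prev : column-wise representation entering the layer, shape (C_in*Hk*Wk, H_out*W_out)
    """
    O = K_k @ Q_prev                                # linear part, as in Eq. (3)
    return O / (O.norm(dim=0, keepdim=True) + eps)  # each column mapped to unit L2 norm

# A positively scaled input column yields the same normalized output column,
# which is why the factor (1 + mu_{k,l}) of Eq. (15) disappears after f_k.
K = torch.randn(8, 27)
q = torch.randn(27, 1)
print(torch.allclose(constrained_layer(K, q), constrained_layer(K, 1.7 * q)))  # True
```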
Concluding, by following our pipeline and applying the proposed constraints during training, we would ideally like to obtain a model where, for each pair of pristine and adversarial data, the corresponding representations match.
3.2 Loss Function
Since we want the whole network to be a classifier, we utilize adversarial training, and as the loss function we propose the use of a cross-entropy based one. Specifically:
$$L(W) = \mathbb{E}_{X,L}\left[\mathrm{CE}(g(X;W), L) + \mathrm{CE}(g(X_A;W), L)\right] \qquad (16)$$

where $g(\cdot\,;W)$ denotes the output of the whole net, $W = \{K, \{U_{R_k}, U_{N_k}\}_{k=1}^N, W_C\}$ collects the weights of the non-linear transformation, of the $N$ convolutional layers, and of the WideResNet-based classifier, and $L$ is the ground-truth labels' set. In addition, we would like to impose the following constraints:

$$\mathbb{E}_{V_k}\left[\|U_{R_k}^T V_k\|_F^2\right] = 0, \quad k = 1, \dots, N \qquad (17)$$
$$\mathbb{E}_{Z_k}\left[\|U_{N_k}^T Z_k\|_F^2\right] = 0, \quad k = 1, \dots, N. \qquad (18)$$

with $\|X\|_F$ denoting the Euclidean $\ell_2$ or Frobenius norm of matrix $X$. Then, we define the following Lagrangian function:
$$J(W^+) = L(W) + \sum_{k=1}^N \lambda_{R_k} \mathbb{E}_{V_k}\left[\|U_{R_k}^T V_k\|_F^2\right] + \sum_{k=1}^N \lambda_{N_k} \mathbb{E}_{Z_k}\left[\|U_{N_k}^T Z_k\|_F^2\right]$$

or equivalently:

$$J(W^+) = L(W) + \sum_{k=1}^N \lambda_{R_k} \mathrm{tr}\left\{U_{R_k}^T \mathbb{E}_{V_k}[V_k V_k^T]\, U_{R_k}\right\} + \sum_{k=1}^N \lambda_{N_k} \mathrm{tr}\left\{U_{N_k}^T \mathbb{E}_{Z_k}[Z_k Z_k^T]\, U_{N_k}\right\} \qquad (19)$$

where $W^+ = \{W, \{\lambda_{R_k}, \lambda_{N_k}\}_{k=1}^N\}$, $\mathrm{tr}\{A\}$ denotes the trace of matrix $A$, and we minimize $J(W^+)$ over the weights and the Lagrange multipliers $\lambda_{R_k}, \lambda_{N_k}$, $k = 1, \dots, N$ of the network; this concludes the section.
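To make the training objective concrete, the following is a minimal PyTorch-style sketch, written under our own assumptions about shapes and indexing (the kernel matrix $K_k$ is assumed already flattened as in Eq. (4), and $Z_k$, $Z_k^A$ to hold the column-wise pristine and adversarial representations entering the $k$-th constrained layer); it shows how the penalty terms of Eqs. (17)-(18) and the objective of Eq. (19) could be assembled. All names are illustrative and not part of the paper's released code.

```python
import torch
import torch.nn.functional as F

def range_null_bases(K_k: torch.Tensor, tol: float = 1e-6):
    """Split the input space of K_k via the SVD K_k = V_k Sigma_k U_k^T (Lemma 1)."""
    V, S, Uh = torch.linalg.svd(K_k, full_matrices=True)  # K_k = V @ diag(S) @ Uh
    r = int((S > tol).sum())                               # numerical rank
    U = Uh.T                                               # columns span the input space of K_k
    # U_R: directions mapped non-trivially by K_k ("range" part); U_N: null space of K_k.
    return U[:, :r], U[:, r:]

def orthogonality_penalties(K_k, Z_k, Z_A_k):
    """Empirical counterparts of Eqs. (17)-(18) for one constrained layer (up to scaling)."""
    U_R, U_N = range_null_bases(K_k)
    V_k = Z_A_k - Z_k                        # residual noise columns v_{k,l}
    pen_R = (U_R.T @ V_k).pow(2).mean()      # noise should be pushed into the null space
    pen_N = (U_N.T @ Z_k).pow(2).mean()      # pristine signal should stay in the range
    return pen_R, pen_N

def total_objective(logits_clean, logits_adv, labels, layer_terms, lambdas_R, lambdas_N):
    """Eq. (16) plus the weighted penalties of Eq. (19).
    layer_terms: list of (K_k, Z_k, Z_A_k) tuples, one per constrained layer."""
    loss = F.cross_entropy(logits_clean, labels) + F.cross_entropy(logits_adv, labels)
    for (K_k, Z_k, Z_A_k), lam_R, lam_N in zip(layer_terms, lambdas_R, lambdas_N):
        pen_R, pen_N = orthogonality_penalties(K_k, Z_k, Z_A_k)
        loss = loss + lam_R * pen_R + lam_N * pen_N
    return loss
```

In such a sketch the expectations of Eqs. (17)-(18) are approximated by batch means, and in practice one would decide how often to recompute the SVD during training; these are implementation choices not specified in the text above.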
4 EXPERIMENTAL SETUP
All our experiments were conducted using an NVIDIA A100 with 40 GB of VRAM. As a backbone
neural network we utilized the WideResNet architec-
ture from (Zagoruyko, 2016). This architecture has
been widely adopted by other researchers for adver-
sarial defense in classification tasks (Bartoldson et al.,
2024; Amini et al., 2024; Peng et al., 2023), has
been proven effective and is frequently used in the
literature, as demonstrated by RobustBench (Croce et al., 2021), a well-known benchmark of adversarial robustness for adversarial defenses on the CIFAR-10, CIFAR-100 (Krizhevsky and Hinton, 2009) and ImageNet (Deng et al., 2009) datasets. For
our experiments we used a small version of WideRes-
Net with 10 layers and a widen factor of 2 (namely
WideResNet-10-2). We ran our tests for two datasets,
MNIST (Deng, 2012) & Fashion-MNIST (Xiao et al.,
2017). As learning rates, we used $10^{-5}$ for the weights, and 1 and 0.01 for the Lagrange multipliers (lambdas) on each dataset respec-
tively. We also tested our theory with different at-
tack hyperparameters. In each network we also added
a non-linear transformation layer in the beginning,
which consisted of an appropriate size convolutional
layer followed by a ReLU. We then added two layers
(N = 2) on which we enforced our constraints during
training.
To perform adversarial training, we used PGD (Mądry et al., 2017). For MNIST, we used 40 PGD steps with ε = 75/255 and a step size of 2/255. For Fashion-MNIST, we used 10 PGD steps with ε = 8/255 and a step size of 2/255. We evaluated our
trained models under various classical as well as more
recent adversarial attacks, namely FGSM (Goodfel-
low et al., 2014), PGD (Mądry et al., 2017), C&W
(Carlini and Wagner, 2017), MIM (Dong et al., 2017),
APGD (Croce and Hein, 2020), APGDT (Croce and
Hein, 2020), FAB (Croce and Hein, 2019), Square
(Andriushchenko et al., 2019), SPSA (Gao et al.,
2020), Jitter (Schwinn et al., 2021), VMIFGSM &
VNIFGSM (Wang and He, 2021). For the attack implementations, we utilized the widely used torchattacks library (Kim, 2020), applying the default parameters for each attack, as well as using the same values (where applicable) for ε, step size, and number of steps as in the adversarial training.
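As a concrete illustration of this evaluation setup, the sketch below shows how two of the attacks could be instantiated with torchattacks using the MNIST hyperparameters above; the `model` and `test_loader` arguments are placeholders for a trained classifier and a dataset loader, and only FGSM and PGD are shown.

```python
import torch
import torchattacks

def evaluate_robust_accuracy(model, test_loader, device="cuda"):
    """Robust accuracy under the MNIST settings of this section
    (eps = 75/255, step size 2/255, 40 PGD steps), via the torchattacks library."""
    model = model.to(device).eval()
    attacks = {
        "FGSM": torchattacks.FGSM(model, eps=75 / 255),
        "PGD": torchattacks.PGD(model, eps=75 / 255, alpha=2 / 255, steps=40),
    }
    results = {}
    for name, atk in attacks.items():
        correct, total = 0, 0
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            adv_images = atk(images, labels)             # generate adversarial examples
            preds = model(adv_images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
        results[name] = 100.0 * correct / total
    return results
```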
5 RESULTS
In this section we compare our constraint results with
the baseline adversarial training (Mądry et al., 2017).
Our approach does not necessitate direct comparison
with state-of-the-art defenses, as its primary value
lies in its versatility and lightweight nature. Unlike
many specialized techniques, our method is not con-
fined to specific threat models or architectures; in-
stead, it seamlessly integrates with any convolutional
layer-based system, enhancing its robustness. This
generality distinguishes our approach, as it comple-
ments rather than competes with existing defenses.
Moreover, adversarial training serves as a universal
baseline in this domain due to its ubiquity and es-
tablished effectiveness across diverse models and at-
tack scenarios. By focusing on comparisons with
adversarial training, we highlight the adaptability of
our method while avoiding the pitfalls of narrow,
scenario-specific evaluations that may not reflect its
true potential. This emphasis underscores our contri-
bution as a foundational enhancement to robust learn-
ing, capable of synergizing with state-of-the-art tech-
niques to achieve even greater resilience.
5.1 MNIST Results
We begin by testing our hypothesis using the MNIST
dataset, which consists of 60,000 training images and 10,000 test images of handwritten digits, span-
ning 10 classes. While defending against adversar-
ial attacks of small magnitude on this dataset is rela-
tively straightforward, particularly through adversar-
ial training, we aim to demonstrate the robustness of
our approach under more challenging conditions. To
this end, we increase the number of iterations for PGD
and use a higher perturbation magnitude ε (compared
to 10 iterations and perturbation magnitude of 8/255
typically used in other datasets) to generate adversar-
ial examples that impose a stronger challenge, which
is common practice in adversarial defense research.
More details can be found in Section 4.
We first compare the performance of our model
on clean, unperturbed data. As shown in Table 1,
the clean accuracies for both nets, the baseline and
the proposed, are comparable. This demonstrates that
the constraint we apply to enforce weight orthogo-
nality does not adversely affect the model’s ability
to correctly classify clean examples. Our method’s
primary contribution lies in improving the robustness
of the model against adversarial examples. The en-
forced orthogonality through our proposed constraint
enhances the model’s defense capability against ad-
versarial perturbations, without negatively impacting
its performance on clean data.
Table 1: Classification accuracies of the two compared classifiers on MNIST.
Classifier Clean Accuracy
WResNet-10-2 99.30%
Constr. WResNet-10-2 (Ours) 99.33%

5.1.1 Robustness Against Adversarial Attacks

Table 2: Robust accuracies under white-box attacks for the two compared classifiers on some typical adversarial attacks on MNIST.
Attack WResNet Constr. WResNet (Ours)
FGSM 97.06 % 98.74 %
PGD 95.81 % 96.96 %
C&W 98.17 % 98.33 %
MIM 95.77 % 97.06 %
APGD 92.03 % 98.26 %
APGDT 91.97 % 98.22 %
FAB 94.42 % 98.95 %
Square 97.32 % 98.45 %
SPSA 99.20 % 99.32 %
Jitter 97.25 % 98.02 %
VMIFGSM 95.88 % 96.97 %
VNIFGSM 95.78 % 96.82 %

Next, we evaluate the performance of our trained models against a variety of adversarial attacks in a white-box context. The results, presented in Table 2, clearly demonstrate that our method outperforms the baseline by nearly 2% on average across all attack types, reaching up to 6% for multiple attacks. We must
stress at this point that this improvement is achieved
solely by imposing our orthogonality constraint dur-
ing training—without altering the network’s architec-
ture or introducing additional computational overhead
during inference.
In other words, our proposed constraint enhances
the model’s robustness without causing overfitting to
any specific attack, such as PGD, and generalizes well
to other adversarial attacks. This highlights the versa-
tility and effectiveness of our approach in strength-
ening the model’s resistance to adversarial perturba-
tions, marking a significant contribution to the field
of adversarial defense.
In summary, our method showcases a simple yet
powerful enhancement to adversarial training, im-
proving robustness across a variety of attacks while
maintaining similar performance on clean data and
avoiding the typical trade-offs associated with more
complex defense mechanisms.
5.2 Fashion-MNIST Results
We extend our evaluation to the Fashion-MNIST
dataset, which consists of 60,000 training images and 10,000 test images representing 10 classes of clothing items. The complexity of Fashion-MNIST lies in the similarity between certain classes (e.g., t-shirts vs. pullovers, trousers vs. dresses), making it a more challenging benchmark for adversarial defense techniques. Unlike the MNIST experiment, defending against adversarial perturbations through adversarial training on this dataset is relatively more difficult; thus, we aim to demonstrate that our approach is also effective in this scenario.

Table 3: Classification accuracies of the two compared classifiers on Fashion-MNIST.
Classifier Clean Accuracy
WResNet-10-2 90.16%
Constr. WResNet-10-2 (Ours) 90.4%
When testing on clean data, as shown in Table 3,
our method exhibits similar performance to the base-
line in terms of clean accuracy. This highlights that
enforcing orthogonality through our proposed con-
straints does not degrade the model’s ability to clas-
sify clean examples. Despite the increased difficulty
of Fashion-MNIST due to more complex class dis-
tributions, our method maintains its performance on
clean data while significantly enhancing robustness
against adversarial attacks. The orthogonality con-
straint does not overfit the model to adversarial pertur-
bations from the training attacks but rather provides
a generalized defense across various attack scenarios,
further confirming its effectiveness in improving over-
all adversarial robustness.
Table 4: Robust accuracies under white-box attacks for the
two compared classifiers on some typical adversarial attacks
on Fashion-MNIST.
Attack WResNet Constr. WResNet (Ours)
FGSM 82.51 % 84.25 %
PGD 81.20 % 83.62 %
C&W 85.30 % 86.61 %
MIM 81.36 % 83.68 %
APGD 82.80 % 88.12 %
APGDT 82.50 % 88.10 %
FAB 82.64 % 90.24 %
Square 87.06 % 90.38 %
SPSA 88.97 % 90.23 %
Jitter 82.58 % 86.09 %
VMIFGSM 82.75 % 85.37 %
VNIFGSM 82.88 % 85.25 %
5.2.1 Robustness Against Adversarial Attacks
We also evaluate the performance of our method
against the same attacks as in section 5.1.1 in a white-
box setting. The results, summarized in Table 4, show
that our method yields a notable improvement in de-
fense performance, achieving an approximate 3.5%
increase in accuracy on average compared to the base-
line, across all attack types, reaching up to 7.5%. This
improvement is consistent with the results on MNIST,
underscoring that our approach is not tailored to a spe-
cific dataset but generalizes well across different data
distributions and adversarial settings.
As with the MNIST experiments, the key advan-
tage of our method is that it does not require any ar-
chitectural modifications or additional computational
overhead during inference. The orthogonality con-
straint, imposed during training, provides robust ad-
versarial defense without introducing significant com-
plexity. Moreover, it helps the model maintain its
resistance to various adversarial attacks, demonstrat-
ing a consistent performance boost without sacrificing
clean data accuracy.
6 CONCLUSION
In this paper, we extend our previous work in (Bifis
et al., 2023), by introducing a novel defense tech-
nique that can be applied to convolutional layers.
We demonstrate its effectiveness compared to tradi-
tional adversarial training. Our experiments on the
MNIST and Fashion-MNIST datasets show consis-
tent improvements of approximately 2% to 7.5% in
adversarial robustness across various attacks, com-
pared to classical adversarial training, without sacri-
ficing accuracy on pristine data. These results sug-
gest that incorporating the defense strategy directly
into the convolutional layers significantly enhances
robustness, providing an efficient and effective im-
provement over adversarial training in specific set-
tings. It is important to note that, while our tech-
nique was tested on an extended network compared to
the backbone WideResNet from (Zagoruyko, 2016),
the observed improvements are due to the addition
of our constraints to the loss function, while main-
taining identical network architectures and training
conditions. Furthermore, this technique can be seam-
lessly integrated into existing architectures and com-
bined with state-of-the-art systems to further improve
adversarial robustness. In future work, we will ex-
plore its application to additional networks and eval-
uate its performance against a broader range of ad-
versarial attacks. Moreover, optimizing the compu-
tational complexity of our method could increase its
practicality for deployment in resource-constrained
environments.
In conclusion, our proposed defense technique
presents a promising avenue for improving the ro-
bustness of convolutional neural networks against ad-
versarial attacks. With further refinement, we believe
it could become an integral component of future de-
fense strategies in deep learning.
REFERENCES
Amini, S., Teymoorianfard, M., Ma, S., and Houmansadr,
A. (2024). Meansparse: Post-training robustness en-
hancement through mean-centered feature sparsifica-
tion. arXiv preprint arXiv:2406.05927.
Andriushchenko, M., Croce, F., Flammarion, N., and Hein,
M. (2019). Square attack: a query-efficient black-
box adversarial attack via random search. CoRR,
abs/1912.00049.
Bartoldson, B. R., Diffenderfer, J., Parasyris, K., and
Kailkhura, B. (2024). Adversarial robustness limits
via scaling-law and human-alignment studies. arXiv
preprint arXiv:2404.09349.
Bifis, A., Psarakis, E. Z., and Kosmopoulos, D. (2023). De-
veloping robust and lightweight adversarial defenders
by enforcing orthogonality on attack-agnostic denois-
ing autoencoders. In Proceedings of the IEEE/CVF
International Conference on Computer Vision, pages
1272–1281.
Carlini, N. and Wagner, D. (2017). Towards evaluating the
robustness of neural networks. In 2017 IEEE sympo-
sium on security and privacy (sp), pages 39–57.
Chen, Y.-Y., Chen, C.-T., Sang, C.-Y., Yang, Y.-C., and
Huang, S.-H. (2021). Adversarial attacks against rein-
forcement learning-based portfolio management strat-
egy. IEEE Access, 9:50667–50685.
Croce, F., Andriushchenko, M., Sehwag, V., Debenedetti,
E., Flammarion, N., Chiang, M., Mittal, P., and Hein,
M. (2021). Robustbench: a standardized adversarial
robustness benchmark. In Thirty-fifth Conference on
Neural Information Processing Systems Datasets and
Benchmarks Track.
Croce, F. and Hein, M. (2019). Minimally distorted adver-
sarial examples with a fast adaptive boundary attack.
CoRR, abs/1907.02044.
Croce, F. and Hein, M. (2020). Reliable evaluation of
adversarial robustness with an ensemble of diverse
parameter-free attacks. CoRR, abs/2003.01690.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei,
L. (2009). Imagenet: A large-scale hierarchical image
database. In 2009 IEEE conference on computer vi-
sion and pattern recognition, pages 248–255.
Deng, L. (2012). The mnist database of handwritten digit
images for machine learning research [best of the
web]. IEEE signal processing magazine, 29(6):141–
142.
Dong, Y., Liao, F., Pang, T., Hu, X., and Zhu, J. (2017).
Discovering adversarial examples with momentum.
CoRR, abs/1710.06081.
Gao, L., Zhang, Q., Song, J., and Shen, H. T. (2020). Patch-
wise++ perturbation for adversarial targeted attacks.
CoRR, abs/2012.15503.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2014). Ex-
plaining and harnessing adversarial examples. arXiv
preprint arXiv:1412.6572.
Kim, H. (2020). Torchattacks: A pytorch repository for
adversarial attacks. arXiv preprint arXiv:2010.01950.
Krizhevsky, A. and Hinton, G. (2009). Learning multiple
layers of features from tiny images.
Kumari, N., Singh, M., Sinha, A., Machiraju, H., Krishna-
murthy, B., and Balasubramanian, V. N. (2019). Har-
nessing the vulnerability of latent layers in adversar-
ially trained models. In Proceedings of the 28th In-
ternational Joint Conference on Artificial Intelligence,
pages 2779–2785.
Levi, M. and Kontorovich, A. (2024). Splitting the differ-
ence on adversarial training. In 33rd USENIX Security
Symposium (USENIX Security 24), pages 3639–3656.
Mądry, A., Makelov, A., Schmidt, L., Tsipras, D., and
Vladu, A. (2017). Towards deep learning models re-
sistant to adversarial attacks. stat, 1050(9).
Pang, T., Lin, M., Yang, X., Zhu, J., and Yan, S. (2022).
Robustness and accuracy could be reconcilable by
(proper) definition. In International Conference on
Machine Learning, pages 17258–17277.
Peng, S., Xu, W., Cornelius, C., Hull, M., Li, K., Dug-
gal, R., Phute, M., Martin, J., and Chau, D. H.
(2023). Robust principles: Architectural design prin-
ciples for adversarially robust cnns. arXiv preprint
arXiv:2308.16258.
Rade, R. and Moosavi-Dezfooli, S.-M. (2022). Reducing
excessive margin to achieve a better accuracy vs. ro-
bustness trade-off. In International Conference on
Learning Representations.
Schwinn, L., Raab, R., Nguyen, A., Zanca, D., and Eskofier,
B. M. (2021). Exploring misclassifications of robust
neural networks to enhance adversarial attacks. CoRR,
abs/2105.10304.
Selvakkumar, A., Pal, S., and Jadidi, Z. (2022). Addressing
adversarial machine learning attacks in smart health-
care perspectives. In Sensing Technology: Proceed-
ings of ICST 2022, pages 269–282. Springer.
Sitawarin, C., Chakraborty, S., and Wagner, D. (2021).
Sat: Improving adversarial training via curriculum-
based loss smoothing. In Proceedings of the 14th
ACM Workshop on Artificial Intelligence and Security,
pages 25–36.
Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., and
Madry, A. (2018). Robustness may be at odds with
accuracy. arXiv preprint arXiv:1805.12152.
Wang, X. and He, K. (2021). Enhancing the transferability
of adversarial attacks through variance tuning. CoRR,
abs/2103.15571.
Wu, H., Yunas, S., Rowlands, S., Ruan, W., and Wahlström,
J. (2023). Adversarial driving: Attacking end-to-end
autonomous driving. In 2023 IEEE Intelligent Vehi-
cles Symposium (IV), pages 1–7.
Xiao, H., Rasul, K., and Vollgraf, R. (2017). Fashion-
mnist: a novel image dataset for benchmarking ma-
chine learning algorithms. CoRR, abs/1708.07747.
Zagoruyko, S. (2016). Wide residual networks. arXiv
preprint arXiv:1605.07146.
Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., and Jor-
dan, M. (2019). Theoretically principled trade-off be-
tween robustness and accuracy. In International con-
ference on machine learning, pages 7472–7482.