Table 4: Performance of MIRP on the CIFAR-10 dataset with varying decaying factor.

Decaying factor    PGD accuracy (%) / Training time
0.1                50.59 / 47 min
0.6                49.19 / 50.8 min
Here, Non-zero means that the values are retained between batches, and Zero means that the values are re-initialized to zero between batches.
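As a rough sketch of how the decaying factor and the two initialization schemes interact, the snippet below carries an adversarial perturbation across batches and either decays it (Non-zero) or resets it to zero (Zero) before the next FGSM-style step. The function name perturb_batch and the parameters epsilon, alpha, and decay are illustrative assumptions, not the exact MIRP procedure.

import torch
import torch.nn as nn

def perturb_batch(model, loss_fn, x, y, delta, epsilon=8 / 255, alpha=10 / 255,
                  decay=0.1, retain=True):
    """Hypothetical sketch of carrying a perturbation across batches.

    retain=True  ("Non-zero"): keep the previous delta, scaled by the decaying factor.
    retain=False ("Zero"):     re-initialize delta to zero for this batch.
    """
    if retain:
        delta = (decay * delta).detach()
    else:
        delta = torch.zeros_like(x)
    delta.requires_grad_(True)

    # One FGSM-style step on the perturbation, then project back into the epsilon-ball.
    loss = loss_fn(model(x + delta), y)
    loss.backward()
    delta = (delta + alpha * delta.grad.sign()).clamp(-epsilon, epsilon).detach()
    return delta

# Toy usage on fake CIFAR-10-shaped data (the model here is a stand-in, not the paper's).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
loss_fn = nn.CrossEntropyLoss()
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
delta = torch.zeros_like(x)  # persistent perturbation buffer, reused from batch to batch
delta = perturb_batch(model, loss_fn, x, y, delta, decay=0.1, retain=True)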
4 CONCLUSIONS
Our findings show that MIRP adversarial training, when used with random initialization, can in fact be more effective than the more costly PGD adversarial training. As a result, we are able to learn adversarially robust classifiers for CIFAR-10/100 in minutes. We believe that these significant reductions in the time needed to train robust models will allow future work to iterate even faster and accelerate research on models that are resistant to adversarial attacks.