classification space which are spaced apart as
opposed to interwoven classification regions. This is
illustrated in Figure 8. For the light-weight network,
once the perturbation has pushed the image into a new classification region, there is a large margin of error before the image accidentally switches to yet another label or even back to the original one. The more complex network, however, requires much more finesse, as even a slight change may move the image into another classification region. Assuming the specified initial epsilon was large enough to reach a new classification region, this would explain the high ASR against the light-weight mobilenet model and the lower ASR against the more complex inception_v3 model. It would also explain why different attacks against the light-weight model often produced the same adversarial label: because the attacks are similar (all gradient-based), even with slight differences between them there is a high chance that they all end up in the same large classification region.
Figure 8: 2D illustration of classification space for a light-
weight (left) and complex (right) network. Each colour
represents a different label or classification region.
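To make the classification-region argument concrete, the sketch below is a minimal illustration (not the original experimental code) of how one might measure whether two gradient-based attacks land on the same adversarial label. It assumes Foolbox 3.x, torchvision (>= 0.13) pretrained ImageNet weights, and a batch of images already scaled to [0, 1]; the helper name and epsilon value are illustrative. If the argument above holds, label agreement should be noticeably higher for mobilenet than for inception_v3.

```python
import torch
import torchvision.models as models
import foolbox as fb

# Illustrative sketch: compare the labels that two gradient-based attacks
# assign to the same images, for a light-weight and a complex classifier.
# Image loading/resizing is omitted (mobilenet_v2 expects 224x224 inputs,
# inception_v3 expects 299x299).

def adversarial_labels(model, images, labels, eps=8 / 255):
    """Return the post-attack labels produced by FGSM and L-inf PGD."""
    fmodel = fb.PyTorchModel(
        model.eval(), bounds=(0, 1),
        preprocessing=dict(mean=[0.485, 0.456, 0.406],
                           std=[0.229, 0.224, 0.225], axis=-3))
    out = {}
    for name, attack in [("FGSM", fb.attacks.FGSM()),
                         ("PGD", fb.attacks.LinfPGD())]:
        # clipped holds the adversarial images projected back into [0, 1]
        _, clipped, _ = attack(fmodel, images, labels, epsilons=eps)
        out[name] = fmodel(clipped).argmax(dim=-1)
    return out

# Usage (images/labels assumed to be an ImageNet batch in [0, 1]):
# light = models.mobilenet_v2(weights="IMAGENET1K_V1")
# heavy = models.inception_v3(weights="IMAGENET1K_V1")
# for net in (light, heavy):
#     labs = adversarial_labels(net, images, labels)
#     agreement = (labs["FGSM"] == labs["PGD"]).float().mean().item()
#     print(type(net).__name__, "attack-label agreement:", agreement)
```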
The insights from this work may assist future researchers in developing models that are more robust against untargeted gradient-based attacks. Since all of the analyzed attacks were gradient-based, future work should also consider other attack types. Additional research into the classification space of ImageNet models is also warranted, for example, comparing the classification space of simpler and more complex neural network models.
ACKNOWLEDGEMENT
This work was supported in part by the Defense Advanced Research Projects Agency (DARPA)
under the grant UTrap: University Transferrable
Perturbations for Machine Vision Disruption. The
U.S. Government is authorized to reproduce and
distribute reprints for Government purposes
notwithstanding any copyright annotation thereon.
The views and conclusions contained herein are those
of the authors and should not be interpreted as
necessarily representing the official policies or
endorsements, either expressed or implied, of
DARPA or the U.S. Government.