Neural Information Processing Systems, volume 34,
pages 26831–26843. Curran Associates, Inc.
Carlini, N., Athalye, A., Papernot, N., Brendel, W., Rauber,
J., Tsipras, D., Goodfellow, I., Madry, A., and Kurakin,
A. (2019). On evaluating adversarial robustness.
arXiv preprint arXiv:1902.06705.
Dong, Y., Su, H., Zhu, J., and Bao, F. (2017). Towards
interpretable deep neural networks by leveraging
adversarial examples. arXiv preprint arXiv:1708.05493.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn,
D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer,
M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby,
N. (2021). An image is worth 16x16 words: Transformers
for image recognition at scale. In International
Conference on Learning Representations.
Gani, H., Naseer, M., and Yaqub, M. (2022). How to train
vision transformer on small-scale datasets? arXiv
preprint arXiv:2210.07240.
Gehrig, M. and Scaramuzza, D. (2023). Recurrent vision
transformers for object detection with event cameras.
In Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, pages 13884–13893.
Jaegle, A., Gimeno, F., Brock, A., Vinyals, O., Zisserman,
A., and Carreira, J. (2021). Perceiver: General perception
with iterative attention. In International Conference
on Machine Learning, pages 4651–4664. PMLR.
Kietzmann, T., Spoerer, C., Sörensen, K., Cichy, R., Hauk,
O., and Kriegeskorte, N. (2019). Recurrence is required
to capture the representational dynamics of the
human visual system. Proceedings of the National
Academy of Sciences, 116(43):21854–21863.
Kotyan, S. and Vargas, D. V. (2021). Deep neural network
loses attention to adversarial images. arXiv preprint
arXiv:2106.05657.
Krizhevsky, A. (2009). Learning multiple layers of features
from tiny images. Technical report, University
of Toronto.
Liu, Y., Chen, X., Liu, C., and Song, D. (2016). Delving
into transferable adversarial examples and black-box
attacks. arXiv preprint arXiv:1611.02770.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and
Vladu, A. (2018). Towards deep learning models resistant
to adversarial attacks. In International Conference
on Learning Representations.
Messina, N., Amato, G., Carrara, F., Gennaro, C., and
Falchi, F. (2022). Recurrent vision transformer for
solving visual reasoning problems. In International
Conference on Image Analysis and Processing, pages
50–61. Springer.
Nicolae, M.-I., Sinn, M., Tran, M. N., Buesser, B., Rawat,
A., Wistuba, M., Zantedeschi, V., Baracaldo, N.,
Chen, B., Ludwig, H., Molloy, I., and Edwards,
B. (2018). Adversarial robustness toolbox v1.2.0.
https://arxiv.org/pdf/1807.01069.
Parkhi, O. M., Vedaldi, A., Zisserman, A., and Jawahar,
C. V. (2012). Cats and dogs. In IEEE Conference
on Computer Vision and Pattern Recognition.
Rieger, L. and Hansen, L. K. (2020). A simple defense
against adversarial attacks on heatmap explanations.
arXiv preprint arXiv:2007.06381.
Stollenga, M. F., Masci, J., Gomez, F., and Schmidhuber,
J. (2014). Deep networks with internal selective attention
through feedback connections. Advances in
Neural Information Processing Systems, 27.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan,
D., Goodfellow, I., and Fergus, R. (2014). Intriguing
properties of neural networks. In International Conference
on Learning Representations.
Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I.,
Boneh, D., and McDaniel, P. (2018). Ensemble adversarial
training: Attacks and defenses. In International
Conference on Learning Representations.
Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., and
Madry, A. (2019). Robustness may be at odds with
accuracy. In International Conference on Learning
Representations.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is all you need. Advances in Neural
Information Processing Systems, 30.
Vilone, G. and Longo, L. (2020). Explainable artificial
intelligence: a systematic review. arXiv preprint
arXiv:2006.00093.
Wightman, R. (2019). PyTorch image models. GitHub
repository.
Xu, K., Liu, S., Zhang, G., Sun, M., Zhao, P., Fan, Q., Gan,
C., and Lin, X. (2019). Interpreting adversarial examples
by activation promotion and suppression. arXiv
preprint arXiv:1904.02057.
VISAPP 2024 - 19th International Conference on Computer Vision Theory and Applications