drop caused by adversarial examples is still severe on the more complex models and warrants dedicated robust training techniques.
Time and computational constraints are the main limitations of this study. The work is performed by an individual researcher on Google Colab instances with Tesla T4 GPUs, which necessitated choices such as using 32x32 color images from CIFAR-10 and the ResNet-18 architecture. These choices keep training tractable, but they may limit how well the resulting model generalizes across different AIGC models. In particular, 32x32 pixels is a significantly lower resolution than that of most natural and T2I AI-generated images currently available on the internet. This may explain the lower prediction accuracy, as smaller images leave less room for AI-generation artifacts, such as those hypothesized by Sha et al., to manifest (Z. Sha et al., 2022). However, increasing the resolution of the training samples or the depth of the model would multiply every stage of the training cost, including T2I image generation, adversarial attack generation, and model training itself. With fewer resource constraints, investigating the interplay of adversarial training, cross-attention-based ensemble models, and higher-resolution samples is a promising direction for future work, as the latter two factors have been shown to improve the natural accuracy of models (J. J. Bird and A. Lotfi, 2023), (W. Quan et al., 2020).
5 CONCLUSION
This study focuses on the problem of the adversarial robustness of models that detect AI-generated images. It aims to (1) evaluate the adversarial robustness of existing models and (2) construct a model that achieves a higher degree of robustness against adversarial attacks that are either gradient-based or spatial-transformation-based. For aim (1), several state-of-the-art AIGC detection models are evaluated against both PGD attacks and adversarial translations and rotations; both attacks are shown to be highly effective at reducing the classification accuracy of all models. For aim (2), adversarial training and adversarial data are used with a convolutional image classifier, yielding a model with an improved degree of robustness against both kinds of adversarial attacks while preserving the accuracy of the base classifier.
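To make the training procedure for aim (2) concrete, a minimal PGD adversarial-training sketch in the spirit of Madry et al. (2017) is given below. The ResNet-18 backbone, binary output head, optimizer, perturbation budget (eps = 8/255), step size, and number of PGD steps are illustrative assumptions rather than the exact configuration of this study, and inputs are assumed to be 32x32 images scaled to [0, 1]:

import torch
import torch.nn.functional as F
from torchvision.models import resnet18

device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet18(num_classes=2).to(device)   # real vs. AI-generated (assumed setup)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=7):
    # L-infinity PGD: repeatedly ascend the loss, then project the perturbed
    # image back into the eps-ball around the clean input (pixels in [0, 1]).
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def train_epoch(loader):
    # One epoch of adversarial training: each batch is replaced by its PGD counterpart.
    model.train()
    for x, y in loader:                   # loader yields 32x32 images and labels
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)   # inner maximization: worst-case perturbation
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()   # outer minimization on adversarial batch
        optimizer.step()

Mixing clean batches with adversarial ones, or switching to a faster single-step variant (E. Wong et al., 2020), are common ways to trade off robust accuracy against the natural accuracy that the base classifier retains.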
In conclusion, this study demonstrates the susceptibility of CNN-based AIGC detection models to adversarial attacks and the possibility of enhancing these models' robustness through adversarial training. As AIGC technology continues to improve and proliferate at an unprecedented pace, AI-based classification may be the best available tool for combating its abuse. Based on this paper's results, future AIGC detection models should also take adversarial robustness into consideration, especially when the task is to distinguish what is real from what is fake.
REFERENCES
Z. Sha, Z. Li, N. Yu, and Y. Zhang, De-fake: Detection and attribution of fake images generated by text-to-image diffusion models, arXiv preprint arXiv:2210.06998, 2022.
J. J. Bird and A. Lotfi, CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images, arXiv preprint arXiv:2303.14126, 2023.
Z. Xi, W. Huang, K. Wei, W. Luo, and P. Zheng, AI-Generated Image Detection using a Cross-Attention Enhanced Dual-Stream Network, arXiv preprint arXiv:2306.07005, 2023.
A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, Towards deep learning models resistant to adversarial attacks, arXiv preprint arXiv:1706.06083, 2017.
L. Engstrom, B. Tran, D. Tsipras, L. Schmidt, and A. Madry, Exploring the landscape of spatial robustness, In International conference on machine learning, PMLR, 2019, pp. 1802-1811.
M. I. Nicolae and M. Sinn, Adversarial Robustness Toolbox v1.2.0, arXiv preprint arXiv:1807.01069, 2018.
I. J. Goodfellow, J. Shlens, and C. Szegedy, Explaining and
harnessing adversarial examples, arXiv preprint
arXiv:1412.6572, 2014.
F. Croce and M. Hein, Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks, In International conference on machine learning, PMLR, 2020, pp. 2206-2216.
W. Quan, K. Wang, D. M. Yan, X. Zhang, and D. Pellerin, Learn with diversity and from harder samples: Improving the generalization of CNN-based detection of computer-generated images, Forensic Science International: Digital Investigation, vol. 35, p. 301023, 2020.
E. Wong, L. Rice, and J. Z. Kolter, Fast is better than free:
Revisiting adversarial training, arXiv preprint
arXiv:2001.03994, 2020.
R. Wang, L. Ma, F. Juefei-Xu, X. Xie, J. Wang, and Y. Liu, Fakespotter: A simple baseline for spotting AI-synthesized fake faces, arXiv preprint arXiv:1909.06122, 2019.