The Pros and Cons of Adversarial Robustness
Yacine Izza¹ and Joao Marques-Silva²
¹CREATE, NUS, Singapore
²ICREA, University of Lleida, Spain
Keywords:
Local & Global Robustness, Certified Robustness, Adversarial Examples, Explainable AI.
Abstract:
Robustness is widely regarded as a fundamental problem in the analysis of machine learning (ML) models.
Most often, robustness equates with deciding the non-existence of adversarial examples, where adversarial examples denote situations where small changes to some inputs cause a change in the prediction. The perceived
importance of ML model robustness explains the continued progress observed for most of the last decade.
Whereas robustness is often assessed locally, i.e. given some target point in feature space, robustness can
also be defined globally, i.e. where any point in feature space can be considered. The importance of ML
model robustness is illustrated for example by the existing competition on neural network (NN) verification
(VNN-COMP), which assesses the progress of robustness tools for NNs, but also by efforts towards robustness
certification. More recently, robustness tools have also been used for computing rigorous explanations of ML
models. Despite the continued advances in robustness, this paper uncovers some limitations with existing def-
initions of robustness, both global and local, but also with efforts towards robustness certification. The paper
also investigates uses of adversarial examples besides those related with robustness.
1 INTRODUCTION
For more than a decade, Machine Learning (ML) has
been the subject of remarkable advances. However,
such advances have also been marred by a number of
persistent challenges. One of the best known of these
challenges is the brittleness of ML models (Hein and
Andriushchenko, 2017). An ML model is brittle if
it exhibits adversarial examples (AExs), i.e. small
changes to the inputs of the ML model can cause
(unexpected and unwanted) changes to the predic-
tion (Biggio et al., 2013; Szegedy et al., 2014; Good-
fellow et al., 2015). Intuitively, an ML model is robust
if it exhibits no AExs. ML model robustness has been
extensively studied over the last decade (Goodfellow
et al., 2015; Zhang et al., 2022b). The importance
of deciding the robustness of ML models motivated an outpouring of competing approaches, ranging from rather informal solutions, to those based on automated reasoners, and even those based on domain-specific reasoners. Furthermore, the significance of assess-
ing and asserting robustness is further highlighted by
VNN-COMP (Verification of Neural Networks Com-
petition (Brix et al., 2023b; Brix et al., 2023a)), a
competition for assessing robustness tools for NNs,
that has been running since 2020. In addition, there
are also efforts targeting the robustness certification
of ML models (Cohen et al., 2019; Rosenfeld et al., 2020; Dvijotham et al., 2020; Huang et al., 2021; Voráček and Hein, 2023; Carlini et al., 2023). Fur-
thermore, there have been proposals towards the veri-
fication and validation of systems based on AI (Seshia
et al., 2022), covering not only robustness, but also
explainability and fairness. The use of AI systems in high-risk and safety-critical domains has motivated calls for the use of so-called interpretable models (Rudin, 2019; Rudin et al., 2022), with the purpose of enabling human decision-makers to explain the decisions taken by such systems. Unfortunately, such calls have not deterred proposals for the use of complex AI systems in high-risk and safety-critical domains (Huang et al., 2020).
Robustness is often defined with respect to a con-
crete input to the ML model and its associated pre-
diction. In this case, one is referring to what is called
local robustness. An alternative view is global ro-
bustness, where the goal is to assess robustness for
any input of the ML model. Nevertheless, there exist
tools that target both local and global robustness (Katz
et al., 2017). Furthermore, to understand whether an ML model is locally/globally robust, it is also fundamental to outline an adequate experimental setup.
More recently, robustness has been linked with
formal approaches to explainable artificial intelli-
gence (XAI) (Wu et al., 2023; Huang and Marques-
Silva, 2023; Izza and Marques-Silva, 2024; Izza et al.,
2024). For example, it is the case that so-called
abductive explanations (Marques-Silva and Ignatiev,
2022; Marques-Silva, 2024) are such that no adver-
sarial examples can be identified.
This paper is in part motivated by a number
of negative results in explainability (Letoffe et al.,
2024; Marques-Silva and Huang, 2024; Huang and
Marques-Silva, 2024; Izza et al., 2022; Marques-
Silva and Ignatiev, 2023; Marques-Silva, 2023; Ig-
natiev, 2020; Marques-Silva and Ignatiev, 2022), but
targets instead robustness. Moreover, the paper ar-
gues that existing definitions of local and global ro-
bustness are problematic. Concretely, the paper shows
that the experimental setup used for assessing robust-
ness is invariably inconclusive with respect to decid-
ing the robustness of the given ML model. In a simi-
lar fashion, the paper argues that efforts to deliver ro-
bustness certification can be ineffective. Motivated by
these results, the paper hypothesizes that there is no
simple solution to the basic shortcomings of existing
approaches for deciding the robustness of ML mod-
els. Nevertheless, the paper also underlines the im-
portance of robustness tools, not specifically for de-
ciding robustness, but instead as a key building block
for the computation of formal explanations by itera-
tively deciding the existence of adversarial examples.
A longer version of this paper is available on arXiv (Izza and Marques-Silva, 2023); it includes all proofs of the propositions and detailed results of the experiments conducted in this work.
2 PRELIMINARIES
Measures of Distance. We consider the well-known $l_p$ measure of distance,
$||x - y||_p = \left( \sum_{i=1}^{m} |x_i - y_i|^p \right)^{1/p}$   (1)
which is referred to as the Minkowski distance. Special cases include the Manhattan distance $l_1$, the Euclidean distance $l_2$, and the Chebyshev distance $l_\infty$, which is defined by,
$\lim_{p \to \infty} ||x - y||_p = \max_{1 \le i \le m} \{ |x_i - y_i| \}$   (2)
Moreover, $l_0$ denotes the Hamming distance, defined by:
$||x - y||_0 = \sum_{i=1}^{m} \mathrm{ITE}(x_i = y_i, 0, 1)$   (3)
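For reference, these measures can be computed directly; the following is a minimal sketch (the helper names and the use of numpy are illustrative, not taken from the paper):

import numpy as np

def lp_distance(x, y, p):
    # Minkowski distance ||x - y||_p, as in (1)
    return float(np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** p) ** (1.0 / p))

def linf_distance(x, y):
    # Chebyshev distance ||x - y||_inf, the limit in (2)
    return float(np.max(np.abs(np.asarray(x) - np.asarray(y))))

def l0_distance(x, y):
    # Hamming distance ||x - y||_0, as in (3): number of differing coordinates
    return int(np.sum(np.asarray(x) != np.asarray(y)))

# Example on a 3-dimensional feature space:
x, y = [0.0, 0.5, 1.0], [0.1, 0.5, 0.7]
print(lp_distance(x, y, 1), lp_distance(x, y, 2), linf_distance(x, y), l0_distance(x, y))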
Classification Problems. A classification problem is defined on a set of features $\mathcal{F} = \{1,\ldots,m\}$ and a set of classes $\mathcal{K} = \{c_1, c_2, \ldots, c_K\}$. Each feature $i \in \mathcal{F}$ takes values from a domain $D_i$. Domains can be categorical or ordinal. If ordinal, domains can be discrete or real-valued. Throughout the paper, and unless otherwise stated, domains will be assumed to be real-valued. Feature space is defined by $\mathbb{F} = D_1 \times D_2 \times \cdots \times D_m$. The notation $x = (x_1,\ldots,x_m)$ denotes an arbitrary point in feature space, where each $x_i$ is a variable taking values from $D_i$. Moreover, the notation $v = (v_1,\ldots,v_m)$ represents a specific point in feature space, where each $v_i$ is a constant representing one concrete value from $D_i$. An instance denotes a pair $(v,c)$, where $v \in \mathbb{F}$ and $c \in \mathcal{K}$. An ML classifier $\mathcal{M}$ is characterized by a non-constant classification function $\kappa$ that maps feature space $\mathbb{F}$ into the set of classes $\mathcal{K}$, i.e. $\kappa : \mathbb{F} \to \mathcal{K}$. Given the above, we associate with a classifier $\mathcal{M}$ a tuple $(\mathcal{F}, \mathbb{F}, \mathcal{K}, \kappa)$. Since we assume that $\kappa$ is non-constant, the ML classifier $\mathcal{M}$ is declared non-trivial, i.e. $\exists(a,b \in \mathbb{F}).\,(\kappa(a) \neq \kappa(b))$.
When reasoning about systems with formal meth-
ods, it is often assumed that all inputs are possible, or
alternatively, that there exist explicit constraints that
disallow some inputs. In contrast, reasoning methods
rooted in machine learning usually assume some input
distribution, which most often needs to be inferred (or
approximated) from training data or be user-specified.
We say that a point x is viable if, for reasoning pur-
poses, x must be accounted for. As a result, when
making statements about ML models, we consider
three possible scenarios on the inputs:
1. Unconstrained inputs, i.e. any point x in feature
space is viable.
2. Constrained inputs, i.e. any point x in feature
space is viable iff some set of constraints C is sat-
isfied by x. This is represented by the predicate
ξ(x).
3. Distribution-restricted inputs, i.e. any point x in feature space is viable iff it respects some distribution $\mathbb{D}$, i.e. $x \sim \mathbb{D}$, which is either user-specified or inferred from training data.
Most work on adversarial robustness implicitly as-
sumes distribution-restricted inputs. In contrast,
throughout the paper, we assume the case of uncon-
strained inputs for the following main reasons. First,
the most rigorous approaches for robustness make
that implicit assumption. Second, assuming that in-
puts are distribution-restricted does not account for
data drift. Nevertheless, in cases where the distinction
matters, the paper also accounts for the other possible
cases on the inputs.
2.1 Adversarial Robustness
Local Robustness. Given a classifier $\mathcal{M}$ with a classification function $\kappa$, and an instance $(v,c)$, the classifier is locally robust (or just robust) for $v$ if,
$\forall(x \in \mathbb{F}).\, [||x - v||_p \le \epsilon] \rightarrow (\kappa(x) = \kappa(v))$   (4)
If the classifier is not robust, then any point $x \in \mathbb{F}$ satisfying the condition,
$[(||x - v||_p \le \epsilon) \wedge (\kappa(x) \neq \kappa(v))]$   (5)
is referred to as an adversarial example for distance $\epsilon$ ($\epsilon$AEx). (Observe that (5) consists of selecting one of the counterexamples to (4).)
The definitions of local robustness in (4) and of
adversarial example in (5) assume unconstrained in-
puts. For completeness, we include the definitions of
local robustness for the other cases regarding assump-
tions on the inputs.
For constrained inputs we have,
$\forall(x \in \mathbb{F}).\, [\xi(x) \wedge (||x - v||_p \le \epsilon)] \rightarrow (\kappa(x) = \kappa(v))$   (6)
For distribution-restricted inputs we have,
$\mathbb{S}_{\varsigma}(x \sim \mathbb{D}).\, [||x - v||_p \le \epsilon] \rightarrow (\kappa(x) = \kappa(v))$   (7)
where $\mathbb{S}_{\varsigma}$ captures the sampling according to some distribution and target confidence ($\varsigma$), and where $\sim$ is interpreted as a predicate, with two arguments $x$ and $\mathbb{D}$, that holds iff $x$ respects the distribution $\mathbb{D}$.
There exist a multitude of proposed robustness
tools dedicated to local robustness, many of which
are regularly assessed in the VNN-COMP (Brix et al.,
2023b). Section 3 briefly overviews existing work on
robustness.
One additional observation is that tools that exploit automated reasoners assume unconstrained inputs, and so the definition of local robustness considered is (4). If information about the context in which the ML model is to be deployed is available, then ξ(x) may be known, and so it is to be expected that (6) would be used instead. Finally, incomplete methods, often based on the sampling of feature space, assume the somewhat different definition (7), which can offer probabilistic guarantees, but not absolute guarantees.
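To make the contrast concrete, the following sketch decides (7) by uniform sampling within the ε-ball under the l∞ norm; the function name and arguments are illustrative (kappa stands for an arbitrary classification function), and a failure to find an AEx after n samples yields at best a probabilistic claim, never a proof of (4).

import numpy as np

def sampled_local_robustness(kappa, v, eps, n_samples=10000, seed=0):
    # Sampling-based check of (7) under the l_inf norm.
    # Returns (True, None) if no AEx was sampled; (False, x) if x is an eps-AEx.
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    c = kappa(v)
    for _ in range(n_samples):
        x = v + rng.uniform(-eps, eps, size=v.shape)   # point in the eps-ball around v
        if kappa(x) != c:
            return False, x       # counterexample to local robustness found
    return True, None             # no AEx sampled: only a probabilistic guarantee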
Certified Robustness. Earlier research have also
proposed certified robustness, which has been defined
as follows:
Definition 1 (From (Cohen et al., 2019)). “A classifier is said to be certifiably robust if for any input x, one can easily obtain a guarantee that the classifier’s prediction is constant within some set around x, often an $l_2$ or $l_\infty$ ball”.
(We underscore that Definition 1 is taken verbatim
from (Cohen et al., 2019), although we highlight the
universal quantification on the inputs.) We will refer
to this definition throughout the paper.
Global Robustness. Given the definition of local ro-
bustness, a possible definition of global robustness is,
$\forall(v, x \in \mathbb{F}).\, (||x - v||_p \le \epsilon) \rightarrow (\kappa(x) = \kappa(v))$   (8)
(Observe that (8) is just a formalization of Definition 1, by allowing the norm to be any $l_p$.) Simi-
lar definitions have been studied in the literature (Se-
shia et al., 2018; Narodytska, 2018; Cohen et al.,
2019; Rosenfeld et al., 2020; Dvijotham et al., 2020;
Huang et al., 2021; Chen et al., 2021; Carlini et al.,
2023). For example, Reluplex (Katz et al., 2017) de-
fines global robustness by allowing small differences
in predictions. This alternative definition raises concerns in classification problems, because the differences between predicted values may be small, but the predicted classes may be different. Moreover, there
are other variants of this definition, which the paper
also studies. However, by default we will assume this
definition throughout the paper. (As shown in the pa-
per, this apparently sensible definition actually raises
a number of critical issues.)
As shown in Section 3, other definitions of global
robustness can be related with the one proposed
above. Section 3 also briefly overviews approaches
for deciding global robustness, local robustness and
certified robustness.
Running Example. To motivate the claims in the pa-
per, the following very simple classifiers are used as
the running examples throughout the paper.
Example 1. A first classifier $\mathcal{M}_1$ is defined on a single feature, $\mathcal{F}_1 = \{1\}$, with $D_{11} = \mathbb{R}$. The set of classes is $\mathcal{K}_1 = \{0,1\}$, and the training data is given by: $\{(0.0,0),(0.3,0),(0.4,0),(0.7,1),(1.0,1)\}$. Furthermore, we use an off-the-shelf ML toolkit, e.g. scikit-learn (Pedregosa et al., 2011), to learn the classifier's function $\kappa_1$ as a linear classifier $\kappa_1 : D_{11} \to \mathcal{K}_1$. Accordingly, the model learned by scikit-learn is,
$\kappa(x_1) = \mathrm{ITE}(0.93198992 \times x_1 - 0.64735516 \ge 0, 1, 0)$
As can be observed, the accuracy of the learned clas-
sifier over training data is 100%. Moreover, the ques-
tion we seek to answer is: is the classifier (locally or
globally) robust?
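The classifier of Example 1 can be reproduced with a few lines of scikit-learn; this is a sketch under the assumption that a logistic regression model is used, and the exact coefficients obtained (and hence whether they match the values reported in Example 1) depend on the chosen linear model, its hyperparameters and the library version.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Training data of Example 1: a single real-valued feature, two classes.
X = np.array([[0.0], [0.3], [0.4], [0.7], [1.0]])
y = np.array([0, 0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)
w, b = clf.coef_[0][0], clf.intercept_[0]
print(w, b)                 # coefficients of the learned linear model
print(clf.score(X, y))      # accuracy on training data (1.0, as in Example 1)

# kappa_1 written out explicitly: ITE(w*x1 + b >= 0, 1, 0)
kappa1 = lambda x1: int(w * x1 + b >= 0)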
Example 2. A second classifier $\mathcal{M}_2$ is obtained from the first one above (see Example 1), but defined as
follows:
$\kappa_2(x_1, x_2) = \begin{cases} \kappa_1(x_1) & \text{if } x_1 \le 1 \\ \mathrm{ITE}(x_1 > x_2, 1, 0) & \text{otherwise} \end{cases}$
In this case, $\mathcal{F}_2 = \{1, 2\}$, $D_{21} = D_{22} = \mathbb{R}$, $\mathbb{F} = \mathbb{R} \times \mathbb{R}$, and $\mathcal{K}_2 = \{0,1\}$.
Example 3 exemplifies the definition of adversar-
ial examples.
Example 3. For the classifier of Example 1, and for
the instance (0.7,1), from training data (and assum-
ing 100% accuracy on training data), it is apparent
that an AEx exists by setting ε = 0.3. By manual in-
spection of the learned model, we conclude that we
obtain an AEx with a smaller ε, e.g. ε = 0.1 suffices.
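The claim of Example 3 can be checked directly against the learned model; a minimal sketch, hard-coding the coefficients reported in Example 1:

# kappa(x1) = ITE(0.93198992*x1 - 0.64735516 >= 0, 1, 0), as in Example 1
kappa = lambda x1: int(0.93198992 * x1 - 0.64735516 >= 0)

v, c = 0.7, 1                            # the instance (0.7, 1)
x = 0.6                                  # candidate point with |x - v| = 0.1
assert abs(x - v) <= 0.1                 # within distance eps = 0.1
assert kappa(v) == c and kappa(x) != c   # hence x is an epsilon-AEx for eps = 0.1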
2.2 Symbolic Explainability
As mentioned in Section 1, the concepts of adver-
sarial examples and explainability are tightly related.
As a result, we include a brief introduction to sym-
bolic (or logic) explainability. More detailed accounts
are available (Marques-Silva and Ignatiev, 2022; Dar-
wiche, 2023; Marques-Silva, 2024).
An explanation problem $\mathcal{E}$ is a tuple $(\mathcal{M}, (v,c))$, where $\mathcal{M}$ is a classifier, and $(v,c)$ is an instance. When describing concepts in explainability, an underlying explanation problem $\mathcal{E}$ is assumed. Prime implicant (PI) explanations (Shih et al., 2018) denote a minimal set of literals (relating a feature value $x_i$ and a constant $v_i$ from its domain $D_i$) that are sufficient for the prediction. PI-explanations can be formulated as a problem of logic-based abduction, and so are also referred to as abductive explanations (AXp) (Ignatiev et al., 2019). Formally, given $v = (v_1,\ldots,v_m) \in \mathbb{F}$ with $\kappa(v) = c$, an AXp is any minimal subset $\mathcal{X} \subseteq \mathcal{F}$ such that,
$\forall(x \in \mathbb{F}).\, \left[ \bigwedge_{i \in \mathcal{X}} (x_i = v_i) \right] \rightarrow (\kappa(x) = c)$   (9)
AXps can be viewed as answering a ’Why?’ question,
i.e. why is some prediction made given some point in
feature space. A different view of explanations is a
contrastive explanation (Miller, 2019), which answers
a ’Why Not?’ question, i.e. which features can be
changed to change the prediction. A formal defini-
tion of contrastive explanation is proposed in recent
work (Ignatiev et al., 2020). Given $\mathcal{E} = (\mathcal{M},(v,c))$, a CXp is any minimal subset $\mathcal{Y} \subseteq \mathcal{F}$ such that,
$\exists(x \in \mathbb{F}).\, \bigwedge_{i \in \mathcal{F} \setminus \mathcal{Y}} (x_i = v_i) \wedge (\kappa(x) \neq c)$   (10)
Note that any set $\mathcal{Y} \subseteq \mathcal{F}$ for which (10) holds, but which is not necessarily minimal, is referred to as a weak CXp. Similarly, any $\mathcal{X} \subseteq \mathcal{F}$ for which (9) holds is a weak AXp.
The relationship between explanations and adver-
sarial examples can be further clarified (Izza et al.,
2024), as follows.
Proposition 1. Given an explanation problem $\mathcal{E} = (\mathcal{M},(v,c))$, $\mathcal{Y} \subseteq \mathcal{F}$ is a weak CXp iff $\mathcal{M}$ has an $\epsilon$AEx, with $l_0$ distance $\epsilon = |\mathcal{Y}|$.
Building on the results of R. Reiter in model-
based diagnosis (Reiter, 1987), (Ignatiev et al., 2020)
proves a minimal hitting set (MHS) duality relation
between AXps and CXps, i.e. AXps are MHSes of
CXps and vice-versa. Thus, as long as one can de-
vise logic encodings for an ML classifier (and this is
possible for most ML classifiers) and have access to
a suitable reasoner, then (9) and (10) offer a solution
for computing one AXp/CXp.
Distance-Restricted Explanations. With the pur-
pose of relating explanations with robustness, recent
work (Huang and Marques-Silva, 2023; Wu et al.,
2023) introduced the concept of distance-restricted
explanation.
Given an explanation problem $\mathcal{E} = (\mathcal{M},(v,c))$ and $\epsilon > 0$, a distance-restricted AXp ($\epsilon$AXp) is a subset-minimal set of features $\mathcal{X} \subseteq \mathcal{F}$ such that,
$\forall(x \in \mathbb{F}).\, \left[ \bigwedge_{i \in \mathcal{X}} (x_i = v_i) \wedge (||x - v||_p \le \epsilon) \right] \rightarrow (\kappa(x) = c)$   (11)
We define distance-restricted CXps ($\epsilon$CXps) accordingly. An $\epsilon$CXp is a subset-minimal set of features $\mathcal{Y} \subseteq \mathcal{F}$, such that,
$\exists(x \in \mathbb{F}).\, \left[ \bigwedge_{i \in \mathcal{F} \setminus \mathcal{Y}} (x_i = v_i) \wedge (||x - v||_p \le \epsilon) \right] \wedge (\kappa(x) \neq c)$   (12)
MHS duality between distance-restricted AXps
and CXps has been proved, which enables the de-
velopment of algorithms for navigating the space of
εAXps and εCXps (Izza et al., 2024).
Finally, there is a simple relationship between
(distance-unrestricted or plain) AXps/CXps and
εAXps/εCXps. By picking $l_0$, i.e. the Hamming dis-
tance, and letting ε = m, i.e. the number of fea-
tures, then εAXps/εCXps represent exactly the (plain)
AXps/CXps. Observe that, by setting ε = m, we al-
low any subset of the features to be included/excluded
from AXps/CXps. Hence, we will be computing
distance-unrestricted AXps/CXps using algorithms
developed for εAXps/εCXps.
3 RELATED WORK
The realization that ML models are most often brit-
tle (Biggio et al., 2013; Szegedy et al., 2014; Good-
fellow et al., 2015), i.e. that ML models can exhibit
adversarial examples, motivated a massive body of re-
search over the last decade on deciding the robustness
of ML models. The goal of this section is to briefly
overview works that are of special interest to the pa-
per’s topics, especially ML model robustness and the
identification of AExs. Moreover, a growing number
of surveys (Wiyatno et al., 2019; Zhang and Li, 2020;
Chen et al., 2020; Chakraborty et al., 2021; Rosen-
berg et al., 2022; Liang et al., 2022; Zhang et al.,
2022b; Zhou et al., 2023; Han et al., 2023) illustrate
the importance of robustness and AExs for the practi-
cal deployment of ML models.
Local Robustness. The definition of local robustness
proposed in most of past work matches the one used
in this paper (see (7)). Examples of tools that de-
cide local robustness are those evaluated in VNN-
COMP (Brix et al., 2023b).
Global Robustness. Past work considers global ro-
bustness as proposed in Eq. (8), which formal-
izes Definition 1. This is the case with (Seshia et al.,
2018; Chen et al., 2021), but also (Cohen et al., 2019;
Rosenfeld et al., 2020; Dvijotham et al., 2020; Huang
et al., 2021; Carlini et al., 2023). Some works pro-
pose a slightly modified definition of global robust-
ness (Katz et al., 2017):
$\forall(v, x \in \mathbb{F}).\, (||x - v||_p \le \epsilon) \rightarrow (|\kappa(x) - \kappa(v)| \le \delta)$   (13)
where δ > 0. When compared with (8), the mod-
ified definition targets neural networks, especially
when these compute real-valued outputs. A num-
ber of works adopt this definition of global robust-
ness (Wang et al., 2022a; Wang et al., 2022b; Fu et al.,
2022), but (Fu et al., 2022) imposes no constraint on
x and v. It should be noted that this definition is not
without problems. For ML classifiers, e.g. image clas-
sification, conditions on the values of the outputs are
uninteresting, and so a definition similar to (8) must
be considered.
A different approach is adopted in (Ruan et al.,
2019) where global robustness is defined with re-
spect to a finite set of points in feature space, and
not all points in feature space. Yet another take on
global robustness is to reject inputs that are classified
as AEx (Leino et al., 2021; Baharlouei et al., 2023).
Thus a model can return class abstain on a given input
x. Finally, another line of research is represented by
DeepSafe, which finds safe regions where robustness
is guaranteed (Gopinath et al., 2018; Dimitrov et al.,
2022).
Robustness Certification. Work on certifying robustness can be traced to (Cohen et al., 2019; Weng et al., 2019; Gehr et al., 2018; Singh et al., 2018) and, more recently, to (Rosenfeld et al., 2020; Dvijotham et al., 2020; Huang et al., 2021; Voráček and Hein, 2023; Carlini et al., 2023), which leverage the local robustness property to provide certification and/or quantify the robustness of models against AExs. As an illustration, the empirical evaluation reported in (Carlini et al., 2023) (resp. (Voráček and Hein, 2023)) considers collections of 100,000 and 10,000 (resp. 2000 and 500) samples, drawn from the CIFAR10 and ImageNet datasets, that serve to certify robustness accuracy, i.e. the percentage of samples that fail/succeed the local robustness test. Besides, the works reported in (Liu et al., 2020; Wang et al., 2022b; Wang et al., 2022a) adopt a global robustness property to certify whether or not the analyzed model is robust. Fur-
thermore, some works (Fu et al., 2022; Wang et al.,
2022a) use local and global robustness techniques to
measure lower and upper bounds for robustness. An-
other recent work (Dimitrov et al., 2022) computes
regions with robustness certification on all possible
points in these regions. In these earlier works, the im-
plications of global robustness on local robustness are
not discussed.
4 THE CONS OF ROBUSTNESS
This section proves a number of negative results re-
garding global and local robustness, but also regard-
ing robustness certification. More importantly, those
negative results impact the conclusions drawn in ear-
lier work on robustness. Nevertheless, this section
also discusses ways to cope with these negative re-
sults.
4.1 Basic Negative Results
4.1.1 Continuous Domains
There Is no Global Robustness. A straightforward
observation is that, given the proposed definition of
global robustness (see (8)), then there exist no non-
trivial globally robust classifiers.
Proposition 2. Any non-trivial classifier defined on
continuous (real-valued) features is not globally ro-
bust, independently of the value of ε chosen. (We as-
sume that the measure of distance considered is $l_p$, with $p \ge 1$.)
An observation that mimics Proposition 2 is made in recent work (Leino et al., 2021). However, the consequences of this observation were not investigated further. Instead, the proposed solution to the problem of global robustness was to change the training of the classifier so that it returns an indication of inputs that cannot be guaranteed to be robust. Hence, the solution proposed is to move from deciding robustness a posteriori to training for robustness.
Example 4. With respect to Example 1, recall that the model learned by scikit-learn is,
$\kappa(x_1) = \mathrm{ITE}(0.93198992 \times x_1 - 0.64735516 \ge 0, 1, 0)$
Clearly, the value of $x_1$ for which the predicted class transitions from 0 to 1 is 0.69459459. Hence, exhibiting the point $x_1 = 0.69459459$ is a proof that the model is not globally robust.
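The threshold of Example 4, and with it a counterexample to global robustness for an arbitrarily small ε, can be computed directly from the learned coefficients; a short worked check:

w, b = 0.93198992, -0.64735516
kappa = lambda x1: int(w * x1 + b >= 0)

t = -b / w                        # decision boundary: approximately 0.69459459
print(t)

eps = 1e-6                        # an arbitrarily small distance
x, v = t - eps / 2, t + eps / 2   # two points within distance eps of each other
assert abs(x - v) <= eps and kappa(x) != kappa(v)   # counterexample to (8)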
A key conclusion of the results above is that the definition of (certified) robustness used in earlier works (Cohen et al., 2019; Narodytska, 2018) (among others, see also Section 2) can only be achieved with an ML model that predicts a constant value; and this is of course unsatisfactory. Furthermore, the solution proposed in other works (Lécuyer et al., 2019), i.e. to restrict robustness certification to a fixed test set, is also unsatisfactory, because validating robustness on such a fixed test set defeats the purpose of machine learning.
No Global Robustness Implies no Local Robust-
ness. One observation that stems from the proof
of Proposition 2 is that deciding local robustness is
also problematic. Concretely, if we are allowed to se-
lect a suitable point in feature space, then it is the case
that,
Proposition 3. For any non-trivial classifier defined
on real-valued features, there exists a point for which
the classifier is not locally robust, independently of
the value of ε chosen.
Example 5. With respect to Example 1, if we sample training data, then the model will be declared locally robust for $\epsilon < 0.00540541$, due to point $x_1 = 0.7$. Clearly, if we allow complete freedom on which values of $x_1$ to sample, we will then conclude that, for $x_1 = 0.69459459$, robustness is non-existent for any value of $\epsilon > 0$, and so the classifier is not robust.
Observe that what Proposition 3 claims is that one can always find points in feature space for which local robustness does not hold. Thus, claiming robustness based on successfully proving local robustness for some selected points in feature space does not equate with global robustness holding at all points in feature space.
The Robustness of Non-Trivial Classifiers Cannot
Be Certified. In recent years, a number of works
have studied robustness certification (Cohen et al.,
2019; Dvijotham et al., 2020; Rosenfeld et al., 2020;
Huang et al., 2021; Carlini et al., 2023).¹ Using the
definition of robustness certification proposed in Sec-
tion 2 (see Definition 1), which is taken verbatim
from (Cohen et al., 2019), then we can claim that,
Proposition 4. For any non-trivial classifier defined
on real-valued features, robustness cannot be certi-
fied, independently of the value of ε chosen.
Example 6. The analysis of Example 5 also demonstrates that robustness certification will fail for the example classifier, for any ε > 0, as long as the point $x_1 = 0.69459459$ is analyzed. In contrast, if the training data were to be sampled for local robustness, then one would declare robustness to be certified for ε < 0.00540541. Evidently, such a conclusion would be in error.
Counterexamples to Local Robustness. Under the
assumption of unconstrained inputs, we proved earlier
in this section that no ML model is (locally) robust.
As a result, and given some ε > 0, counterexamples
to (unconstrained) local robustness are guaranteed to
exist, and can be obtained from any pair of points v,x
in feature space such that,
$\exists(x, v \in \mathbb{F}).\, (||x - v||_p \le \epsilon) \wedge (\kappa(x) \neq \kappa(v))$   (14)
Observe that requiring $(x \neq v)$ is unnecessary. Also, we note that (14) is just the negation of (8), i.e. the condition for global robustness. In addition, it is easy to accommodate the cases where a classifier can flag points as not being robust (Leino et al., 2021). (Earlier work proposes that such points be flagged with a different class; in this paper we use a dedicated symbol for this purpose.)
¹By being based on restricted forms of sampling, the approaches used for robustness certification do not capture the dictionary meaning of certification, i.e. to state in a formal way that something is correct, and are fundamentally different from the meaning ascribed to certification in formal methods.
Finally, it is also plain to formulate the search for counterexamples as a decision problem. Given some logic formula $\varphi$ and a target logic theory $T$, $[\![\varphi]\!]_T$ represents the encoding of $\varphi$ in the target logic theory $T$. Also, let $\mathrm{SAT}_T$ represent a reasoner for theory $T$. Then, we use (14) to formulate the decision problem in theory $T$ as follows:
$\mathrm{SAT}_T\left( [\![ (||x - v||_p \le \epsilon) \wedge (\kappa(x) \neq \kappa(v)) ]\!]_T \right)$   (15)
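For the running example, query (15) can be handed to an off-the-shelf reasoner. The sketch below uses the z3 SMT solver; the z3-solver package and the chosen value of ε are assumptions, and the encoding is specific to the linear classifier of Example 1, where the $l_p$ distance in one dimension reduces to |x − v|.

from z3 import Real, Solver, If, And, sat

w, b, eps = 0.93198992, -0.64735516, 0.001

x, v = Real('x'), Real('v')
kx = If(w * x + b >= 0, 1, 0)            # encoding of kappa(x)
kv = If(w * v + b >= 0, 1, 0)            # encoding of kappa(v)
dist = If(x - v >= 0, x - v, v - x)      # |x - v|

s = Solver()
s.add(And(dist <= eps, kx != kv))        # the query of (15), i.e. the negation of (8)
if s.check() == sat:
    m = s.model()
    print('counterexample:', m[x], m[v]) # a pair of points witnessing non-robustness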
4.1.2 Discrete Domains
The negative results in Section 4.1.1 also hold in the
case of classifiers with discrete features. However, the
values of ε considered are further constrained. First,
we consider a classifier with categorical features, an $l_0$ norm and unconstrained inputs. In this case, it is guaranteed that one cannot have εAExs if ε < 1, i.e. if one prevents any feature from changing value. If we impose the constraint ε ≥ 1, then the results of Section 4.1.1 hold: i) no non-trivial classifier is globally robust; ii) for any non-trivial classifier there are points in feature space that are not locally robust; and iii) robustness cannot be certified.
There are also settings where features are discrete
and result from discretizing real-valued features. This
is the case for example when using binarized neural
networks (BNNs) (Hubara et al., 2016). In these cases
we assume a discretization step δ. As a result, the
comments made for classifiers with categorical fea-
tures also apply in this case, with the constraint that ε ≥ δ, i.e. the smallest distance to consider is no less than the discretization step.
For example, for the experiments of Section 6,
when the $l_0$ norm is used, it is the case that δ = 1. As
a result, in the experiments we used ε = 1, thus target-
ing the smallest distance that could possibly be con-
sidered. As the results show, and as it should be ex-
pected, one can find counterexamples to local/global
robustness for all the experiments.
4.2 Practical Consequences
The results in Section 4.1.1 have important practical
consequences. First, the experimental setup most of-
ten used in the assessment of local robustness exhibits
critical shortcomings. In a significant body of earlier
work, local robustness is assessed by randomly sam-
pling feature space. This is the case with evaluations
of local robustness in (Brix et al., 2023b). This means that either sampling does not pick the right points in feature space, and so one is allowed to (incorrectly) decide for local robustness, or sampling picks one of the points that must exist, and so (the expected) lack of local robustness is decided. One paradigmatic example is VNN-COMP (Brix et al., 2023b), where uniform random sampling has been employed in all the previous competitions. The bottom line is that any classifier declared locally robust is guaranteed not to be so. The same remarks apply in the case of global ro-
bustness, but in this case the number of existing works
is a fraction of the number of works on deciding local
robustness.
Examples of Ineffective Robustness Assessment.
One example of the limitations of randomly sam-
pling for assessing robustness is VNN-COMP (Brix
et al., 2023b; Brix et al., 2023a). The descrip-
tion of the most recent competitions (Müller et al., 2022; Brix et al., 2023c; Brix et al., 2023a) con-
firms that robustness is assessed by randomly sam-
pling existing datasets. In these cases, any results
indicating that NNs are robust are necessarily incor-
rect. Furthermore, past works claiming robustness
certification are inaccurate. Methods based on au-
tomated reasoners, which assume unconstrained in-
puts, will be in error if ML models are declared ro-
bust. Distribution-restricted methods can only pro-
vide probabilistic guarantees. In such cases, one must
trust that the inferred input distribution faithfully cap-
tures possible inputs to the ML model. More impor-
tantly, one must also trust that the sampling methods
used will offer enough rigor.
Besides the shortcomings of local robustness and
certification, the limitations of existing methods for
attaining global robustness are also clear.
Practical Assessment of Shortcomings. As noted
earlier (15), in practice it is conceptually simple to
demonstrate the limitations of local robustness using a
dedicated automated reasoner. It suffices to decide the existence of a point z in feature space for which, even for arbitrarily small ε, there exists a point in the ε-ball around z with a prediction other than κ(z). Section 6 summarizes results illustrating not
only the existence of such points for complex clas-
sifiers, but also the practical scalability of this ap-
proach.
Existing Solutions. Some approaches avoid the
problems reported in this section by curbing the
claims about (certified) robustness. For example, the
definition of global robustness in some works (Ruan
et al., 2019) considers a finite set of points where local
robustness is assessed. As long as the inputs respect
such a finite set of points, then the non-existence of
AExs for robust ML models is guaranteed. Unfortu-
nately, if the inputs were to be known in advance, then the need for ML models would be non-existent. In a similar vein, other tools (Gopinath et al., 2018) ensure robustness in specific regions of feature space. Although more general than (Ruan et al., 2019), similar limitations apply.
Some works exploit sampling of the inputs ac-
cording to some inferred input distribution (Cohen
et al., 2019; Rosenfeld et al., 2020; Dvijotham et al.,
2020; Huang et al., 2021; Carlini et al., 2023). Such
works offer no formal guarantees of rigor, and so claims of certification hold only probabilistically, assuming that all possible inputs respect the assumed input distribution.
4.3 Threats to Validity & Discussion
One possible criticism of the results in this section is that some works on robustness do not assume unconstrained inputs, but instead sample the inputs according to the distribution inferred from training data or to a specific imposed distribution (Weng et al., 2019; Cohen et al., 2019; Yan et al., 2024). In such a situa-
tion, one can argue that both local and global robust-
ness can still be safely decided. Clearly, the claims of
rigor differ in the two cases. As the paper shows, ap-
proaches based on automated reasoners that consider
all possible inputs in feature space cannot provide
guarantees of robustness. However, the same applies
to any other approach when one seeks the strongest
guarantees of robustness. In addition, there is recent
work showing the importance of sampling out of the
distribution (Yin et al., 2019; Hendrycks et al., 2020;
Hendrycks et al., 2021; Zhang et al., 2022a), but this
again exhibits the shortcomings of local/global ro-
bustness identified earlier in this section.
Furthermore, the sampling of inputs according to
some inferred input distribution is not without clear
limitations. First, inferring an input distribution from
a negligible fraction of feature space can be a source
of error. Second, an input distribution declares inputs
more or less likely, but does not prevent the possibil-
ity, no matter how negligible, of inputs not in the dis-
tribution. Third, robustness tools based on automated
reasoners assume that all inputs are possible, i.e. un-
constrained inputs are assumed, when deciding local
robustness for a concrete point in feature space (Katz
et al., 2017). If input distributions were to be taken
into account for such tools when deciding the points
in feature space to analyze, then even for deciding lo-
cal robustness for a single point in feature space, input
distributions would have to be accounted for.
To the best of our knowledge, there is no clear way to instrument automated reasoners to account for input distributions. The key point is that sampling
according to the input distribution could be a source
of error, and so the integration with tools based on
automated reasoners is unclear.
A different approach is to explicitly assume that some inputs are not accepted, e.g. by disallowing inputs where an elderly person is assigned a fairly young age. To the best of our knowledge, past work on robustness does not account for disallowed inputs. In contrast,
the topic has been researched in formal explainabil-
ity (Gorji and Rubin, 2022; Yu et al., 2023), by con-
sidering constraints on the inputs. Future work may
re-analyze robustness in light of such constraints.
5 SOME PROS OF ROBUSTNESS
Under a scenario in which all inputs are possible, Sec-
tion 4 raises concerns about the usefulness of attempt-
ing to prove robustness, be it local or global. This
might seem to suggest the ongoing efforts towards de-
vising efficient and rigorous robustness tools are mis-
guided. Despite the negative results of the previous
section, robustness should be expected to find critical
applications in the near future. This section briefly
overviews one such application.
From εAExs to εAXps/εCXps. Recent work re-
vealed a tight relationship between formal ex-
planations and the non-existence of (constrained)
adversarial examples (Huang and Marques-Silva,
2023), enabled by the concept of distance-restricted
AXps/CXps (see the definitions in Section 2.2). Furthermore,
MHS duality between distance-restricted AXps and
CXps enables the navigation of the space of AXps
and CXps. More importantly, the computation of
εAXps/εCXps can be instrumented using tools for
finding εAExs.
Throughout this section, we assume that the existence of $\epsilon$AExs is decided by calls to a suitable oracle. The procedure invoking the oracle is represented by a predicate $\mathrm{FindAEx}(\epsilon, \mathcal{Q}; \mathcal{E}, p)$, parameterized by the explanation problem $\mathcal{E} = (\mathcal{M}, (v, c))$, where $\mathcal{M}$ is the classifier, and by the norm $l_p$, and with arguments $\epsilon > 0$ and $\mathcal{Q} \subseteq \mathcal{F}$. Furthermore, the predicate $\mathrm{FindAEx}(\epsilon, \mathcal{Q}; \mathcal{E}, p)$ holds true if the ML classifier $\mathcal{M}$ exhibits an AEx within distance $\epsilon$, with the features in $\mathcal{Q}$ fixed to the values dictated by $v$.
To illustrate the relationships between explana-
tions and adversarial examples, we propose a simple
linear search algorithm for computing one εCXp, de-
picted in Algorithm 1. (In the same vein, the linear
search can be adapted for computing one εAXp, as
shown in (Wu et al., 2023; Izza et al., 2024).) One argument is the value of ε, whereas the other argument is a set of features that must be kept free (for CXps) or fixed (for AXps). The loop invariant is that S is a weak εCXp (resp. εAXp). In the case of an εCXp, the features in S, if freed, ensure that an AEx can be identified (given the distance ε). In the case of an εAXp, the features in S, if fixed, ensure that no AEx can be identified (given the distance ε). Features are freed/fixed
as long as the loop invariant is preserved. An oracle for deciding the existence of an AEx is used to decide whether or not the loop invariant is preserved when another feature is freed/fixed. It can be shown that the final set S is an εCXp/εAXp.

Algorithm 1: Linear algorithm to find a CXp using AEx queries.
Input: Arguments: F, ε; Parameters: E, p
Output: One CXp S
1: function FindCXpDel(F, ε; E, p)
     ▷ Inv: FindAEx(ε, F \ S; E, p)
2:   S ← F                          ▷ Initially, all features are free
3:   for i ∈ F do
4:     S ← S \ {i}                  ▷ Tentatively fix feature i
5:     outc ← FindAEx(ε, F \ S; E, p)
6:     if not outc then             ▷ No AEx can be found
7:       S ← S ∪ {i}                ▷ Free feature i again
8:   return S
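A direct Python rendering of Algorithm 1 is given below; it is a sketch (not the authors' implementation), assuming a caller-supplied find_aex oracle, e.g. a wrapper around a robustness tool, that holds iff an AEx exists within distance ε when the given features are fixed to their values in v.

def find_cxp_del(features, eps, find_aex):
    # Linear (deletion-based) search for one eps-CXp, following Algorithm 1.
    # find_aex(eps, fixed) must return True iff an AEx within distance eps exists
    # when the features in 'fixed' are kept at their values in v.
    free = set(features)                        # initially, all features are free
    for i in sorted(features):
        free.discard(i)                         # tentatively fix feature i
        if not find_aex(eps, set(features) - free):
            free.add(i)                         # no AEx: feature i must stay free
    return free                                 # a subset-minimal eps-CXp

With the l_0 norm and ε = m (the number of features), the same procedure computes a plain (distance-unrestricted) CXp, as noted in Section 2.2.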
Navigating the Space of εAXps/εCXps. Besides
computing one distance-restricted explanation, one
may be interested in navigating the sets of distance-
restricted explanations. For example, we may be in-
terested in deciding whether a sensitive feature can
occur in some explanation, or we may just be inter-
ested in finding some other explanation when the re-
ported one is uninteresting.
New Insights into Explanations. For non-trivial
classifiers defined on real-valued features, the fact that
local robustness does not hold on all points of feature
space reveals not only the guaranteed existence of ad-
versarial examples, but it also reveals new properties
about explanations. We discuss one such example.
Proposition 5. For a non-trivial classifier defined on
real-valued features, and for a measure $l_p$, there exist
instances for which, for any ε > 0, there exist εAXps
and εCXps.
6 EXPERIMENTAL EVIDENCE
This section presents a summary of practical evidence of our results on global robustness (and so, indirectly, on the impossibility of local robustness) for the case study of dense NNs trained on image datasets.
The assessment is performed on a selection of 5 publicly available NNs used in formal robustness verification. (Additional results on discrete data (Binarized NNs) are included in the appendix of (Izza and Marques-Silva, 2023).) Furthermore, the experiments are conducted on a MacBook Pro with a Dual-Core Intel Core i5 2.3GHz CPU and 8 GB RAM, running macOS Ventura. The time limit is set to 3600 s and the memory limit to 16 GB.
We implemented a formal global robustness verifier for NNs in Python. Concretely, we generate two copies of the neural network in ONNX format, one replica representing κ(x) and another one representing κ(y), and then encode constraints on the input layers to enforce $||x - y||_p \le \epsilon$ and on the output layers to force the two copies to pick different classes, i.e. κ(x) ≠ κ(y). Moreover, the Marabou oracle² (Katz et al., 2019) is instrumented to solve the targeted robustness problem.
Table 1: Assessment of global robustness verification for deep NNs. The table shows results for 4 image datasets.

  Model             ε       AEx   Time (s)
  KJ TinyTaxiNet    0.1     ✓     0.069
  KJ TinyTaxiNet    0.05    ✓     0.070
  KJ TinyTaxiNet    0.001   ✓     0.113
  MNIST-dense       0.1     ✓     0.897
  MNIST-dense       0.05    ✓     0.899
  MNIST-dense       0.001   ✓     1.805
  MNIST-conv        0.1     –     TO
  cifar-convSmall   0.1     –     TO
  gtsrb-dense       0.1     ✓     42.535
  gtsrb-dense       0.05    ✓     28.677
  gtsrb-dense       0.001   ✓     50.556
Table 2: Detailed performance evaluation of computing εCXps for DNNs. Columns Avg and nCalls under AEx report, resp., the average time and the total number of instrumented oracle (AEx robustness) calls. Columns Mn, Mx and Avg under εCXp report the minimum, maximum and average time to deliver an εCXp, and Len reports the average length of the εCXps.

                     AEx               εCXp
  Model    ε      Avg    nCalls    Len    Mn      Mx      Avg
  gtsrb    0.03   0.08   1023      218    74.2    87.4    77.1
  mnist    0.08   0.41   464       360    131.0   355.6   188.3
Table 1 summarizes the results on deep neural networks, for $l_\infty$ (Chebyshev) distance values of ε ranging from 0.001 to 0.1 for each considered classifier. As can be observed from the results, the global robustness formulation is able to identify adversarial examples for all tests, with a few exceptions when the neural network reasoner (i.e. Marabou) exceeds the
time limit. Clearly, our results confirm the theoretical findings presented earlier: the global robustness query will always report an adversarial example for any non-trivial classifier.
²Marabou is a complete neural network verifier powered by the SMT solver CVC4 (Barrett et al., 2011).
Besides, we assess the performance of our basic algorithm for computing εCXps using AEx search. Results of the evaluation on the mnist and gtsrb benchmarks are reported in Table 2. As can be seen from the table, the average number of pixels in the generated εCXps is relatively small w.r.t. the image size of the considered data, e.g. 21% for gtsrb and 46% for mnist, which illustrates the succinctness of the explanations and, as a result, their improved interpretability. Moreover, the average runtimes are at most 77.1 and 188.3 seconds (maximum 87.4 and 355.6 seconds), resp., for the gtsrb and mnist DNNs, which demonstrates the practical effectiveness of our method on image classification benchmarks. The focus of our future
works will be on devising efficient algorithms to com-
pute smallest (cardinality) contrastive explanations.
7 CONCLUSIONS
This paper presents simple arguments that reveal the
shortcomings of deciding robustness, be it global or
local. Similarly, the paper uncovers related pitfalls of attempts at certifying robustness, namely when inputs
are unconstrained. In addition, the paper also argues
that possible attempts at solving the identified short-
comings are not entirely satisfactory.
In contrast to the negative results presented in the
paper, the paper also details recently proposed uses
of robustness tools, building on the connections be-
tween adversarial examples and explainability. Fur-
thermore, the negative results on robustness are used
to shed light on the properties of distance-restricted
explanations of ML models. Future work will further investigate the links between adversarial examples and symbolic explanations, e.g. the computation of smallest-size contrastive explanations.
ACKNOWLEDGEMENTS
This work was supported in part by the National
Research Foundation, Prime Minister’s Office, Sin-
gapore under its Campus for Research Excellence
and Technological Enterprise (CREATE) programme,
by the Spanish government under grant PID2023-
152814OB-100, and by ICREA starting funds.
REFERENCES
Baharlouei, S., Sheikholeslami, F., Razaviyayn, M., and
Kolter, Z. (2023). Improving adversarial robustness
via joint classification and multiple explicit detection
classes. In AISTATS, pages 11059–11078.
Barrett, C. W., Conway, C. L., Deters, M., Hadarean, L.,
Jovanovic, D., King, T., Reynolds, A., and Tinelli, C.
(2011). CVC4. In CAV, pages 171–177.
Biggio, B., Corona, I., Maiorca, D., Nelson, B., Srndic, N.,
Laskov, P., Giacinto, G., and Roli, F. (2013). Eva-
sion attacks against machine learning at test time. In
ECML, pages 387–402.
Brix, C., Bak, S., Liu, C., and Johnson, T. T. (2023a). The
fourth international verification of neural networks
competition (VNN-COMP 2023): Summary and re-
sults. CoRR, abs/2312.16760.
Brix, C., Müller, M. N., Bak, S., Johnson, T. T., and Liu, C.
(2023b). First three years of the international verifica-
tion of neural networks competition (VNN-COMP).
Int. J. Softw. Tools Technol. Transf., 25(3):329–339.
Brix, C., Müller, M. N., Bak, S., Johnson, T. T., and Liu, C.
(2023c). First three years of the international verifica-
tion of neural networks competition (VNN-COMP).
CoRR, abs/2301.05815.
Carlini, N., Tramèr, F., Dvijotham, K. D., Rice, L., Sun,
M., and Kolter, J. Z. (2023). (certified!!) adversarial
robustness for free! In ICLR.
Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A.,
and Mukhopadhyay, D. (2021). A survey on adversar-
ial attacks and defences. CAAI Transactions on Intel-
ligence Technology, 6(1):25–45.
Chen, K., Zhu, H., Yan, L., and Wang, J. (2020). A survey
on adversarial examples in deep learning. Journal on
Big Data, 2(2):71.
Chen, Y., Wang, S., Qin, Y., Liao, X., Jana, S., and Wag-
ner, D. A. (2021). Learning security classifiers with
verified global robustness properties. In CCS, pages
477–494.
Cohen, J., Rosenfeld, E., and Kolter, J. Z. (2019). Certified
adversarial robustness via randomized smoothing. In
ICML, pages 1310–1320.
Darwiche, A. (2023). Logic for explainable AI. In LICS,
pages 1–11.
Dimitrov, D. I., Singh, G., Gehr, T., and Vechev, M. T.
(2022). Provably robust adversarial examples. In
ICLR.
Dvijotham, K. D., Hayes, J., Balle, B., Kolter, J. Z., Qin, C.,
György, A., Xiao, K., Gowal, S., and Kohli, P. (2020).
A framework for robustness certification of smoothed
classifiers using F-divergences. In ICLR.
Fu, F., Wang, Z., Fan, J., Wang, Y., Huang, C., Chen, X.,
Zhu, Q., and Li, W. (2022). REGLO: Provable neu-
ral network repair for global robustness properties. In
Workshop on Trustworthy and Socially Responsible
Machine Learning, NeurIPS.
Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P.,
Chaudhuri, S., and Vechev, M. T. (2018). AI2: safety
and robustness certification of neural networks with
abstract interpretation. In IEEE S&P, pages 3–18.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2015). Ex-
plaining and harnessing adversarial examples. In
ICLR.
Gopinath, D., Katz, G., Pasareanu, C. S., and Barrett, C. W.
(2018). DeepSafe: A data-driven approach for assess-
ing robustness of neural networks. In ATVA, pages
3–19.
Gorji, N. and Rubin, S. (2022). Sufficient reasons for clas-
sifier decisions in the presence of domain constraints.
In AAAI, pages 5660–5667.
Han, S., Lin, C., Shen, C., Wang, Q., and Guan, X. (2023).
Interpreting adversarial examples in deep learning: A
review. ACM Computing Surveys.
Hein, M. and Andriushchenko, M. (2017). Formal guaran-
tees on the robustness of a classifier against adversar-
ial manipulation. In NeurIPS, pages 2266–2276.
Hendrycks, D., Basart, S., Mu, N., Kadavath, S., Wang, F.,
Dorundo, E., Desai, R., Zhu, T., Parajuli, S., Guo, M.,
Song, D., Steinhardt, J., and Gilmer, J. (2021). The
many faces of robustness: A critical analysis of out-
of-distribution generalization. In ICCV, pages 8320–
8329.
Hendrycks, D., Liu, X., Wallace, E., Dziedzic, A., Krish-
nan, R., and Song, D. (2020). Pretrained transformers
improve out-of-distribution robustness. In ACL, pages
2744–2751.
Huang, X., Kroening, D., Ruan, W., Sharp, J., Sun, Y.,
Thamo, E., Wu, M., and Yi, X. (2020). A survey of
safety and trustworthiness of deep neural networks:
Verification, testing, adversarial attack and defence,
and interpretability. Comput. Sci. Rev., 37:100270.
Huang, X. and Marques-Silva, J. (2023). From ro-
bustness to explainability and back again. CoRR,
abs/2306.03048.
Huang, X. and Marques-Silva, J. (2024). On the failings
of shapley values for explainability. Int. J. Approx.
Reason., page 109112.
Huang, Y., Zhang, H., Shi, Y., Kolter, J. Z., and Anand-
kumar, A. (2021). Training certifiably robust neu-
ral networks with efficient local lipschitz bounds. In
NeurIPS, pages 22745–22757.
Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., and
Bengio, Y. (2016). Binarized neural networks. In
NeurIPS, pages 4107–4115.
Ignatiev, A. (2020). Towards trustable explainable AI. In
IJCAI, pages 5154–5158.
Ignatiev, A., Narodytska, N., Asher, N., and Marques-Silva,
J. (2020). From contrastive to abductive explanations
and back again. In AIxIA, pages 335–355.
Ignatiev, A., Narodytska, N., and Marques-Silva, J. (2019).
Abduction-based explanations for machine learning
models. In AAAI, pages 1511–1519.
Izza, Y., Huang, X., Morgado, A., Planes, J., Ignatiev, A.,
and Marques-Silva, J. (2024). Distance-Restricted
Explanations: Theoretical Underpinnings & Efficient
Implementation. In KR, pages 475–486.
Izza, Y., Ignatiev, A., and Marques-Silva, J. (2022). On
tackling explanation redundancy in decision trees. J.
Artif. Intell. Res., 75:261–321.
Izza, Y. and Marques-Silva, J. (2023). The pros and cons of
adversarial robustness. CoRR, abs/2312.10911.
Izza, Y. and Marques-Silva, J. (2024). Efficient contrastive
explanations on demand. CoRR.
Katz, G., Barrett, C. W., Dill, D. L., Julian, K., and Kochen-
derfer, M. J. (2017). Reluplex: An efficient SMT
solver for verifying deep neural networks. In CAV,
pages 97–117.
Katz, G., Huang, D. A., Ibeling, D., Julian, K., Lazarus, C.,
Lim, R., Shah, P., Thakoor, S., Wu, H., Zeljic, A., Dill,
D. L., Kochenderfer, M. J., and Barrett, C. W. (2019).
The marabou framework for verification and analysis
of deep neural networks. In CAV, pages 443–452.
Lécuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., and
Jana, S. (2019). Certified robustness to adversarial ex-
amples with differential privacy. In IEEE S&P, pages
656–672.
Leino, K., Wang, Z., and Fredrikson, M. (2021). Globally-
robust neural networks. In ICML, pages 6212–6222.
Letoffe, O., Huang, X., and Marques-Silva, J. (2024). On
correcting SHAP scores. In AAAI.
Liang, H., He, E., Zhao, Y., Jia, Z., and Li, H. (2022). Ad-
versarial attack and defense: A survey. Electronics,
11(8):1283.
Liu, X., Han, X., Zhang, N., and Liu, Q. (2020). Certified
monotonic neural networks. In NeurIPS.
Marques-Silva, J. (2023). Disproving XAI myths with for-
mal methods - initial results. In ICECCS, pages 12–
21.
Marques-Silva, J. (2024). Logic-based explainability: Past,
present and future. In ISoLA, pages 181–204.
Marques-Silva, J. and Huang, X. (2024). Explainability is
not a game. Commun. ACM, pages 66–75.
Marques-Silva, J. and Ignatiev, A. (2022). Delivering trust-
worthy AI through formal XAI. In AAAI, pages
12342–12350.
Marques-Silva, J. and Ignatiev, A. (2023). No silver bullet:
interpretable ML models must be explained. Frontiers
in Artificial Intelligence, 6:1128212.
Miller, T. (2019). Explanation in artificial intelligence: In-
sights from the social sciences. Artif. Intell., 267:1–
38.
Müller, M. N., Brix, C., Bak, S., Liu, C., and Johnson, T. T.
(2022). The third international verification of neural
networks competition (VNN-COMP 2022): Summary
and results. CoRR, abs/2212.10376.
Narodytska, N. (2018). Formal analysis of deep binarized neural networks. In IJCAI, pages 5692–5696.
Pedregosa, F. et al. (2011). Scikit-learn: Machine learning
in Python. Journal of Machine Learning Research,
12:2825–2830.
Reiter, R. (1987). A theory of diagnosis from first princi-
ples. Artif. Intell., 32(1):57–95.
Rosenberg, I., Shabtai, A., Elovici, Y., and Rokach, L.
(2022). Adversarial machine learning attacks and de-
fense methods in the cyber security domain. ACM
Comput. Surv., 54(5):108:1–108:36.
Rosenfeld, E., Winston, E., Ravikumar, P., and Kolter, J. Z.
(2020). Certified robustness to label-flipping attacks
via randomized smoothing. In ICML, pages 8230–
8241.
Ruan, W., Wu, M., Sun, Y., Huang, X., Kroening, D., and
Kwiatkowska, M. (2019). Global robustness evalu-
ation of deep neural networks with provable guaran-
tees for the hamming distance. In IJCAI, pages 5944–
5952.
Rudin, C. (2019). Stop explaining black box machine learn-
ing models for high stakes decisions and use inter-
pretable models instead. Nature Machine Intelligence,
1(5):206–215.
Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L.,
and Zhong, C. (2022). Interpretable machine learn-
ing: Fundamental principles and 10 grand challenges.
Statistics Surveys, 16:1–85.
Seshia, S. A., Desai, A., Dreossi, T., Fremont, D. J., Ghosh,
S., Kim, E., Shivakumar, S., Vazquez-Chanlatte, M.,
and Yue, X. (2018). Formal specification for deep neu-
ral networks. In ATVA, pages 20–34.
Seshia, S. A., Sadigh, D., and Sastry, S. S. (2022). To-
ward verified artificial intelligence. Commun. ACM,
65(7):46–55.
Shih, A., Choi, A., and Darwiche, A. (2018). A symbolic
approach to explaining bayesian network classifiers.
In IJCAI, pages 5103–5111.
Singh, G., Gehr, T., Mirman, M., Püschel, M., and Vechev,
M. T. (2018). Fast and effective robustness certifica-
tion. In NeurIPS, pages 10825–10836.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan,
D., Goodfellow, I. J., and Fergus, R. (2014). Intriguing
properties of neural networks. In ICLR.
Voráček, V. and Hein, M. (2023). Improving l1-certified
robustness via randomized smoothing by leveraging
box constraints. In ICML, pages 35198–35222.
Wang, Z., Huang, C., and Zhu, Q. (2022a). Efficient
global robustness certification of neural networks via
interleaving twin-network encoding. In DATE, pages
1087–1092.
Wang, Z., Wang, Y., Fu, F., Jiao, R., Huang, C., Li, W.,
and Zhu, Q. (2022b). A tool for neural network
global robustness certification and training. CoRR,
abs/2208.07289.
Weng, L., Chen, P., Nguyen, L. M., Squillante, M. S.,
Boopathy, A., Oseledets, I. V., and Daniel, L. (2019).
PROVEN: verifying robustness of neural networks
with a probabilistic approach. In ICML, pages 6727–
6736.
Wiyatno, R. R., Xu, A., Dia, O., and de Berker, A. (2019).
Adversarial examples in modern machine learning: A
review. CoRR, abs/1911.05268.
Wu, M., Wu, H., and Barrett, C. W. (2023). Verix: To-
wards verified explainability of deep neural networks.
In NeurIPS.
Yan, G., Romano, Y., and Weng, T. (2024). Provably ro-
bust conformal prediction with improved efficiency.
In ICLR.
Yin, D., Lopes, R. G., Shlens, J., Cubuk, E. D., and Gilmer,
J. (2019). A fourier perspective on model robustness
in computer vision. In NeurIPS, pages 13255–13265.
Yu, J., Ignatiev, A., Stuckey, P. J., Narodytska, N., and
Marques-Silva, J. (2023). Eliminating the impossible,
whatever remains must be true: On extracting and ap-
plying background knowledge in the context of formal
explanations. In AAAI, pages 4123–4131.
Zhang, J. and Li, C. (2020). Adversarial examples: Oppor-
tunities and challenges. IEEE Trans. Neural Networks
Learn. Syst., 31(7):2578–2593.
Zhang, M., Levine, S., and Finn, C. (2022a). MEMO: test
time robustness via adaptation and augmentation. In
NeurIPS.
Zhang, X., Zheng, X., and Mao, W. (2022b). Adversarial
perturbation defense on deep neural networks. ACM
Comput. Surv., 54(8):159:1–159:36.
Zhou, S., Liu, C., Ye, D., Zhu, T., Zhou, W., and Yu,
P. S. (2023). Adversarial attacks and defenses in deep
learning: From a perspective of cybersecurity. ACM
Comput. Surv., 55(8):163:1–163:39.