The Pros and Cons of Adversarial Robustness
Yacine Izza¹ and Joao Marques-Silva²
¹CREATE, NUS, Singapore
²ICREA, University of Lleida, Spain
Keywords:
Local & Global Robustness, Certified Robustness, Adversarial Examples, Explainable AI.
Abstract:
Robustness is widely regarded as a fundamental problem in the analysis of machine learning (ML) models.
Most often, robustness equates with deciding the non-existence of adversarial examples, where adversarial examples denote situations where small changes to some inputs cause a change in the prediction. The perceived
importance of ML model robustness explains the continued progress observed for most of the last decade.
Whereas robustness is often assessed locally, i.e. given some target point in feature space, robustness can
also be defined globally, i.e. where any point in feature space can be considered. The importance of ML
model robustness is illustrated for example by the existing competition on neural network (NN) verification
(VNN-COMP), which assesses the progress of robustness tools for NNs, but also by efforts towards robustness
certification. More recently, robustness tools have also been used for computing rigorous explanations of ML
models. Despite the continued advances in robustness, this paper uncovers some limitations with existing def-
initions of robustness, both global and local, but also with efforts towards robustness certification. The paper
also investigates uses of adversarial examples besides those related with robustness.
1 INTRODUCTION
For more than a decade, Machine Learning (ML) has
been the subject of remarkable advances. However,
such advances have also been marred by a number of
persistent challenges. One of the best known of these
challenges is the brittleness of ML models (Hein and
Andriushchenko, 2017). An ML model is brittle if
it exhibits adversarial examples (AExs), i.e. small
changes to the inputs of the ML model can cause
(unexpected and unwanted) changes to the predic-
tion (Biggio et al., 2013; Szegedy et al., 2014; Good-
fellow et al., 2015). Intuitively, an ML model is robust
if it exhibits no AExs. ML model robustness has been
extensively studied over the last decade (Goodfellow
et al., 2015; Zhang et al., 2022b). The importance
of deciding the robustness of ML models motivated an outpouring of competing approaches, ranging from rather informal solutions, to those based on automated reasoners, and even those based on domain-specific reasoners. Furthermore, the significance of assess-
ing and asserting robustness is further highlighted by
VNN-COMP (Verification of Neural Networks Com-
petition (Brix et al., 2023b; Brix et al., 2023a)), a
competition for assessing robustness tools for NNs,
that has been running since 2020. In addition, there
are also efforts targeting the robustness certification
of ML models (Cohen et al., 2019; Rosenfeld et al., 2020; Dvijotham et al., 2020; Huang et al., 2021; Voráček and Hein, 2023; Carlini et al., 2023). Fur-
thermore, there have been proposals towards the veri-
fication and validation of systems based on AI (Seshia
et al., 2022), covering not only robustness, but also
explainability and fairness. The use of AI systems in high-risk and safety-critical domains has motivated calls for the use of so-called interpretable models (Rudin, 2019; Rudin et al., 2022), with the purpose of enabling human decision-makers to explain the decisions taken by such systems. Unfortunately, such calls have not deterred proposals for the use of complex AI systems in high-risk and safety-critical domains (Huang et al., 2020).
Robustness is often defined with respect to a con-
crete input to the ML model and its associated pre-
diction. In this case, one is referring to what is called
local robustness. An alternative view is global ro-
bustness, where the goal is to assess robustness for
any input of the ML model. Nevertheless, there exist
tools that target both local and global robustness (Katz
et al., 2017). Furthermore, to understand whether an ML model is locally/globally robust, it is also fundamental to outline an adequate experimental setup.
More recently, robustness has been linked with
formal approaches to explainable artificial intelli-
gence (XAI) (Wu et al., 2023; Huang and Marques-
Silva, 2023; Izza and Marques-Silva, 2024; Izza et al.,
2024). For example, it is the case that so-called
abductive explanations (Marques-Silva and Ignatiev,
2022; Marques-Silva, 2024) are such that no adver-
sarial examples can be identified.
This paper is in part motivated by a number
of negative results in explainability (Letoffe et al.,
2024; Marques-Silva and Huang, 2024; Huang and
Marques-Silva, 2024; Izza et al., 2022; Marques-
Silva and Ignatiev, 2023; Marques-Silva, 2023; Ig-
natiev, 2020; Marques-Silva and Ignatiev, 2022), but
targets instead robustness. Moreover, the paper ar-
gues that existing definitions of local and global ro-
bustness are problematic. Concretely, the paper shows
that the experimental setup used for assessing robust-
ness is invariably inconclusive with respect to decid-
ing the robustness of the given ML model. In a simi-
lar fashion, the paper argues that efforts to deliver ro-
bustness certification can be ineffective. Motivated by
these results, the paper hypothesizes that there is no
simple solution to the basic shortcomings of existing
approaches for deciding the robustness of ML mod-
els. Nevertheless, the paper also underlines the im-
portance of robustness tools, not specifically for de-
ciding robustness, but instead as a key building block
for the computation of formal explanations by itera-
tively deciding the existence of adversarial examples.
A longer version of this paper is available on arXiv (Izza and Marques-Silva, 2023); it includes all proofs of the propositions and detailed results of the experiments conducted in this work.
2 PRELIMINARIES
Measures of Distance. We consider the well-known $l_p$ measure of distance,
$||x - y||_p = \left( \sum_{i=1}^{m} |x_i - y_i|^p \right)^{1/p}$   (1)
which is referred to as the Minkowski distance. Special cases include the Manhattan distance $l_1$, the Euclidean distance $l_2$, and the Chebyshev distance $l_\infty$, which is defined by,
$\lim_{p \to \infty} ||x - y||_p = \max_{1 \le i \le m} \{ |x_i - y_i| \}$   (2)
Moreover, $l_0$ denotes the Hamming distance, defined by:
$||x - y||_0 = \sum_{i=1}^{m} \mathrm{ITE}(x_i = y_i, 0, 1)$   (3)
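For reference, these measures can be computed directly; the following is a minimal sketch (the helper names and the use of numpy are illustrative, not taken from the paper):

import numpy as np

def lp_distance(x, y, p):
    # Minkowski distance ||x - y||_p, as in (1)
    return float(np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** p) ** (1.0 / p))

def linf_distance(x, y):
    # Chebyshev distance ||x - y||_inf, the limit in (2)
    return float(np.max(np.abs(np.asarray(x) - np.asarray(y))))

def l0_distance(x, y):
    # Hamming distance ||x - y||_0, as in (3): number of differing coordinates
    return int(np.sum(np.asarray(x) != np.asarray(y)))

# Example on a 3-dimensional feature space:
x, y = [0.0, 0.5, 1.0], [0.1, 0.5, 0.7]
print(lp_distance(x, y, 1), lp_distance(x, y, 2), linf_distance(x, y), l0_distance(x, y))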
Classification Problems. A classification problem is defined on a set of features $\mathcal{F} = \{1,\ldots,m\}$ and a set of classes $\mathcal{K} = \{c_1, c_2, \ldots, c_K\}$. Each feature $i \in \mathcal{F}$ takes values from a domain $D_i$. Domains can be categorical or ordinal. If ordinal, domains can be discrete or real-valued. Throughout the paper, and unless otherwise stated, domains will be assumed to be real-valued. Feature space is defined by $\mathbb{F} = D_1 \times D_2 \times \cdots \times D_m$. The notation $x = (x_1,\ldots,x_m)$ denotes an arbitrary point in feature space, where each $x_i$ is a variable taking values from $D_i$. Moreover, the notation $v = (v_1,\ldots,v_m)$ represents a specific point in feature space, where each $v_i$ is a constant representing one concrete value from $D_i$. An instance denotes a pair $(v,c)$, where $v \in \mathbb{F}$ and $c \in \mathcal{K}$. An ML classifier $\mathcal{M}$ is characterized by a non-constant classification function $\kappa$ that maps feature space $\mathbb{F}$ into the set of classes $\mathcal{K}$, i.e. $\kappa : \mathbb{F} \to \mathcal{K}$. Given the above, we associate with a classifier $\mathcal{M}$ a tuple $(\mathcal{F}, \mathbb{F}, \mathcal{K}, \kappa)$. Since we assume that $\kappa$ is non-constant, the ML classifier $\mathcal{M}$ is declared non-trivial, i.e. $\exists(a,b \in \mathbb{F}).\,(\kappa(a) \neq \kappa(b))$.
When reasoning about systems with formal meth-
ods, it is often assumed that all inputs are possible, or
alternatively, that there exist explicit constraints that
disallow some inputs. In contrast, reasoning methods
rooted in machine learning usually assume some input
distribution, which most often needs to be inferred (or
approximated) from training data or be user-specified.
We say that a point x is viable if, for reasoning pur-
poses, x must be accounted for. As a result, when
making statements about ML models, we consider
three possible scenarios on the inputs:
1. Unconstrained inputs, i.e. any point x in feature
space is viable.
2. Constrained inputs, i.e. any point x in feature
space is viable iff some set of constraints C is sat-
isfied by x. This is represented by the predicate
ξ(x).
3. Distribution-restricted inputs, i.e. any point x in feature space is viable iff it respects some distribution $\mathbb{D}$, i.e. $x \sim \mathbb{D}$, which is either user-specified or inferred from training data.
Most work on adversarial robustness implicitly as-
sumes distribution-restricted inputs. In contrast,
throughout the paper, we assume the case of uncon-
strained inputs for the following main reasons. First,
the most rigorous approaches for robustness make
that implicit assumption. Second, assuming that in-
puts are distribution-restricted does not account for
data drift. Nevertheless, in cases where the distinction
matters, the paper also accounts for the other possible
cases on the inputs.
2.1 Adversarial Robustness
Local Robustness. Given a classifier $\mathcal{M}$ with a classification function $\kappa$, and an instance $(v,c)$, the classifier is locally robust (or just robust) for $v$ if,
$\forall(x \in \mathbb{F}).\, [||x - v||_p \le \epsilon] \rightarrow (\kappa(x) = \kappa(v))$   (4)
If the classifier is not robust, then any point $x \in \mathbb{F}$ satisfying the condition,
$[(||x - v||_p \le \epsilon) \wedge (\kappa(x) \neq \kappa(v))]$   (5)
is referred to as an adversarial example for distance $\epsilon$ ($\epsilon$AEx). (Observe that (5) consists of selecting one of the counterexamples to (4).)
The definitions of local robustness in (4) and of
adversarial example in (5) assume unconstrained in-
puts. For completeness, we include the definitions of
local robustness for the other cases regarding assump-
tions on the inputs.
For constrained inputs we have,
$\forall(x \in \mathbb{F}).\, [\xi(x) \wedge (||x - v||_p \le \epsilon)] \rightarrow (\kappa(x) = \kappa(v))$   (6)
For distribution-restricted inputs we have,
$\mathbb{S}_{\varsigma}(x \sim \mathbb{D}).\, [||x - v||_p \le \epsilon] \rightarrow (\kappa(x) = \kappa(v))$   (7)
where $\mathbb{S}_{\varsigma}$ captures the sampling according to some distribution and target confidence ($\varsigma$), and where $\sim$ is interpreted as a predicate, with two arguments $x$ and $\mathbb{D}$, that holds iff $x$ respects the distribution $\mathbb{D}$.
There exist a multitude of proposed robustness
tools dedicated to local robustness, many of which
are regularly assessed in the VNN-COMP (Brix et al.,
2023b). Section 3 briefly overviews existing work on
robustness.
One additional observation is that tools that exploit automated reasoners assume unconstrained inputs, and so the definition of local robustness considered is (4). If information about the context in which the ML model is to be deployed is available, then ξ(x) may be known, and so it is to be expected that (6) would be used instead. Finally, incomplete methods, often based on the sampling of feature space, assume the somewhat different definition (7), which can offer probabilistic guarantees, but not absolute guarantees.
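To make the contrast concrete, the following sketch decides (7) by uniform sampling within the ε-ball under the l∞ norm; the function name and arguments are illustrative (kappa stands for an arbitrary classification function), and a failure to find an AEx after n samples yields at best a probabilistic claim, never a proof of (4).

import numpy as np

def sampled_local_robustness(kappa, v, eps, n_samples=10000, seed=0):
    # Sampling-based check of (7) under the l_inf norm.
    # Returns (True, None) if no AEx was sampled; (False, x) if x is an eps-AEx.
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    c = kappa(v)
    for _ in range(n_samples):
        x = v + rng.uniform(-eps, eps, size=v.shape)   # point in the eps-ball around v
        if kappa(x) != c:
            return False, x       # counterexample to local robustness found
    return True, None             # no AEx sampled: only a probabilistic guarantee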
Certified Robustness. Earlier research have also
proposed certified robustness, which has been defined
as follows:
Definition 1 (From (Cohen et al., 2019)). “A classifier is said to be certifiably robust if for any input x, one can easily obtain a guarantee that the classifier’s prediction is constant within some set around x, often an $l_2$ or $l_\infty$ ball”.
(We underscore that Definition 1 is taken verbatim
from (Cohen et al., 2019), although we highlight the
universal quantification on the inputs.) We will refer
to this definition throughout the paper.
Global Robustness. Given the definition of local ro-
bustness, a possible definition of global robustness is,
$\forall(v, x \in \mathbb{F}).\, (||x - v||_p \le \epsilon) \rightarrow (\kappa(x) = \kappa(v))$   (8)
(Observe that (8) is just a formalization of Definition 1, by allowing the norm to be any $l_p$.) Simi-
lar definitions have been studied in the literature (Se-
shia et al., 2018; Narodytska, 2018; Cohen et al.,
2019; Rosenfeld et al., 2020; Dvijotham et al., 2020;
Huang et al., 2021; Chen et al., 2021; Carlini et al.,
2023). For example, Reluplex (Katz et al., 2017) de-
fines global robustness by allowing small differences
in predictions. This alternative definition raises concerns in classification problems, because the differences between predicted values may be small, but the predicted classes may be different. Moreover, there
are other variants of this definition, which the paper
also studies. However, by default we will assume this
definition throughout the paper. (As shown in the pa-
per, this apparently sensible definition actually raises
a number of critical issues.)
As shown in Section 3, other definitions of global
robustness can be related with the one proposed
above. Section 3 also briefly overviews approaches
for deciding global robustness, local robustness and
certified robustness.
Running Example. To motivate the claims in the pa-
per, the following very simple classifiers are used as
the running examples throughout the paper.
Example 1. A first classifier $\mathcal{M}_1$ is defined on a single feature, $\mathcal{F}_1 = \{1\}$, with $D_{11} = \mathbb{R}$. The set of classes is $\mathcal{K}_1 = \{0,1\}$, and the training data is given by: $\{(0.0,0),(0.3,0),(0.4,0),(0.7,1),(1.0,1)\}$. Furthermore, we use an off-the-shelf ML toolkit, e.g. scikit-learn (Pedregosa et al., 2011), to learn the classifier's function $\kappa_1$ as a linear classifier $\kappa_1 : D_{11} \to \mathcal{K}_1$. Accordingly, the model learned by scikit-learn is,
$\kappa(x_1) = \mathrm{ITE}(0.93198992 \times x_1 - 0.64735516 \ge 0, 1, 0)$
As can be observed, the accuracy of the learned clas-
sifier over training data is 100%. Moreover, the ques-
tion we seek to answer is: is the classifier (locally or
globally) robust?
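The classifier of Example 1 can be reproduced with a few lines of scikit-learn; this is a sketch under the assumption that a logistic regression model is used, and the exact coefficients obtained (and hence whether they match the values reported in Example 1) depend on the chosen linear model, its hyperparameters and the library version.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Training data of Example 1: a single real-valued feature, two classes.
X = np.array([[0.0], [0.3], [0.4], [0.7], [1.0]])
y = np.array([0, 0, 0, 1, 1])

clf = LogisticRegression().fit(X, y)
w, b = clf.coef_[0][0], clf.intercept_[0]
print(w, b)                 # coefficients of the learned linear model
print(clf.score(X, y))      # accuracy on training data (1.0, as in Example 1)

# kappa_1 written out explicitly: ITE(w*x1 + b >= 0, 1, 0)
kappa1 = lambda x1: int(w * x1 + b >= 0)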
Example 2. A second classifier $\mathcal{M}_2$ is obtained from the first one above (see Example 1), but defined as
follows:
$\kappa_2(x_1, x_2) = \begin{cases} \kappa_1(x_1) & \text{if } x_1 \le 1 \\ \mathrm{ITE}(x_1 > x_2, 1, 0) & \text{otherwise} \end{cases}$
In this case, $\mathcal{F}_2 = \{1, 2\}$, $D_{21} = D_{22} = \mathbb{R}$, $\mathbb{F} = \mathbb{R} \times \mathbb{R}$, and $\mathcal{K}_2 = \{0,1\}$.
Example 3 exemplifies the definition of adversar-
ial examples.
Example 3. For the classifier of Example 1, and for
the instance (0.7,1), from training data (and assum-
ing 100% accuracy on training data), it is apparent
that an AEx exists by setting ε = 0.3. By manual in-
spection of the learned model, we conclude that we
obtain an AEx with a smaller ε, e.g. ε = 0.1 suffices.
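The claim of Example 3 can be checked directly against the learned model; a minimal sketch, hard-coding the coefficients reported in Example 1:

# kappa(x1) = ITE(0.93198992*x1 - 0.64735516 >= 0, 1, 0), as in Example 1
kappa = lambda x1: int(0.93198992 * x1 - 0.64735516 >= 0)

v, c = 0.7, 1                            # the instance (0.7, 1)
x = 0.6                                  # candidate point with |x - v| = 0.1
assert abs(x - v) <= 0.1                 # within distance eps = 0.1
assert kappa(v) == c and kappa(x) != c   # hence x is an epsilon-AEx for eps = 0.1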
2.2 Symbolic Explainability
As mentioned in Section 1, the concepts of adver-
sarial examples and explainability are tightly related.
As a result, we include a brief introduction to sym-
bolic (or logic) explainability. More detailed accounts
are available (Marques-Silva and Ignatiev, 2022; Dar-
wiche, 2023; Marques-Silva, 2024).
An explanation problem $\mathcal{E}$ is a tuple $(\mathcal{M}, (v,c))$, where $\mathcal{M}$ is a classifier, and $(v,c)$ is an instance. When describing concepts in explainability, an underlying explanation problem $\mathcal{E}$ is assumed. Prime implicant (PI) explanations (Shih et al., 2018) denote a minimal set of literals (relating a feature value $x_i$ and a constant $v_i$ from its domain $D_i$) that are sufficient for the prediction. PI-explanations can be formulated as a problem of logic-based abduction, and so are also referred to as abductive explanations (AXp) (Ignatiev et al., 2019). Formally, given $v = (v_1,\ldots,v_m) \in \mathbb{F}$ with $\kappa(v) = c$, an AXp is any minimal subset $\mathcal{X} \subseteq \mathcal{F}$ such that,
$\forall(x \in \mathbb{F}).\, \left[ \bigwedge_{i \in \mathcal{X}} (x_i = v_i) \right] \rightarrow (\kappa(x) = c)$   (9)
AXps can be viewed as answering a ’Why?’ question,
i.e. why is some prediction made given some point in
feature space. A different view of explanations is a
contrastive explanation (Miller, 2019), which answers
a ’Why Not?’ question, i.e. which features can be
changed to change the prediction. A formal defini-
tion of contrastive explanation is proposed in recent
work (Ignatiev et al., 2020). Given $\mathcal{E} = (\mathcal{M},(v,c))$, a CXp is any minimal subset $\mathcal{Y} \subseteq \mathcal{F}$ such that,
$\exists(x \in \mathbb{F}).\, \bigwedge_{i \in \mathcal{F} \setminus \mathcal{Y}} (x_i = v_i) \wedge (\kappa(x) \neq c)$   (10)
Note that any set $\mathcal{Y} \subseteq \mathcal{F}$ for which (10) holds, but which is not necessarily minimal, is referred to as a weak CXp. Similarly, any $\mathcal{X} \subseteq \mathcal{F}$ for which (9) holds is a weak AXp.
The relationship between explanations and adver-
sarial examples can be further clarified (Izza et al.,
2024), as follows.
Proposition 1. Given an explanation problem $\mathcal{E} = (\mathcal{M},(v,c))$, $\mathcal{Y} \subseteq \mathcal{F}$ is a weak CXp iff $\mathcal{M}$ has an $\epsilon$AEx, with $l_0$ distance $\epsilon = |\mathcal{Y}|$.
Building on the results of R. Reiter in model-
based diagnosis (Reiter, 1987), (Ignatiev et al., 2020)
proves a minimal hitting set (MHS) duality relation
between AXps and CXps, i.e. AXps are MHSes of
CXps and vice-versa. Thus, as long as one can de-
vise logic encodings for an ML classifier (and this is
possible for most ML classifiers) and have access to
a suitable reasoner, then (9) and (10) offer a solution
for computing one AXp/CXp.
Distance-Restricted Explanations. With the pur-
pose of relating explanations with robustness, recent
work (Huang and Marques-Silva, 2023; Wu et al.,
2023) introduced the concept of distance-restricted
explanation.
Given an explanation problem $\mathcal{E} = (\mathcal{M},(v,c))$ and $\epsilon > 0$, a distance-restricted AXp ($\epsilon$AXp) is a subset-minimal set of features $\mathcal{X} \subseteq \mathcal{F}$ such that,
$\forall(x \in \mathbb{F}).\, \left[ \bigwedge_{i \in \mathcal{X}} (x_i = v_i) \wedge (||x - v||_p \le \epsilon) \right] \rightarrow (\kappa(x) = c)$   (11)
We define distance-restricted CXps ($\epsilon$CXps) accordingly. An $\epsilon$CXp is a subset-minimal set of features $\mathcal{Y} \subseteq \mathcal{F}$, such that,
$\exists(x \in \mathbb{F}).\, \left[ \bigwedge_{i \in \mathcal{F} \setminus \mathcal{Y}} (x_i = v_i) \wedge (||x - v||_p \le \epsilon) \right] \wedge (\kappa(x) \neq c)$   (12)
MHS duality between distance-restricted AXps
and CXps has been proved, which enables the de-
velopment of algorithms for navigating the space of
εAXps and εCXps (Izza et al., 2024).
Finally, there is a simple relationship between
(distance-unrestricted or plain) AXps/CXps and
εAXps/εCXps. By picking $l_0$, i.e. the Hamming dis-
tance, and letting ε = m, i.e. the number of fea-
tures, then εAXps/εCXps represent exactly the (plain)
AXps/CXps. Observe that, by setting ε = m, we al-
low any subset of the features to be included/excluded
from AXps/CXps. Hence, we will be computing
distance-unrestricted AXps/CXps using algorithms
developed for εAXps/εCXps.
3 RELATED WORK
The realization that ML models are most often brit-
tle (Biggio et al., 2013; Szegedy et al., 2014; Good-
fellow et al., 2015), i.e. that ML models can exhibit
adversarial examples, motivated a massive body of re-
search over the last decade on deciding the robustness
of ML models. The goal of this section is to briefly
overview works that are of special interest to the pa-
per’s topics, especially ML model robustness and the
identification of AExs. Moreover, a growing number
of surveys (Wiyatno et al., 2019; Zhang and Li, 2020;
Chen et al., 2020; Chakraborty et al., 2021; Rosen-
berg et al., 2022; Liang et al., 2022; Zhang et al.,
2022b; Zhou et al., 2023; Han et al., 2023) illustrate
the importance of robustness and AExs for the practi-
cal deployment of ML models.
Local Robustness. The definition of local robustness
proposed in most of past work matches the one used
in this paper (see (7)). Examples of tools that de-
cide local robustness are those evaluated in VNN-
COMP (Brix et al., 2023b).
Global Robustness. Past work considers global ro-
bustness as proposed in Eq. (8), which formal-
izes Definition 1. This is the case with (Seshia et al.,
2018; Chen et al., 2021), but also (Cohen et al., 2019;
Rosenfeld et al., 2020; Dvijotham et al., 2020; Huang
et al., 2021; Carlini et al., 2023). Some works pro-
pose a slightly modified definition of global robust-
ness (Katz et al., 2017):
$\forall(v, x \in \mathbb{F}).\, (||x - v||_p \le \epsilon) \rightarrow (|\kappa(x) - \kappa(v)| \le \delta)$   (13)
where δ > 0. When compared with (8), the mod-
ified definition targets neural networks, especially
when these compute real-valued outputs. A num-
ber of works adopt this definition of global robust-
ness (Wang et al., 2022a; Wang et al., 2022b; Fu et al.,
2022), but (Fu et al., 2022) imposes no constraint on
x and v. It should be noted that this definition is not
without problems. For ML classifiers, e.g. image clas-
sification, conditions on the values of the outputs are
uninteresting, and so a definition similar to (8) must
be considered.
A different approach is adopted in (Ruan et al.,
2019) where global robustness is defined with re-
spect to a finite set of points in feature space, and
not all points in feature space. Yet another take on
global robustness is to reject inputs that are classified
as AEx (Leino et al., 2021; Baharlouei et al., 2023).
Thus a model can return class abstain on a given input
x. Finally, another line of research is represented by
DeepSafe, which finds safe regions where robustness
is guaranteed (Gopinath et al., 2018; Dimitrov et al.,
2022).
Robustness Certification. Work on certifying robustness can be traced to (Cohen et al., 2019; Weng et al., 2019; Gehr et al., 2018; Singh et al., 2018) and, more recently, to (Rosenfeld et al., 2020; Dvijotham et al., 2020; Huang et al., 2021; Voráček and Hein, 2023; Carlini et al., 2023), which leverage the local robustness property to provide certification and/or quantify the robustness of models against AExs. As an illustration, the empirical evaluation reported in (Carlini et al., 2023) (resp. (Voráček and Hein, 2023)) considers collections of 100,000 and 10,000 (resp. 2000 and 500) samples, drawn from the CIFAR10 and ImageNet datasets, that serve to certify robustness accuracy, i.e. the percentage of samples that fail/succeed the local robustness test. Besides, the works reported in (Liu et al., 2020; Wang et al., 2022b; Wang et al., 2022a) adopt a global robustness property to certify whether or not the analyzed model is robust. Fur-
thermore, some works (Fu et al., 2022; Wang et al.,
2022a) use local and global robustness techniques to
measure lower and upper bounds for robustness. An-
other recent work (Dimitrov et al., 2022) computes
regions with robustness certification on all possible
points in these regions. In these earlier works, the im-
plications of global robustness on local robustness are
not discussed.
4 THE CONS OF ROBUSTNESS
This section proves a number of negative results re-
garding global and local robustness, but also regard-
ing robustness certification. More importantly, those
negative results impact the conclusions drawn in ear-
lier work on robustness. Nevertheless, this section
also discusses ways to cope with these negative re-
sults.
4.1 Basic Negative Results
4.1.1 Continuous Domains
There Is no Global Robustness. A straightforward
observation is that, given the proposed definition of
global robustness (see (8)), then there exist no non-
trivial globally robust classifiers.
Proposition 2. Any non-trivial classifier defined on
continuous (real-valued) features is not globally ro-
bust, independently of the value of ε chosen. (We as-
sume that the measure of distance considered is $l_p$, with $p \ge 1$.)
An observation that mimics Proposition 2 is made in recent work (Leino et al., 2021). However, the consequences of this observation were not investigated further. Instead, the proposed solution to the problem of global robustness was to change the training of the classifier so that it returns an indication of inputs that cannot be guaranteed to be robust. Hence, the solution proposed is to move from deciding robustness a posteriori to training for robustness.
Example 4. With respect to Example 1, recall that the model learned by scikit-learn is,
$\kappa(x_1) = \mathrm{ITE}(0.93198992 \times x_1 - 0.64735516 \ge 0, 1, 0)$
Clearly, the value of $x_1$ for which the predicted class transitions from 0 to 1 is 0.69459459. Hence, exhibiting the point $x_1 = 0.69459459$ is a proof that the model is not globally robust.
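The threshold of Example 4, and with it a counterexample to global robustness for an arbitrarily small ε, can be computed directly from the learned coefficients; a short worked check:

w, b = 0.93198992, -0.64735516
kappa = lambda x1: int(w * x1 + b >= 0)

t = -b / w                        # decision boundary: approximately 0.69459459
print(t)

eps = 1e-6                        # an arbitrarily small distance
x, v = t - eps / 2, t + eps / 2   # two points within distance eps of each other
assert abs(x - v) <= eps and kappa(x) != kappa(v)   # counterexample to (8)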
A key conclusion of the results above is that the definition of (certified) robustness used in earlier works (Cohen et al., 2019; Narodytska, 2018) (among others, see also Section 2) can only be achieved with an ML model that predicts a constant value; and this is of course unsatisfactory. Furthermore, the solution proposed in other works (Lécuyer et al., 2019), i.e. to restrict robustness certification to a fixed test set, is also unsatisfactory, because validating robustness on such a fixed test set defeats the purpose of machine learning.
No Global Robustness Implies no Local Robust-
ness. One observation that stems from the proof
of Proposition 2 is that deciding local robustness is
also problematic. Concretely, if we are allowed to se-
lect a suitable point in feature space, then it is the case
that,
Proposition 3. For any non-trivial classifier defined
on real-valued features, there exists a point for which
the classifier is not locally robust, independently of
the value of ε chosen.
Example 5. With respect to Example 1, if we sample training data, then the model will be declared locally robust for $\epsilon < 0.00540541$, due to point $x_1 = 0.7$. Clearly, if we allow complete freedom on which values of $x_1$ to sample, we will then conclude that, for $x_1 = 0.69459459$, robustness is non-existent for any value of $\epsilon > 0$, and so the classifier is not robust.
Observe that what Proposition 3 claims is that one can always find points in feature space for which local robustness does not hold. Thus, claiming robustness based on successfully proving local robustness for some selected points in feature space does not equate with global robustness holding at all points in feature space.
The Robustness of Non-Trivial Classifiers Cannot
Be Certified. In recent years, a number of works
have studied robustness certification (Cohen et al.,
2019; Dvijotham et al., 2020; Rosenfeld et al., 2020;
Huang et al., 2021; Carlini et al., 2023).¹ Using the
definition of robustness certification proposed in Sec-
tion 2 (see Definition 1), which is taken verbatim
from (Cohen et al., 2019), then we can claim that,
Proposition 4. For any non-trivial classifier defined
on real-valued features, robustness cannot be certi-
fied, independently of the value of ε chosen.
Example 6. The analysis of Example 5 also demonstrates that robustness certification will fail for the example classifier, for any ε > 0, as long as the point $x_1 = 0.69459459$ is analyzed. In contrast, if the training data were to be sampled for local robustness, then one would declare robustness to be certified for ε < 0.00540541. Evidently, such a conclusion would be in error.
Counterexamples to Local Robustness. Under the
assumption of unconstrained inputs, we proved earlier
in this section that no ML model is (locally) robust.
As a result, and given some ε > 0, counterexamples
to (unconstrained) local robustness are guaranteed to
exist, and can be obtained from any pair of points v,x
in feature space such that,
$\exists(x, v \in \mathbb{F}).\, (||x - v||_p \le \epsilon) \wedge (\kappa(x) \neq \kappa(v))$   (14)
Observe that requiring $(x \neq v)$ is unnecessary. Also, we note that (14) is just the negation of (8), i.e. the condition for global robustness. In addition, it is easy to accommodate the cases where a classifier can flag points as not being robust (Leino et al., 2021). (Earlier work proposes that such points be flagged with a different class; in this paper we use a dedicated symbol for this purpose.)
¹By being based on restricted forms of sampling, the approaches used for robustness certification do not capture the dictionary meaning of certification, i.e. to state in a formal way that something is correct, and are fundamentally different from the meaning ascribed to certification in formal methods.
Finally, it is also plain to formulate the search for counterexamples as a decision problem. Given some logic formula $\varphi$ and a target logic theory $T$, $[\![\varphi]\!]_T$ represents the encoding of $\varphi$ in the target logic theory $T$. Also, let $\mathrm{SAT}_T$ represent a reasoner for theory $T$. Then, we use (14) to formulate the decision problem in theory $T$ as follows:
$\mathrm{SAT}_T\left( [\![ (||x - v||_p \le \epsilon) \wedge (\kappa(x) \neq \kappa(v)) ]\!]_T \right)$   (15)
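For the running example, query (15) can be handed to an off-the-shelf reasoner. The sketch below uses the z3 SMT solver; the z3-solver package and the chosen value of ε are assumptions, and the encoding is specific to the linear classifier of Example 1, where the $l_p$ distance in one dimension reduces to |x − v|.

from z3 import Real, Solver, If, And, sat

w, b, eps = 0.93198992, -0.64735516, 0.001

x, v = Real('x'), Real('v')
kx = If(w * x + b >= 0, 1, 0)            # encoding of kappa(x)
kv = If(w * v + b >= 0, 1, 0)            # encoding of kappa(v)
dist = If(x - v >= 0, x - v, v - x)      # |x - v|

s = Solver()
s.add(And(dist <= eps, kx != kv))        # the query of (15), i.e. the negation of (8)
if s.check() == sat:
    m = s.model()
    print('counterexample:', m[x], m[v]) # a pair of points witnessing non-robustness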
4.1.2 Discrete Domains
The negative results in Section 4.1.1 also hold in the
case of classifiers with discrete features. However, the
values of ε considered are further constrained. First,
we consider a classifier with categorical features, an $l_0$ norm and unconstrained inputs. In this case, it is guaranteed that one cannot have εAExs if ε < 1, i.e. if one prevents any feature from changing value. If we impose the constraint ε ≥ 1, then the results of Section 4.1.1 hold: i) no non-trivial classifier is globally robust; ii) for any non-trivial classifier there are points in feature space that are not locally robust; and iii) robustness cannot be certified.
There are also settings where features are discrete
and result from discretizing real-valued features. This
is the case for example when using binarized neural
networks (BNNs) (Hubara et al., 2016). In these cases
we assume a discretization step δ. As a result, the
comments made for classifiers with categorical fea-
tures also apply in this case, with the constraint that ε ≥ δ, i.e. the smallest distance to consider is no less than the discretization step.
For example, for the experiments of Section 6,
when the $l_0$ norm is used, it is the case that δ = 1. As
a result, in the experiments we used ε = 1, thus target-
ing the smallest distance that could possibly be con-
sidered. As the results show, and as it should be ex-
pected, one can find counterexamples to local/global
robustness for all the experiments.
4.2 Practical Consequences
The results in Section 4.1.1 have important practical
consequences. First, the experimental setup most of-
ten used in the assessment of local robustness exhibits
critical shortcomings. In a significant body of earlier
work, local robustness is assessed by randomly sam-
pling feature space. This is the case with evaluations
of local robustness in (Brix et al., 2023b). This means that either sampling does not pick the right points in feature space, and so one is allowed to (incorrectly) decide for local robustness, or sampling picks one of the points that must exist, and so (the expected) lack of local robustness is decided. One paradigmatic example is VNN-COMP (Brix et al., 2023b), where uniform random sampling has been employed in all the previous competitions. The bottom line is that any classifier declared locally robust is guaranteed not to be so. The same remarks apply in the case of global ro-
bustness, but in this case the number of existing works
is a fraction of the number of works on deciding local
robustness.
Examples of Ineffective Robustness Assessment.
One example of the limitations of randomly sam-
pling for assessing robustness is VNN-COMP (Brix
et al., 2023b; Brix et al., 2023a). The descrip-
tion of the most recent competitions (Müller et al., 2022; Brix et al., 2023c; Brix et al., 2023a) con-
firms that robustness is assessed by randomly sam-
pling existing datasets. In these cases, any results
indicating that NNs are robust are necessarily incor-
rect. Furthermore, past works claiming robustness
certification are inaccurate. Methods based on au-
tomated reasoners, which assume unconstrained in-
puts, will be in error if ML models are declared ro-
bust. Distribution-restricted methods can only pro-
vide probabilistic guarantees. In such cases, one must
trust that the inferred input distribution faithfully cap-
tures possible inputs to the ML model. More impor-
tantly, one must also trust that the sampling methods
used will offer enough rigor.
Besides the shortcomings of local robustness and
certification, the limitations of existing methods for
attaining global robustness are also clear.
Practical Assessment of Shortcomings. As noted
earlier (15), in practice it is conceptually simple to
demonstrate the limitations of local robustness using a
dedicated automated reasoner. It suffices to decide the existence of a point z in feature space for which, even for arbitrarily small ε, there exists a point in the ε-ball around z with a prediction other than κ(z). Section 6 summarizes results illustrating not
only the existence of such points for complex clas-
sifiers, but also the practical scalability of this ap-
proach.
Existing Solutions. Some approaches avoid the
problems reported in this section by curbing the
claims about (certified) robustness. For example, the
definition of global robustness in some works (Ruan
et al., 2019) considers a finite set of points where local
robustness is assessed. As long as the inputs respect
such a finite set of points, then the non-existence of
AExs for robust ML models is guaranteed. Unfortu-
nately, if the inputs were to be known in advance, then the need for ML models would be non-existent. In a similar vein, other tools (Gopinath et al., 2018) ensure robustness in specific regions of feature space. Although more general than (Ruan et al., 2019), similar limitations apply.
Some works exploit sampling of the inputs ac-
cording to some inferred input distribution (Cohen
et al., 2019; Rosenfeld et al., 2020; Dvijotham et al.,
2020; Huang et al., 2021; Carlini et al., 2023). Such
works offer no formal guarantees of rigor, and so claims of certification hold only probabilistically, assuming that all possible inputs respect the assumed input distribution.
4.3 Threats to Validity & Discussion
One possible criticism of the results in this section is that some works on robustness do not assume unconstrained inputs, but instead sample the inputs according to the distribution inferred from training data or to a specific imposed distribution (Weng et al., 2019; Cohen et al., 2019; Yan et al., 2024). In such a situa-
tion, one can argue that both local and global robust-
ness can still be safely decided. Clearly, the claims of
rigor differ in the two cases. As the paper shows, ap-
proaches based on automated reasoners that consider
all possible inputs in feature space cannot provide
guarantees of robustness. However, the same applies
to any other approach when one seeks the strongest
guarantees of robustness. In addition, there is recent
work showing the importance of sampling out of the
distribution (Yin et al., 2019; Hendrycks et al., 2020;
Hendrycks et al., 2021; Zhang et al., 2022a), but this
again exhibits the shortcomings of local/global ro-
bustness identified earlier in this section.
Furthermore, the sampling of inputs according to
some inferred input distribution is not without clear
limitations. First, inferring an input distribution from
a negligible fraction of feature space can be a source
of error. Second, an input distribution declares inputs
more or less likely, but does not prevent the possibil-
ity, no matter how negligible, of inputs not in the dis-
tribution. Third, robustness tools based on automated
reasoners assume that all inputs are possible, i.e. un-
constrained inputs are assumed, when deciding local
robustness for a concrete point in feature space (Katz
et al., 2017). If input distributions were to be taken
into account for such tools when deciding the points
in feature space to analyze, then even for deciding lo-
cal robustness for a single point in feature space, input
distributions would have to be accounted for.
To the best of our knowledge, there is no clear way to instrument automated reasoners to account for input distributions. The key point is that sampling
according to the input distribution could be a source
of error, and so the integration with tools based on
automated reasoners is unclear.
A different approach is to explicitly assume that some inputs are not accepted, e.g. by disallowing inputs where an elderly person is assigned a fairly young age. To the best of our knowledge, past work on robustness does not account for disallowed inputs. In contrast,
the topic has been researched in formal explainabil-
ity (Gorji and Rubin, 2022; Yu et al., 2023), by con-
sidering constraints on the inputs. Future work may
re-analyze robustness in light of such constraints.
5 SOME PROS OF ROBUSTNESS
Under a scenario in which all inputs are possible, Sec-
tion 4 raises concerns about the usefulness of attempt-
ing to prove robustness, be it local or global. This
might seem to suggest the ongoing efforts towards de-
vising efficient and rigorous robustness tools are mis-
guided. Despite the negative results of the previous
section, robustness should be expected to find critical
applications in the near future. This section briefly
overviews one such application.
From εAExs to εAXps/εCXps. Recent work re-
vealed a tight relationship between formal ex-
planations and the non-existence of (constrained)
adversarial examples (Huang and Marques-Silva,
2023), enabled by the concept of distance-restricted
AXps/CXps (see the definitions in Section 2.2). Furthermore,
MHS duality between distance-restricted AXps and
CXps enables the navigation of the space of AXps
and CXps. More importantly, the computation of
εAXps/εCXps can be instrumented using tools for
finding εAExs.
Throughout this section, we assume that the existence of $\epsilon$AExs is decided by calls to a suitable oracle. The procedure invoking the oracle is represented by a predicate $\mathrm{FindAEx}(\epsilon, \mathcal{Q}; \mathcal{E}, p)$, parameterized by the explanation problem $\mathcal{E} = (\mathcal{M}, (v, c))$, where $\mathcal{M}$ is the classifier, and by the norm $l_p$, and with arguments $\epsilon > 0$ and $\mathcal{Q} \subseteq \mathcal{F}$. Furthermore, the predicate $\mathrm{FindAEx}(\epsilon, \mathcal{Q}; \mathcal{E}, p)$ holds true if the ML classifier $\mathcal{M}$ exhibits an AEx within distance $\epsilon$, with the features in $\mathcal{Q}$ fixed to the values dictated by $v$.
To illustrate the relationships between explana-
tions and adversarial examples, we propose a simple
linear search algorithm for computing one εCXp, de-
picted in Algorithm 1. (In the same vein, the linear
search can be adapted for computing one εAXp, as
shown in (Wu et al., 2023; Izza et al., 2024).) One argument is the value of ε, whereas the other argument is a set of features that must be kept free (for CXps) or fixed (for AXps). The loop invariant is that S is a weak εCXp (resp. εAXp). In the case of an εCXp, the features in S, if freed, ensure that an AEx can be identified (given the distance ε). In the case of an εAXp, the features in S, if fixed, ensure that no AEx can be identified (given the distance ε). Features are freed/fixed
as long as the loop invariant is preserved. An oracle for deciding the existence of an AEx is used to decide whether or not the loop invariant is preserved when another feature is freed/fixed. It can be shown that the final set S is an εCXp/εAXp.

Algorithm 1: Linear algorithm to find a CXp using AEx queries.
Input: Arguments: F, ε; Parameters: E, p
Output: One CXp S
1: function FindCXpDel(F, ε; E, p)
     ▷ Inv: FindAEx(ε, F \ S; E, p)
2:   S ← F                          ▷ Initially, all features are free
3:   for i ∈ F do
4:     S ← S \ {i}                  ▷ Tentatively fix feature i
5:     outc ← FindAEx(ε, F \ S; E, p)
6:     if not outc then             ▷ No AEx can be found
7:       S ← S ∪ {i}                ▷ Free feature i again
8:   return S
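A direct Python rendering of Algorithm 1 is given below; it is a sketch (not the authors' implementation), assuming a caller-supplied find_aex oracle, e.g. a wrapper around a robustness tool, that holds iff an AEx exists within distance ε when the given features are fixed to their values in v.

def find_cxp_del(features, eps, find_aex):
    # Linear (deletion-based) search for one eps-CXp, following Algorithm 1.
    # find_aex(eps, fixed) must return True iff an AEx within distance eps exists
    # when the features in 'fixed' are kept at their values in v.
    free = set(features)                        # initially, all features are free
    for i in sorted(features):
        free.discard(i)                         # tentatively fix feature i
        if not find_aex(eps, set(features) - free):
            free.add(i)                         # no AEx: feature i must stay free
    return free                                 # a subset-minimal eps-CXp

With the l_0 norm and ε = m (the number of features), the same procedure computes a plain (distance-unrestricted) CXp, as noted in Section 2.2.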
Navigating the Space of εAXps/εCXps. Besides
computing one distance-restricted explanation, one
may be interested in navigating the sets of distance-
restricted explanations. For example, we may be in-
terested in deciding whether a sensitive feature can
occur in some explanation, or we may just be inter-
ested in finding some other explanation when the re-
ported one is uninteresting.
New Insights into Explanations. For non-trivial
classifiers defined on real-valued features, the fact that
local robustness does not hold on all points of feature
space reveals not only the guaranteed existence of ad-
versarial examples, but it also reveals new properties
about explanations. We discuss one such example.
Proposition 5. For a non-trivial classifier defined on
real-valued features, and for a measure $l_p$, there exist
instances for which, for any ε > 0, there exist εAXps
and εCXps.
6 EXPERIMENTAL EVIDENCE
This section presents a summary of practical evidence of our results on global robustness (and so, indirectly, on the impossibility of local robustness) for the case study of dense NNs trained on image datasets.
The assessment is performed on a selection of 5 publicly available NNs used in formal robustness verification. (Additional results on discrete data (Binarized NNs) are included in the appendix of (Izza and Marques-Silva, 2023).) Furthermore, the experiments are conducted on a MacBook Pro with a Dual-Core Intel Core i5 2.3GHz CPU and 8 GB RAM, running macOS Ventura. The time limit is set to 3600 s and the memory limit to 16 GB.
We implemented a formal global robustness verifier for NNs in Python. Concretely, we generate two copies of the neural network in ONNX format, one replica representing κ(x) and another one representing κ(y), and then encode constraints on the input layers to enforce $||x - y||_p \le \epsilon$ and on the output layers to force the two copies to pick different classes, i.e. κ(x) ≠ κ(y). Moreover, the Marabou oracle² (Katz et al., 2019) is instrumented to solve the targeted robustness problem.
Table 1: Assessment of global robustness verification for deep NNs. The table shows results for 4 image datasets.

  Model             ε       AEx   Time (s)
  KJ TinyTaxiNet    0.1     ✓     0.069
  KJ TinyTaxiNet    0.05    ✓     0.070
  KJ TinyTaxiNet    0.001   ✓     0.113
  MNIST-dense       0.1     ✓     0.897
  MNIST-dense       0.05    ✓     0.899
  MNIST-dense       0.001   ✓     1.805
  MNIST-conv        0.1     –     TO
  cifar-convSmall   0.1     –     TO
  gtsrb-dense       0.1     ✓     42.535
  gtsrb-dense       0.05    ✓     28.677
  gtsrb-dense       0.001   ✓     50.556
Table 2: Detailed performance evaluation of computing εCXps for DNNs. Columns Avg and nCalls under AEx report, resp., the average time and the total number of instrumented oracle (AEx robustness) calls. Columns Mn, Mx and Avg under εCXp report the minimum, maximum and average time to deliver an εCXp, and Len reports the average length of the εCXps.

                     AEx               εCXp
  Model    ε      Avg    nCalls    Len    Mn      Mx      Avg
  gtsrb    0.03   0.08   1023      218    74.2    87.4    77.1
  mnist    0.08   0.41   464       360    131.0   355.6   188.3
Table 1 summarizes the results on deep neural networks, for $l_\infty$ (Chebyshev) distance values of ε ranging from 0.001 to 0.1 for each considered classifier. As can be observed from the results, the global robustness formulation is able to identify adversarial examples for all tests, with a few exceptions when the neural network reasoner (i.e. Marabou) exceeds the
time limit. Clearly, our results confirm the theoretical findings presented earlier: the global robustness query will always report an adversarial example for any non-trivial classifier.
²Marabou is a complete neural network verifier powered by the SMT solver CVC4 (Barrett et al., 2011).
Besides, we assess the performance of our basic algorithm for computing εCXps using AEx search. Results of the evaluation on the mnist and gtsrb benchmarks are reported in Table 2. As can be seen from the table, the average number of pixels in the generated εCXps is relatively small w.r.t. the image size of the considered data, e.g. 21% for gtsrb and 46% for mnist, which illustrates the succinctness of the explanations and, as a result, their improved interpretability. Moreover, the average runtimes are at most 77.1 and 188.3 seconds (maximum 87.4 and 355.6 seconds), resp., for the gtsrb and mnist DNNs, which demonstrates the practical effectiveness of our method on image classification benchmarks. The focus of our future
works will be on devising efficient algorithms to com-
pute smallest (cardinality) contrastive explanations.
7 CONCLUSIONS
This paper presents simple arguments that reveal the
shortcomings of deciding robustness, be it global or
local. Similarly, the paper uncovers related pitfalls of attempts at certifying robustness, namely when inputs
are unconstrained. In addition, the paper also argues
that possible attempts at solving the identified short-
comings are not entirely satisfactory.
In contrast to the negative results presented in the
paper, the paper also details recently proposed uses
of robustness tools, building on the connections be-
tween adversarial examples and explainability. Fur-
thermore, the negative results on robustness are used
to shed light on the properties of distance-restricted
explanations of ML models. Future work will further investigate the links between adversarial examples and symbolic explanations, e.g. the computation of smallest-size contrastive explanations.
ACKNOWLEDGEMENTS
This work was supported in part by the National
Research Foundation, Prime Minister’s Office, Sin-
gapore under its Campus for Research Excellence
and Technological Enterprise (CREATE) programme,
by the Spanish government under grant PID2023-
152814OB-100, and by ICREA starting funds.
REFERENCES
Baharlouei, S., Sheikholeslami, F., Razaviyayn, M., and
Kolter, Z. (2023). Improving adversarial robustness
via joint classification and multiple explicit detection
classes. In AISTATS, pages 11059–11078.
Barrett, C. W., Conway, C. L., Deters, M., Hadarean, L.,
Jovanovic, D., King, T., Reynolds, A., and Tinelli, C.
(2011). CVC4. In CAV, pages 171–177.
Biggio, B., Corona, I., Maiorca, D., Nelson, B., Srndic, N.,
Laskov, P., Giacinto, G., and Roli, F. (2013). Eva-
sion attacks against machine learning at test time. In
ECML, pages 387–402.
Brix, C., Bak, S., Liu, C., and Johnson, T. T. (2023a). The
fourth international verification of neural networks
competition (VNN-COMP 2023): Summary and re-
sults. CoRR, abs/2312.16760.
Brix, C., Müller, M. N., Bak, S., Johnson, T. T., and Liu, C.
(2023b). First three years of the international verifica-
tion of neural networks competition (VNN-COMP).
Int. J. Softw. Tools Technol. Transf., 25(3):329–339.
Brix, C., Müller, M. N., Bak, S., Johnson, T. T., and Liu, C.
(2023c). First three years of the international verifica-
tion of neural networks competition (VNN-COMP).
CoRR, abs/2301.05815.
Carlini, N., Tramèr, F., Dvijotham, K. D., Rice, L., Sun,
M., and Kolter, J. Z. (2023). (certified!!) adversarial
robustness for free! In ICLR.
Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A.,
and Mukhopadhyay, D. (2021). A survey on adversar-
ial attacks and defences. CAAI Transactions on Intel-
ligence Technology, 6(1):25–45.
Chen, K., Zhu, H., Yan, L., and Wang, J. (2020). A survey
on adversarial examples in deep learning. Journal on
Big Data, 2(2):71.
Chen, Y., Wang, S., Qin, Y., Liao, X., Jana, S., and Wag-
ner, D. A. (2021). Learning security classifiers with
verified global robustness properties. In CCS, pages
477–494.
Cohen, J., Rosenfeld, E., and Kolter, J. Z. (2019). Certified
adversarial robustness via randomized smoothing. In
ICML, pages 1310–1320.
Darwiche, A. (2023). Logic for explainable AI. In LICS,
pages 1–11.
Dimitrov, D. I., Singh, G., Gehr, T., and Vechev, M. T.
(2022). Provably robust adversarial examples. In
ICLR.
Dvijotham, K. D., Hayes, J., Balle, B., Kolter, J. Z., Qin, C.,
György, A., Xiao, K., Gowal, S., and Kohli, P. (2020).
A framework for robustness certification of smoothed
classifiers using F-divergences. In ICLR.
Fu, F., Wang, Z., Fan, J., Wang, Y., Huang, C., Chen, X.,
Zhu, Q., and Li, W. (2022). REGLO: Provable neu-
ral network repair for global robustness properties. In
Workshop on Trustworthy and Socially Responsible
Machine Learning, NeurIPS.
Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P.,
Chaudhuri, S., and Vechev, M. T. (2018). AI2: safety
and robustness certification of neural networks with
abstract interpretation. In IEEE S&P, pages 3–18.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2015). Ex-
plaining and harnessing adversarial examples. In
ICLR.
Gopinath, D., Katz, G., Pasareanu, C. S., and Barrett, C. W.
(2018). DeepSafe: A data-driven approach for assess-
ing robustness of neural networks. In ATVA, pages
3–19.
Gorji, N. and Rubin, S. (2022). Sufficient reasons for clas-
sifier decisions in the presence of domain constraints.
In AAAI, pages 5660–5667.
Han, S., Lin, C., Shen, C., Wang, Q., and Guan, X. (2023).
Interpreting adversarial examples in deep learning: A
review. ACM Computing Surveys.
Hein, M. and Andriushchenko, M. (2017). Formal guaran-
tees on the robustness of a classifier against adversar-
ial manipulation. In NeurIPS, pages 2266–2276.
Hendrycks, D., Basart, S., Mu, N., Kadavath, S., Wang, F.,
Dorundo, E., Desai, R., Zhu, T., Parajuli, S., Guo, M.,
Song, D., Steinhardt, J., and Gilmer, J. (2021). The
many faces of robustness: A critical analysis of out-
of-distribution generalization. In ICCV, pages 8320–
8329.
Hendrycks, D., Liu, X., Wallace, E., Dziedzic, A., Krish-
nan, R., and Song, D. (2020). Pretrained transformers
improve out-of-distribution robustness. In ACL, pages
2744–2751.
Huang, X., Kroening, D., Ruan, W., Sharp, J., Sun, Y.,
Thamo, E., Wu, M., and Yi, X. (2020). A survey of
safety and trustworthiness of deep neural networks:
Verification, testing, adversarial attack and defence,
and interpretability. Comput. Sci. Rev., 37:100270.
Huang, X. and Marques-Silva, J. (2023). From ro-
bustness to explainability and back again. CoRR,
abs/2306.03048.
Huang, X. and Marques-Silva, J. (2024). On the failings
of shapley values for explainability. Int. J. Approx.
Reason., page 109112.
Huang, Y., Zhang, H., Shi, Y., Kolter, J. Z., and Anand-
kumar, A. (2021). Training certifiably robust neu-
ral networks with efficient local lipschitz bounds. In
NeurIPS, pages 22745–22757.
Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., and
Bengio, Y. (2016). Binarized neural networks. In
NeurIPS, pages 4107–4115.
Ignatiev, A. (2020). Towards trustable explainable AI. In
IJCAI, pages 5154–5158.
Ignatiev, A., Narodytska, N., Asher, N., and Marques-Silva,
J. (2020). From contrastive to abductive explanations
and back again. In AIxIA, pages 335–355.
Ignatiev, A., Narodytska, N., and Marques-Silva, J. (2019).
Abduction-based explanations for machine learning
models. In AAAI, pages 1511–1519.
Izza, Y., Huang, X., Morgado, A., Planes, J., Ignatiev, A.,
and Marques-Silva, J. (2024). Distance-Restricted
Explanations: Theoretical Underpinnings & Efficient
Implementation. In KR, pages 475–486.
Izza, Y., Ignatiev, A., and Marques-Silva, J. (2022). On
tackling explanation redundancy in decision trees. J.
Artif. Intell. Res., 75:261–321.
Izza, Y. and Marques-Silva, J. (2023). The pros and cons of
adversarial robustness. CoRR, abs/2312.10911.
Izza, Y. and Marques-Silva, J. (2024). Efficient contrastive
explanations on demand. CoRR.
Katz, G., Barrett, C. W., Dill, D. L., Julian, K., and Kochen-
derfer, M. J. (2017). Reluplex: An efficient SMT
solver for verifying deep neural networks. In CAV,
pages 97–117.
Katz, G., Huang, D. A., Ibeling, D., Julian, K., Lazarus, C.,
Lim, R., Shah, P., Thakoor, S., Wu, H., Zeljic, A., Dill,
D. L., Kochenderfer, M. J., and Barrett, C. W. (2019).
The marabou framework for verification and analysis
of deep neural networks. In CAV, pages 443–452.
Lécuyer, M., Atlidakis, V., Geambasu, R., Hsu, D., and
Jana, S. (2019). Certified robustness to adversarial ex-
amples with differential privacy. In IEEE S&P, pages
656–672.
Leino, K., Wang, Z., and Fredrikson, M. (2021). Globally-
robust neural networks. In ICML, pages 6212–6222.
Letoffe, O., Huang, X., and Marques-Silva, J. (2024). On
correcting SHAP scores. In AAAI.
Liang, H., He, E., Zhao, Y., Jia, Z., and Li, H. (2022). Ad-
versarial attack and defense: A survey. Electronics,
11(8):1283.
Liu, X., Han, X., Zhang, N., and Liu, Q. (2020). Certified
monotonic neural networks. In NeurIPS.
Marques-Silva, J. (2023). Disproving XAI myths with for-
mal methods - initial results. In ICECCS, pages 12–
21.
Marques-Silva, J. (2024). Logic-based explainability: Past,
present and future. In ISoLA, pages 181–204.
Marques-Silva, J. and Huang, X. (2024). Explainability is
not a game. Commun. ACM, pages 66–75.
Marques-Silva, J. and Ignatiev, A. (2022). Delivering trust-
worthy AI through formal XAI. In AAAI, pages
12342–12350.
Marques-Silva, J. and Ignatiev, A. (2023). No silver bullet:
interpretable ML models must be explained. Frontiers
in Artificial Intelligence, 6:1128212.
Miller, T. (2019). Explanation in artificial intelligence: In-
sights from the social sciences. Artif. Intell., 267:1–
38.
Müller, M. N., Brix, C., Bak, S., Liu, C., and Johnson, T. T.
(2022). The third international verification of neural
networks competition (VNN-COMP 2022): Summary
and results. CoRR, abs/2212.10376.
Narodytska, N. (2018). Formal analysis of deep binarized neural networks. In IJCAI, pages 5692–5696.
Pedregosa, F. et al. (2011). Scikit-learn: Machine learning
in Python. Journal of Machine Learning Research,
12:2825–2830.
Reiter, R. (1987). A theory of diagnosis from first princi-
ples. Artif. Intell., 32(1):57–95.
Rosenberg, I., Shabtai, A., Elovici, Y., and Rokach, L.
(2022). Adversarial machine learning attacks and de-
fense methods in the cyber security domain. ACM
Comput. Surv., 54(5):108:1–108:36.
Rosenfeld, E., Winston, E., Ravikumar, P., and Kolter, J. Z.
(2020). Certified robustness to label-flipping attacks
via randomized smoothing. In ICML, pages 8230–
8241.
Ruan, W., Wu, M., Sun, Y., Huang, X., Kroening, D., and
Kwiatkowska, M. (2019). Global robustness evalu-
ation of deep neural networks with provable guaran-
tees for the hamming distance. In IJCAI, pages 5944–
5952.
Rudin, C. (2019). Stop explaining black box machine learn-
ing models for high stakes decisions and use inter-
pretable models instead. Nature Machine Intelligence,
1(5):206–215.
Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L.,
and Zhong, C. (2022). Interpretable machine learn-
ing: Fundamental principles and 10 grand challenges.
Statistics Surveys, 16:1–85.
Seshia, S. A., Desai, A., Dreossi, T., Fremont, D. J., Ghosh,
S., Kim, E., Shivakumar, S., Vazquez-Chanlatte, M.,
and Yue, X. (2018). Formal specification for deep neu-
ral networks. In ATVA, pages 20–34.
Seshia, S. A., Sadigh, D., and Sastry, S. S. (2022). To-
ward verified artificial intelligence. Commun. ACM,
65(7):46–55.
Shih, A., Choi, A., and Darwiche, A. (2018). A symbolic
approach to explaining bayesian network classifiers.
In IJCAI, pages 5103–5111.
Singh, G., Gehr, T., Mirman, M., Püschel, M., and Vechev,
M. T. (2018). Fast and effective robustness certifica-
tion. In NeurIPS, pages 10825–10836.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan,
D., Goodfellow, I. J., and Fergus, R. (2014). Intriguing
properties of neural networks. In ICLR.
Voráček, V. and Hein, M. (2023). Improving l1-certified
robustness via randomized smoothing by leveraging
box constraints. In ICML, pages 35198–35222.
Wang, Z., Huang, C., and Zhu, Q. (2022a). Efficient
global robustness certification of neural networks via
interleaving twin-network encoding. In DATE, pages
1087–1092.
Wang, Z., Wang, Y., Fu, F., Jiao, R., Huang, C., Li, W.,
and Zhu, Q. (2022b). A tool for neural network
global robustness certification and training. CoRR,
abs/2208.07289.
Weng, L., Chen, P., Nguyen, L. M., Squillante, M. S.,
Boopathy, A., Oseledets, I. V., and Daniel, L. (2019).
PROVEN: verifying robustness of neural networks
with a probabilistic approach. In ICML, pages 6727–
6736.
Wiyatno, R. R., Xu, A., Dia, O., and de Berker, A. (2019).
Adversarial examples in modern machine learning: A
review. CoRR, abs/1911.05268.
Wu, M., Wu, H., and Barrett, C. W. (2023). Verix: To-
wards verified explainability of deep neural networks.
In NeurIPS.
Yan, G., Romano, Y., and Weng, T. (2024). Provably ro-
bust conformal prediction with improved efficiency.
In ICLR.
Yin, D., Lopes, R. G., Shlens, J., Cubuk, E. D., and Gilmer,
J. (2019). A fourier perspective on model robustness
in computer vision. In NeurIPS, pages 13255–13265.
Yu, J., Ignatiev, A., Stuckey, P. J., Narodytska, N., and
Marques-Silva, J. (2023). Eliminating the impossible,
whatever remains must be true: On extracting and ap-
plying background knowledge in the context of formal
explanations. In AAAI, pages 4123–4131.
Zhang, J. and Li, C. (2020). Adversarial examples: Oppor-
tunities and challenges. IEEE Trans. Neural Networks
Learn. Syst., 31(7):2578–2593.
Zhang, M., Levine, S., and Finn, C. (2022a). MEMO: test
time robustness via adaptation and augmentation. In
NeurIPS.
Zhang, X., Zheng, X., and Mao, W. (2022b). Adversarial
perturbation defense on deep neural networks. ACM
Comput. Surv., 54(8):159:1–159:36.
Zhou, S., Liu, C., Ye, D., Zhu, T., Zhou, W., and Yu,
P. S. (2023). Adversarial attacks and defenses in deep
learning: From a perspective of cybersecurity. ACM
Comput. Surv., 55(8):163:1–163:39.