presents two primary advantages that we need to con-
sider in this phase.
Two systems with the same volume can be rather different, yet
reasonably similar in terms of the effort needed to devise them.
Computing rules is expensive, both in the design effort required
and in the computational effort of building them, and both costs
depend on the length of the rules and on their number. We could
have defined the length of a rule system as the sum of the lengths
of its rules. However, this would have made two systems equivalent
regardless of the number of rules they contain, which is not what
we expect of a good explanation. An explanation should be
understandable, so we expect it to consist of simple rules, and
not too many of them. Obviously there is a tradeoff to strike,
because under these premises alone the ideal explanation would be
the empty rule system. Volume is therefore a better measure than
length: the same volume, or roughly the same, accommodates a number
of different candidate explanations that are similar in design
effort, and lets us choose the best among them, namely the most
accurate, the most precise, or those with the highest recall.
To summarise, when two approximations are comparable in terms of
simplicity we should choose the more accurate one (respectively,
the more precise one or the one with higher recall); conversely,
when two approximations are comparable in terms of accuracy
(respectively precision, recall) we should choose the simpler one.
We can now turn to the details of the approach, by devising a
method to compute approximations of a black box classifier.
Consider a classifier C that we can inspect only through its
behaviour, namely by looking at the results of the classification
on an execution set, without knowing how those results are
obtained. The idea of the method is to perform the classification
on a subset of the execution set that we control. From the
input/output behaviour of the classifier we build a training set,
which is used to extract a classification rule set. We are then
able to build a confusion matrix and use it to gain a correct
understanding of the behaviour of the classifier.
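The procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `black_box`, `rule_system` and the sample grid are hypothetical, and the rule system is written by hand here, whereas the text extracts it from the training set with a rule-mining method. The sketch only shows how the confusion matrix, and from it accuracy, precision and recall, are obtained by comparing the rule system against the observed outputs of the classifier.

```python
from typing import Callable, Dict, Sequence, Tuple

Sample = Tuple[int, int]
Classifier = Callable[[Sample], bool]


def black_box(x: Sample) -> bool:
    """Hypothetical stand-in for the classifier C; only its outputs are observed."""
    return x[0] + x[1] >= 3


def rule_system(x: Sample) -> bool:
    """Hypothetical disjunctive rule system S: positive if any rule fires."""
    return x[0] >= 2 or x[1] >= 3


def confusion_matrix(
    reference: Classifier, approximation: Classifier, samples: Sequence[Sample]
) -> Dict[str, int]:
    """Count agreements and disagreements, taking the black box as ground truth."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for x in samples:
        expected, predicted = reference(x), approximation(x)
        if predicted and expected:
            counts["tp"] += 1
        elif predicted and not expected:
            counts["fp"] += 1
        elif expected:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def measures(m: Dict[str, int]) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) computed from a confusion matrix."""
    total = sum(m.values())
    accuracy = (m["tp"] + m["tn"]) / total if total else 0.0
    precision = m["tp"] / (m["tp"] + m["fp"]) if m["tp"] + m["fp"] else 0.0
    recall = m["tp"] / (m["tp"] + m["fn"]) if m["tp"] + m["fn"] else 0.0
    return accuracy, precision, recall


# The controlled subset of the execution set (an illustrative 4x4 grid).
samples = [(a, b) for a in range(4) for b in range(4)]
matrix = confusion_matrix(black_box, rule_system, samples)
alpha, pi, rho = measures(matrix)
```

On this toy grid the rule system disagrees with the black box on two of the sixteen samples, which is what the confusion matrix makes visible.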
Specifically, once such a method has been applied, we can state
that a binary classifier C has been approximated by a
classification rule system S with accuracy α, precision π and
recall ρ (the measures obtained from the confusion matrix) and
with volume V. Ideally, it should follow that a rule set is a
better approximation of a given classifier when it has the same
accuracy (respectively precision, recall) but a smaller volume, or
the same volume but a better accuracy (respectively precision,
recall). Obviously, the assumption that accuracy, precision and
recall remain exactly the same (either alone or all three
together) is unrealistic, for these values are intrinsically
unstable across classification rule setups. We can instead
establish an interval within which accuracy, precision and recall
are to be considered equivalent, which we call the confidence of
the comparison. The confidence is thus a value χ such that the
accuracy (respectively precision, recall) of a set $S_1$ is
considered roughly the same as the accuracy (respectively
precision, recall) of a set $S_2$ when
$|\alpha(S_1) - \alpha(S_2)| \le \chi$ (and correspondingly for π
and ρ).
Therefore, we can state that a rule set $S_1$ is a better
approximation than a rule set $S_2$ when $S_1$ has a smaller
volume than $S_2$ and its accuracy (respectively precision,
recall) is better than, or at least roughly the same as, that of
$S_2$.
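The comparison just stated can be written down as a small predicate. The sketch below is only one reading of it: the `Approximation` record, the function names and the numeric values are hypothetical, volume is taken as an already computed number, and only accuracy is compared (precision and recall would be handled in the same way).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approximation:
    name: str
    accuracy: float  # alpha, measured against the black box
    volume: float    # V, assumed to be computed elsewhere


def roughly_same(a: float, b: float, chi: float) -> bool:
    """Two measures are equivalent when they differ by at most chi."""
    return abs(a - b) <= chi


def is_better(s1: Approximation, s2: Approximation, chi: float) -> bool:
    """s1 is better than s2: smaller volume, and accuracy higher or within chi."""
    accuracy_ok = s1.accuracy >= s2.accuracy or roughly_same(
        s1.accuracy, s2.accuracy, chi
    )
    return s1.volume < s2.volume and accuracy_ok


# Illustrative values only.
s1 = Approximation("S1", accuracy=0.88, volume=6)
s2 = Approximation("S2", accuracy=0.90, volume=12)
s3 = Approximation("S3", accuracy=0.70, volume=4)
```

With a confidence of 0.05, `is_better(s1, s2, 0.05)` holds because the two accuracies fall within the confidence while `s1` is smaller, whereas `s3` is smaller still but loses too much accuracy to qualify.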
3 ALGORITHMS FOR EXPLAINING THE BLACK BOX
Given the analysis above, we can devise a method to compute
optimal rule systems that approximate a classifier or, in a more
practical design of the method, a rule system that approximates a
classifier with an acceptable accuracy (respectively precision,
recall). Consider an approximation $S_1$. We can perform two kinds
of operations. The former looks for admissible simplifications
that preserve accuracy (respectively precision, recall) and reduce
volume. The latter, conversely, looks for improvements in accuracy
(respectively precision, recall) that preserve volume.
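The first kind of operation can be illustrated with a greedy sketch. It is not the algorithm of the paper: the rules, the data and the `simplify` function are hypothetical, "preserving accuracy" is read as staying within the confidence χ of the starting accuracy, and dropping a whole rule is assumed to reduce volume.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[int, int]
Rule = Callable[[Sample], bool]
Labelled = Sequence[Tuple[Sample, bool]]


def predict(rules: Sequence[Rule], x: Sample) -> bool:
    """Disjunctive rule system: positive if any rule fires."""
    return any(rule(x) for rule in rules)


def accuracy(rules: Sequence[Rule], data: Labelled) -> float:
    """Fraction of samples on which the rule system agrees with the labels."""
    return sum(predict(rules, x) == y for x, y in data) / len(data)


def simplify(rules: Sequence[Rule], data: Labelled, chi: float) -> List[Rule]:
    """Greedily drop rules while accuracy stays within chi of the starting value."""
    baseline = accuracy(rules, data)
    current = list(rules)
    changed = True
    while changed and len(current) > 1:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if accuracy(candidate, data) >= baseline - chi:
                current = candidate
                changed = True
                break
    return current


# Hypothetical rules; rule_c is redundant because rule_a already covers it.
def rule_a(x: Sample) -> bool:
    return x[0] >= 2


def rule_b(x: Sample) -> bool:
    return x[1] >= 3


def rule_c(x: Sample) -> bool:
    return x[0] >= 3 and x[1] >= 1


# Labels observed from a hypothetical black box on a 4x4 grid of inputs.
data = [((a, b), a + b >= 3) for a in range(4) for b in range(4)]
simplified = simplify([rule_a, rule_b, rule_c], data, chi=0.05)
```

Here only the redundant rule is removed, since dropping either of the other two costs more accuracy than the confidence allows. The second kind of operation, improving accuracy at constant volume, would need a way to propose replacement rules and is not sketched here.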
At this point in the discussion, we need to specify an important
aspect of the concept of rule system we have devised so far. The
idea of disjunctive systems is quite simple, and deliberately so:
it leaves room for constraints on the range to be handled by a
method that is polynomial on deterministic machines. Most methods
such as Apriori, as well as methods for other kinds of rule
systems such as decision trees, compute the rule system
incrementally. At each step, the rule system is thus more complex,
and potentially more accurate, more precise and with higher
recall, than the rule system computed at the previous step. To
obtain good explanations we should therefore look at these methods
with a critical eye: the purpose of the computation is to choose,
among the partial solutions obtained by these methods, those that
we consider good explanations rather than good classifiers. In
fact, we do not really have
ICAART 2022 - 14th International Conference on Agents and Artificial Intelligence