tion of the conjectured conclusion. Induction serves
little strategic function because the range of possible conclusions is restricted by the methods of generalizing from previously observed cases.
In this paper, we present a novel abductive search
method capable of handling the examples previously
described. Our model first uses abductive search for
hypothesis identification. To limit redundancy in the
abductive search results, we introduce two distinct
similarity metrics that compare causal structures of
variables. Additionally, to account for possible un-
familiar causes, we implement hypothesis generation
in our model as a method of generating novel ex-
planations—hypothesized causes that are not neces-
sarily observed in the background information. Fi-
nally, because abductive “confirmation” does not indicate whether an abduced hypothesis logically precedes the observed effects, our model uses a hypothesis comparison method that ranks hypotheses by the likelihood of each explanation.
Cox et al. used abduction with surface deduction
to generate novel hypotheses from Horn clauses, and
suggested extending this method’s application to ab-
duction from directed graphs (Cox et al., 1992). Our
abductive search model relies on Reichenbach’s Com-
mon Cause Principle rather than surface deduction
for hypothesis generation, and uses edit-distance and
Jaccard-based reasoning to distinguish redundant hy-
potheses. This combination of creative and proba-
bilistic abduction with similarity-based reasoning for
abductive search is distinct from the approach of Cox
et al. (Cox et al., 1992). The use of Reichenbach’s
Common Cause Principle is inspired by Schurz’s
theory on common cause abduction (Schurz, 2008).
While Schurz seems to deny the potential usefulness
of integrating common cause and Bayesian reasoning,
we introduce a form of Bayesian confirmation that
provides probabilistic explanations for the hypotheses
discovered through common cause abduction. Like-
wise, while our abductive model checks consistency
and simplicity similarly to Reiter’s heuristic diagno-
sis model (Reiter, 1987), we rely on Bayesian condi-
tioning during hypothesis generation and comparison,
which strengthens the plausibility of our model’s con-
jectured hypotheses.
2 GRAPHICALLY MODELING ABDUCTION
Graphical models are tools for integrating logical and
probabilistic reasoning in order to represent rational
processes and causal relationships. Developed by Pearl, they compactly represent the complexity and uncertainty within a dataset (Pearl, 1998). A probabilistic graphical model is composed of nodes, which represent random variables, and edges, which connect the nodes to indicate conditional dependence; the absence of an edge encodes conditional independence.
An abductive search problem can be represented as a directed acyclic graph (DAG), in which a directed edge from one node (the “parent”) to another (the “child”) represents a causal relationship between them. When an edge of a DAG is weighted with the conditional probability P(child | parent) of the child variable given the parent, the weight quantifies the strength of that causal influence.
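To make the representation concrete, the following is a minimal sketch of such a weighted DAG in Python; the variable names and probabilities are illustrative assumptions, not values from this paper.

```python
# A causal DAG stored as a mapping from directed edges (parent, child)
# to edge weights P(child = true | parent = true). All names and
# numbers below are illustrative placeholders.
edges = {
    ("Rain", "WetGrass"): 0.90,
    ("Sprinkler", "WetGrass"): 0.80,
    ("WetGrass", "SlipperyPath"): 0.70,
}

def edge_strength(parent, child):
    """Weight of parent -> child: the strength of the causal influence."""
    return edges[(parent, child)]

print(edge_strength("Rain", "WetGrass"))  # 0.9
```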
2.1 Bayesian Networks
We adapt the definition of Bayesian network from
(Feldbacher-Escamilla and Gebharter, 2019), and
make use of conventional notation: Sets of objects, including sets of sets, are represented by boldfaced uppercase letters (e.g., $\mathbf{S}$). Variables are represented by uppercase letters (e.g., $X$), and their respective realizations are represented by corresponding lowercase letters (e.g., $x$). Additionally, a directed edge between two variables is represented by an arrow, $\to$, where the parent node is at the arrow's tail and the child node is at the tip (e.g., $X_i \to X_j$).
Following the definitions in (Feldbacher-Escamilla and Gebharter, 2019), $B\langle \mathbf{V}, \mathbf{E}, P \rangle$ is a Bayesian network such that $\mathbf{V}$ is a set of random variables, $\mathbf{E}$ is a set of directed edges, and $P$ is a probability distribution over $\mathbf{V}$.

For all $X_i \in \mathbf{V}$, $\mathbf{Pa}(X_i)$ is the set of $X_i$'s parents:
$$\mathbf{Pa}(X_i) = \{X_j \in \mathbf{V} \mid X_j \to X_i\}. \tag{1}$$

The set of $X_i$'s children is defined as
$$\mathbf{Ch}(X_i) = \{X_j \in \mathbf{V} \mid X_i \to X_j\}. \tag{2}$$

We define the set of $X_i$'s descendants to be
$$\mathbf{Des}(X_i) = \{X_j \in \mathbf{V} \mid X_i \to \dots \to X_j\}, \tag{3}$$

and the set of $X_i$'s ancestors to be
$$\mathbf{Anc}(X_i) = \{X_j \in \mathbf{V} \mid X_j \to \dots \to X_i\}. \tag{4}$$
Within the context of this paper, all variables in $\mathbf{V}$ are discrete. To properly incorporate continuous variables into the model, the discretization approach presented in (Chen et al., 2017) can be used with a discretization runtime of $O(r \cdot n^2)$, where $r$ is the number of class variable instantiations. Furthermore, Friedman et al. present a method of discretizing continuous variables while learning the structure of the Bayesian network using background information, that is, data denoting the values of previous instantiations of variables in $\mathbf{V}$ (Friedman et al., 1996).
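For illustration only, the sketch below shows a naive equal-width discretization of a continuous variable before it enters the network; this is a generic stand-in, not the class-variable-aware method of Chen et al. (2017) or the structure-learning approach of Friedman et al. (1996).

```python
def discretize(values, bins=3):
    """Map continuous values to integer bin indices 0..bins-1 (equal width)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant variable
    return [min(int((v - lo) / width), bins - 1) for v in values]

print(discretize([0.1, 0.4, 0.45, 0.9]))  # [0, 1, 1, 2]
```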