Fast Solving of Influence Diagrams for Multiagent Planning on
GPU-enabled Architectures
Fadel Adoe, Yingke Chen and Prashant Doshi
THINC Lab, Department of Computer Science, University of Georgia, Athens, Georgia, U.S.A.
Keywords:
GPU, Multiagent Systems, Planning, Speed Up.
Abstract:
Planning under uncertainty in multiagent settings is highly intractable because of the complexities of the history and plan spaces. Probabilistic graphical models exploit the structure of the problem domain to mitigate the computational burden. In this paper, we introduce the first parallelization of planning in multiagent settings on a CPU-GPU heterogeneous system. In particular, we focus on the algorithm for exactly solving interactive dynamic influence diagrams, which are a recognized graphical model for multiagent planning. Beyond parallelizing the standard Bayesian inference, the computation of the decisions' expected utilities is parallelized as well. The GPU-based approach provides significant speedup on two benchmark problems.
1 INTRODUCTION
Planning under uncertainty in multiagent settings is
a very hard problem because it involves reasoning
about the actions and observations of multiple agents
simultaneously. In order to formally study this prob-
lem, the approach is to generalize single-agent plan-
ning frameworks such as the partially observable
Markov decision process (POMDP) (Smallwood and
Sondik, 1973) to multiagent settings. This has led to the decentralized POMDP (Bernstein et al., 2005) for multiagent planning in cooperative settings and the interactive POMDP (Gmytrasiewicz and Doshi, 2005) for individual planning in cooperative or non-cooperative multiagent settings. A measure of the computational complexity involved is given by noting that the problem of solving a decentralized POMDP exactly for a finite number of steps is NEXP-complete (Bernstein et al., 2002).
Some of the complexity of multiagent planning
may be mitigated by exploiting the structure in the
problem domain. Often, the state of the problem can
be factored into random variables and the conditional
independence between the variables may be naturally
exploited by representing the planning problem us-
ing probabilistic graphical models. An example of
such a model is the interactive dynamic influence di-
agram (I-DID) (Doshi et al., 2009) that generalizes the well-known DID (Howard and Matheson, 1984), which may be viewed as a graphical counterpart of the POMDP, to multiagent settings in the same way that an interactive POMDP generalizes the POMDP. In addition to modeling the problem structure, graphical models provide an intuitive language for representing the planning problem, thereby serving as an important tool for enabling multiagent planning.[1]
Emerging applications in automated vehicles that
communicate (Luo et al., 2011), integration with the
belief-desire-intention framework (Chen et al., 2013),
and for ad hoc teamwork (Chandrasekaran et al.,
2014) motivate improved solutions of I-DIDs. While
techniques exist for introducing further efficiency into
solving I-DIDs (Zeng and Doshi, 2012), we may
also explore parallelizing its solution algorithm on
new high-performance computing architectures such as those utilizing graphics processing units (GPUs). A GPU consists of an array of streaming multiprocessors (SM) connected to a shared memory. Each SM typically consists of a set of streaming processors. Consequently, a GPU supplements the CPU by enabling massive parallelization of simple computations that do not require excessive memory.
Our contribution in this paper consists of methods for parallelizing multiple steps of the algorithm for exactly solving I-DIDs on CPU-GPU architectures. This enables significantly faster planning on benchmark and large multiagent problems, up to an order of magnitude in comparison to the run-time performance of the existing algorithm.
[1] A GUI-based software application called Netus is freely available from http://tinyurl.com/mwrtlvg for designing I-DIDs.
In addition to the usual chance,
decision and utility nodes, I-DIDs include a new type
of node called the model node and a new link called
the policy link between the model node and a chance
node that represents the distribution over the other
agent’s actions given its model.
The algorithm for solving an I-DID expands a
given two-time slice I-DID over multiple steps and
collapses the I-DID into a flat DID. We may then
use the standard sum-max-sum rule and a general-
ized variable elimination algorithm for IDs (Koller
and Friedman, 2009) to compute the maximum ex-
pected utilities of actions at each decision node to
solve the I-DID. Multiple models in the model node
are recursively solved in an analogous manner. Our
approach is to parallelize two steps of this algorithm: (i) the four operations involved in the sum-max-sum rule, namely max-marginalization (of decisions), sum-marginalization (of chance variables), factor product (of probabilities and utilities) and factor addition (of utilities), are parallelized on the GPU; and (ii) the probability factors in the variable elimination could be large joint distributions over the Bayesian network at each time slice, and we parallelize, on the GPU, the message passing performed on a junction tree during the inference.
We evaluate the parallelized I-DID solution algorithm on two benchmark planning domains, and show more than an order of magnitude of speed up on some of the problems compared to the previous algorithm. We evaluate on planning domains that differ in the sizes of the state, action and observation spaces, and extend the planning over longer horizons. In addition, we study the properties of our algorithm by allocating it increasing concurrency on the GPU, and show that its run time improves up to a point beyond which the gains are lost.
The rest of the paper is organized as follows. Section 2 provides preliminaries about the I-DID and concepts of GPU-based programming. Section 3 reviews related work. Section 4 proposes a GPU-based approach to exactly solve the I-DID in parallel. Section 5 presents the design and the GPU algorithms. Section 6 theoretically analyzes the speed up. Section 7 demonstrates the speed up obtained by the proposed approach on two problems. Section 8 concludes this paper.
2 BACKGROUND
In this section, we briefly review the probabilistic
graphical model, DID, and its generalization to mul-
tiagent settings, I-DID. General principles behind
GPU-based programming are also briefly described.
2.1 Dynamic Influence Diagram
A DID, D, is a directed acyclic graph over a set of nodes: chance nodes C (ellipses), representing random variables; decision nodes D (rectangles), modeling the action choices; utility nodes U (diamonds), representing rewards based on chance and decision node values; and a set of arcs representing dependencies. Conditional probability distributions, P, and utility functions, R, are associated with the chance and utility nodes, respectively. In the rest of the paper, nodes and variables are used interchangeably.
The domain of a variable Q, denoted as dom(Q), contains its possible values. The parents of Q, denoted as Pa_Q, are the set of variables having direct arcs incident on Q. The domain of Pa_Q, dom(Pa_Q), is the Cartesian product of the individual domains: dom(Pa_Q) = ∏_{Z ∈ Pa_Q} dom(Z), and a value of this domain is denoted as pa_Q. A probability factor, φ(Q) = P(Q | Pa_Q), which defines the conditional probability distribution given an instantiation of the parent variables, is attached to each chance variable Q ∈ C. We use Ch_Q to denote Q's children. A utility factor, ψ(U) = R(Pa_U), where R returns real-valued rewards, is associated with each utility node, U ∈ U. The variables involved in a probability or utility factor constitute the domain of the factor; for example, dom(φ(Q)) = {Q} ∪ Pa_Q.
A policy for decision node D_i ∈ D is a mapping δ_i : dom(Pa_{D_i}) → dom(D_i), i.e., δ_i(pa_{D_i}) = d_i. A policy for the decision problem is a sequence of policies for all the decision nodes. The solution of a DID is a strategy that maximizes the expected value, MEU(D), computed using the sum-max-sum rule (Koller and Friedman, 2009):

$$\sum_{I_0} \max_{D_1} \sum_{I_1} \ldots \max_{D_n} \sum_{I_n} \Big( \prod_{Q_i \in C} P(Q_i \mid Pa_{Q_i}) \cdot \sum_{C,D} R(C,D) \Big)$$

where I_0, I_1, ..., I_{n−1} are the sets of chance variables incident on the decision nodes D_1, D_2, ..., D_n, thereby forming the information sets.
The MEU may be computed by repeatedly eliminating variables. Let Φ and Ψ be the sets of probability and utility factors, respectively. Given a variable Q, the probability and utility factors having Q in their domain are denoted as Φ_Q and Ψ_Q, respectively. After Q is eliminated, the factor sets are updated as follows:

$$\Phi = (\Phi \setminus \Phi_Q) \cup \{\phi_{\setminus Q}\}, \qquad \Psi = (\Psi \setminus \Psi_Q) \cup \Big\{\frac{\psi_{\setminus Q}}{\phi_{\setminus Q}}\Big\}.$$

Here, φ_{∖Q} = ∑_Q ∏Φ_Q and ψ_{∖Q} = ∑_Q ∏Φ_Q (∑Ψ_Q) when Q is a chance variable; if Q is a decision variable, then φ_{∖Q} = max_Q ∏Φ_Q and ψ_{∖Q} = max_Q ∏Φ_Q (∑Ψ_Q).
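As a hedged illustration of one elimination step (with made-up factors, not taken from a benchmark domain), suppose Φ_Q = {φ(Q | A)} and Ψ_Q = {ψ(Q)} for a chance variable Q. Then:

$$\phi_{\setminus Q} = \sum_Q \phi(Q \mid A), \qquad \psi_{\setminus Q} = \sum_Q \phi(Q \mid A)\, \psi(Q),$$

and the new utility factor placed into Ψ is ψ_{∖Q}/φ_{∖Q}. Here φ_{∖Q}(a) = 1 for each a, so the new utility factor is simply the conditional expectation of ψ(Q) given A = a.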
ICAART2015-InternationalConferenceonAgentsandArtificialIntelligence
184
2.2 Interactive DID
Interactive DID (I-DID) (Doshi et al., 2009) models an individual agent's planning (sequential decision making) in a multiagent setting. In an I-DID, other agents' candidate behaviors are modeled as they impact the common states and rewards during the subject agent's decision-making process. Simultaneously, the other agents also reason about the subject agent's possible actions in their decision making. This recursive modeling is encoded in an auxiliary node called the model node, M^t_{j,l−1}, which contains models of the other agent, say j, of level l−1, and a chance node, A^t_j, which represents the distribution over j's actions. The link between M^t_{j,l−1} and A^t_j, named the policy link, indicates that the other agent's predicted action is based on its models. The models can be DIDs, I-DIDs or simply probability distributions over actions. The link between M^t_{j,l−1} and M^{t+1}_{j,l−1}, called the model update link, represents the update of j's models over time.
Example 1 (Multiagent tiger problem (Gmytrasiewicz and Doshi, 2005)). Consider two agents standing in front of two closed doors with a tiger or some gold behind each door. If an agent opens a door with a tiger behind it, it receives a penalty; otherwise, a reward. Agents can listen for growls to gain information about the tiger's location, as well as hear creaks if the other agent opens a door. But listening is not accurate. When the agent receives a reward or penalty, the game is reset. There is another agent, j, with the same capabilities sharing the environment with agent i, without noticing the existence of agent i. The agents receive the reward or penalty together; therefore, agent i needs to take into account agent j's behavior. A two time-slice I-DID for agent i situated in the multiagent tiger problem is depicted in Fig. 1.
Figure 1: A two time-slice I-DID for agent i in the tiger problem. Policy links are marked as dashed lines, while model update links are marked as dotted lines. TL stands for 'Tiger Location' and GC stands for 'Growl&Creak'.
Solving an I-DID (shown in Fig. 3) requires solving the lower-level models, and this recursive procedure ends at level 0, where the I-DID reduces to a DID (line 4). The policies from solving the lower-level models are used to expand the next higher-level I-DID (lines 5-10). We may then replace the model nodes, policy links and model update links with regular chance nodes and dependency links. The states of the nodes and the parameters of the links are specified according to the obtained policies (lines 11-13). Subsequently, the I-DID becomes a regular DID, whose MEU is obtained (line 15). Doshi et al. (2009) provide more details about I-DIDs, including an algorithm for solving them optimally.
Example 2. The I-DID shown in Fig. 1 is expanded as shown in Fig. 2. GC denotes a chance variable for observations of Growl&Creak, and the remaining chance nodes are grouped and denoted by X_i for convenience. The MEU is calculated as follows:

$$\mathrm{MEU}[D] = \sum_{X_i^t} \max_{A_i^t} \sum_{GC^t} P(X_i^t)\, P(GC^t \mid X_i^t) \sum_{X_i^{t+1}} \max_{A_i^{t+1}} \sum_{GC^{t+1}} P(X_i^{t+1} \mid X_i^t, A_i^t)\, P(GC^{t+1} \mid X_i^{t+1}) \big[ R_i^t(A_i^t, X_i^t) + R_i^{t+1}(A_i^{t+1}, X_i^{t+1}) \big] \qquad (1)$$
Figure 2: The flat two time-slice DID for the tiger problem. Model nodes are replaced by a set of ordinary chance nodes. All hidden variables are grouped as X_i.
2.3 GPU and CUDA
Graphics processing units (GPUs) were originally designed for rendering computer graphics.

In a GPU, there are a number of streaming multiprocessors (SM), each containing a set of stream processors, registers and shared local memory (SMEM). At run time, a set of parallelized computation tasks referred to as a thread block is executed on an SM and distributed across its processors. In order to achieve good performance, it is crucial to map algorithms efficiently to the GPU architecture, which is optimized for high throughput. For example, designs that favor coalesced memory access are cost-effective. In the past decade, general-purpose computing on the GPU has increased, with a focus on bridging the gap between GPUs and CPUs by letting GPUs handle the most intensive computing while leaving control tasks to the CPU.
FastSolvingofInfluenceDiagramsforMultiagentPlanningonGPU-enabledArchitectures
185
I-DID EXACT (level l ≥ 1 I-DID or level 0 DID, horizon T)

Expansion Phase
1. For t from 0 to T − 1 do
2.   If l ≥ 1 then
       Populate M^{t+1}_{j,l−1}
3.     For each m^t_j in Range(M^t_{j,l−1}) do
4.       Recursively call algorithm with the l − 1 I-DID (or DID) that represents m^t_j and the horizon, T − t
5.       Map the decision node of the solved I-DID (or DID), OPT(m^t_j), to the corresponding chance node A_j
6.       For each a_j in OPT(m^t_j) do
7.         For each o_j in O_j (part of m^t_j) do
8.           Update j's belief, b^{t+1}_j ← SE(b^t_j, a_j, o_j)
9.           m^{t+1}_j ← New I-DID (or DID) with b^{t+1}_j as the initial belief
10.          Range(M^{t+1}_{j,l−1}) ← Range(M^{t+1}_{j,l−1}) ∪ {m^{t+1}_j}
11.  Add the model node, M^{t+1}_{j,l−1}, and the model update link between M^t_{j,l−1} and M^{t+1}_{j,l−1}
12.  Add the chance, decision, and utility nodes for the t + 1 time slice and the dependency links between them
13.  Establish the CPDs for each node

Solution Phase
14. If l ≥ 1 then
15.   Represent the model nodes, policy links and the model update links as in Fig. 2 to obtain the DID
16. Apply the standard sum-max-sum rule to solve the expanded DID (other solution approaches may also be used)

Figure 3: Algorithm for exactly solving a level l ≥ 1 I-DID or level 0 DID expanded over T time steps.
CUDA, provided by NVIDIA, is a general-purpose parallel computing programming model for NVIDIA's GPUs. CUDA abstracts most operational details of the GPU and relieves the developer of the technical burden of GPU-oriented programming. An important component of a CUDA program is a kernel, which is a function that executes in parallel on a thread block.
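As a minimal illustration of these concepts (our sketch, not from the paper), the following kernel multiplies two arrays elementwise; each thread computes one entry, and the threads are grouped into blocks that the hardware schedules onto SMs:

```cuda
#include <cuda_runtime.h>

// Minimal CUDA kernel: each thread computes one elementwise product.
__global__ void pointwiseProduct(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        out[i] = a[i] * b[i];
}

// Host-side launch: n threads grouped into blocks of 512.
// pointwiseProduct<<<(n + 511) / 512, 512>>>(d_a, d_b, d_out, n);
```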
3 RELATED WORK
Multiple frameworks formalize planning under un-
certainty in settings shared with other agents who
may have similar or conflicting objectives. A rec-
ognized framework in this regard is the interactive
POMDP (Gmytrasiewicz and Doshi, 2005) that fa-
cilitates the study of planning in partially observ-
able multiagent settings where other agents may
be cooperative or non-cooperative. I-DIDs (Doshi
et al., 2009) are a graphical counterpart of interactive
POMDPs and have the advantage of a representation
that explicates the embedded domain structure by de-
composing the state space into variables and relation-
ships between the variables.
I-DIDs contribute to a promising line of research
on graphical models for multiagent decision making
and planning, which includes multiagent influence di-
agrams (MAID) (Koller and Milch, 2001), networks
of influence diagrams (NID) (Gal and Pfeffer, 2008),
and limited memory influence diagram based play-
ers (Søndberg-Jeppesen et al., 2013). I-DIDs differ
from MAIDs and NIDs by offering a subjective per-
spective to the interaction and solutions not limited
to equilibria, by ascribing other agents with a distri-
bution of non-equilibrium behaviors as well. Impor-
tantly, I-DIDs offer solutions over extended time in-
teractions, where agents act and update their beliefs
over others’ models which are themselves dynamic.
Previous uses of CPU-GPU heterogeneous systems in the context of graphical models focus on speeding up exact inference in Bayesian networks through parallelization (Kozlov and Pal Singh, 1994; Jeon et al., 2010; Xia and Prasanna, 2008). For example, Jeon et al. (2010) report speedup factors in the range from 5 to 12 for both marginal and most probable inference in junction trees. In comparison, we elevate the problem from performing inference in junction trees to finding optimal policies in I-DIDs and DIDs. As solving I-DIDs requires performing inference on the underlying Bayesian network in each time slice, our approach also parallelizes exact inference using junction trees in a manner similar to previous work (Zheng et al., 2011). Additionally, we provide a fast method for evaluating the sum-max-sum rule for DIDs by parallelizing component operations such as sum-marginalization on a GPU.
4 PARALLELIZED I-DID EXACT
FOR CPU-GPU SYSTEMS
Our approach revises the algorithm, I-DID Exact, pre-
sented in Fig. 3 by parallelizing two component steps
for utilization on a CPU-GPU heterogeneous comput-
ing architecture and through leveraging some of the
recent advances in parallelizing inference in Bayesian
networks.
4.1 Parallelizing Sum-Max-Sum Rule
for MEU
A solution of the sum-max-sum rule mentioned in
Section 2 gives the maximum expected utility of the
ICAART2015-InternationalConferenceonAgentsandArtificialIntelligence
186
flat DID that results from transforming the I-DID. The temporal structure of the DID provides an ordering of the chance, decision and utility variables that is utilized by generalized variable elimination for IDs to compute the MEU. In our two-time slice DID for the multiagent tiger problem, the elimination ordering is: X^{t+1}, Y^{t+1}, A^{t+1}_i, X^t, Y^t, A^t_i, where X and Y are the sets of hidden variables and those in the information set of a decision variable in each time slice, respectively. The sum-max-sum rule does not specify an ordering between the variables X and Y.
4.1.1 Memory-efficient Variable Elimination for
DIDs
In order to use the CPU-GPU memory efficiently, we design the variable elimination to be memory efficient. Specifically, instead of keeping the entire DID in memory while performing variable elimination, we lazily bring in the minimal set of other variables and their factors that are needed in order to eliminate the variable in question. We refer to this set of variables as a cover set. We first revisit the definition of the Markov blanket of a variable.
Definition 1 (Markov blanket, Pearl (1988)). The Markov blanket of a random variable Q, denoted as MB(Q), is the minimal set of variables that makes Q conditionally independent of all other variables given MB(Q). Formally, Q is conditionally independent of all other variables in the network given its parents, children, and children's parents.
Definition 2 (Cover set). The cover set of a random variable, Q, denoted by CS(Q), is defined as: CS(Q) = {Q} ∪ MB(Q).

Notice that the cover set of Q consists of Q itself and its Markov blanket. Furthermore, we make the following straightforward observation:

Observation 1. CS(Q) is exactly identical to the union of the domain of the factor of Q and the domains of the factors of the children of Q:

$$CS(Q) = \mathrm{dom}(\phi_Q) \cup \bigcup_{Z \in Ch_Q} \mathrm{dom}(\phi_Z)$$
Let X be the set of variables that precede Q in the elimination order. As the cover sets of the variables in X would already be in memory, we define below an incremental cover set: the set of all variables in the cover set of Q less all those variables contained in the cover sets of the variables preceding Q in the elimination ordering.

Definition 3 (Incremental cover set). The incremental cover set of a random variable, Q, denoted by ICS(Q), is defined as:

$$ICS(Q) = \{Q\} \cup MB(Q) \setminus \bigcup_{X \in \mathbf{X}} CS(X),$$

where X are the variables that precede Q in the elimination ordering.

Only the factors related to variables in ICS(Q) need to be additionally fetched into memory, because the earlier cover sets are already in memory and overlapping variables need not be fetched. Lemma 1 provides a simple way to determine the incremental cover set.
Lemma 1. As variable elimination proceeds, let F_Q be the set of all factors, not previously loaded in memory, with Q in each of their domains. Then the union of all variables in the domains of F_Q, denoted as Ω_Q, forms the incremental cover set of Q.
Proof. For the base case, let Q be the first variable to be eliminated. The union of the domains of all factors with Q in their domains is: Ω_Q = dom(φ_1(X_1)) ∪ dom(φ_2(X_2)) ∪ ... ∪ dom(φ_n(X_n)). We will show that ∀y ∈ X_i, y ∈ MB(Q) or y = Q, for i ∈ [1, n]. Suppose that y ∈ X_i, y ∉ MB(Q) and y ≠ Q. Given the definition of the Markov blanket, y is not a parent of Q, a child of Q, or a parent of a child of Q. Therefore, from Observation 1, the corresponding factor, φ_i, cannot contain Q in its domain. This is a contradiction and no such y exists. Therefore, ∀y ∈ X_i, y ∈ MB(Q) or y is Q.

Let Q_k be the k-th variable to be eliminated. As the inductive hypothesis, Ω_{Q_k} = {Q_k} ∪ MB(Q_k) ∖ ⋃_{X ∈ X} CS(X). For the inductive step, let Q_{k+1} be the next variable to be eliminated. Notice that

$$\Omega_{Q_k} = \Omega_{Q_{k+1}} \cup CS(Q_k) \cup \mathrm{dom}(\Phi_{Q_k \setminus Q_{k+1}}) \setminus \mathrm{dom}(\Phi_{Q_{k+1} \setminus Q_k})$$

where Φ_{Q_k ∖ Q_{k+1}} are the factors with Q_k in their domains and not Q_{k+1} (these would be absent from Ω_{Q_{k+1}}), and Φ_{Q_{k+1} ∖ Q_k} are the factors with Q_{k+1} and not Q_k in their domains.

We may rewrite the above as:

$$\begin{aligned} \Omega_{Q_{k+1}} &= \Omega_{Q_k} \cup \mathrm{dom}(\Phi_{Q_{k+1} \setminus Q_k}) \setminus \mathrm{dom}(\Phi_{Q_k \setminus Q_{k+1}}) \setminus CS(Q_k) \\ &= \mathrm{dom}(\Phi_{Q_{k+1} \setminus Q_k}) \cup \Omega_{Q_k} \setminus \mathrm{dom}(\Phi_{Q_k \setminus Q_{k+1}}) \setminus CS(Q_k) \end{aligned}$$

As Ω_{Q_k} denotes the domains of all factors with Q_k, with Q_{k+1} being either present or absent,

$$\Omega_{Q_k} = \mathrm{dom}(\Phi_{Q_k, Q_{k+1}}) \cup \mathrm{dom}(\Phi_{Q_k \setminus Q_{k+1}}) \setminus \bigcup_{X \in \mathbf{X}} CS(X).$$

Using this in the above equation,

$$\begin{aligned} \Omega_{Q_{k+1}} &= \mathrm{dom}(\Phi_{Q_{k+1} \setminus Q_k}) \cup \mathrm{dom}(\Phi_{Q_k, Q_{k+1}}) \cup \mathrm{dom}(\Phi_{Q_k \setminus Q_{k+1}}) \setminus \mathrm{dom}(\Phi_{Q_k \setminus Q_{k+1}}) \setminus \bigcup_{X \in \mathbf{X}} CS(X) \setminus CS(Q_k) \\ &= \mathrm{dom}(\Phi_{Q_{k+1} \setminus Q_k}) \cup \mathrm{dom}(\Phi_{Q_k, Q_{k+1}}) \setminus \bigcup_{X \in \mathbf{X}} CS(X) \setminus CS(Q_k) \\ &= \mathrm{dom}(\Phi_{Q_{k+1}}) \setminus \bigcup_{X \in \mathbf{X} \cup Q_k} CS(X) \end{aligned}$$

We may apply a proof similar to that in the base case to the first term above. Therefore,

$$\Omega_{Q_{k+1}} = \{Q_{k+1}\} \cup MB(Q_{k+1}) \setminus \bigcup_{X \in \mathbf{X} \cup Q_k} CS(X) = ICS(Q_{k+1}). \qquad \square$$
FastSolvingofInfluenceDiagramsforMultiagentPlanningonGPU-enabledArchitectures
187
Next, we establish the benefits and correctness of solely considering the cover set of Q in Theorem 1. We first define the joint probability distribution of the variables in the cover set.

Definition 4 (Factored joint probability distribution of a cover set). The factored joint probability distribution for the cover set of a random variable, Q, is defined as:

$$P(Q \mid Pa_Q) \prod_{Z \in Ch_Q} P(Z \mid Pa_Z)$$
Theorem 1. Let Φ_Q (Ψ_Q) be the set of relevant probability (or utility) factors required to compute the new factor φ_{∖Q} (ψ_{∖Q}) for eliminating variable Q. All the variables in the domains of Φ_Q (Ψ_Q) exactly comprise the cover set of Q, CS(Q).

Proof. The set of relevant probability factors Φ_Q can be separated into two categories: P(Q | Pa_Q), and P(X | Pa_X) where X ∈ Ch_Q. Consequently, the variables in the domains of the factors in Φ_Q are included in Pa_Q ∪ Ch_Q ∪ ⋃_{Z ∈ Ch_Q} Pa_Z ∪ {Q}, which is the cover set of Q by definition.

Assume there exists a variable Y ≠ Q such that Y ∈ CS(Q) and Y does not appear in either P(Q | Pa_Q) or P(X | Pa_X) where X ∈ Ch_Q. In other words, Y ∉ Pa_Q, Y ∉ Ch_Q and Y ∉ ⋃_{Z ∈ Ch_Q} Pa_Z. Consequently, Y ∉ MB(Q). As Y ≠ Q, therefore Y ∉ CS(Q), but this is a contradiction. Therefore, all variables in CS(Q) appear in the relevant factors. A similar argument is applicable to the utility factors Ψ_Q. □
Thus, the cover set of a variable Q locally identifies those variables whose factors change on eliminating Q; these factors contain Q in their domains. The alternative is a naive global method that searches over all factors and identifies those with Q in their domains. We illustrate the use of the cover set in eliminating chance and decision variables in the context of the multiagent tiger problem below.
Example 3 (Variable elimination using cover sets). The two-time slice flat DID is shown in Fig. 4(a). For clarity, the hidden chance variables in each time slice are replaced with X_i, thereby compacting the DID. The MEU for the DID is given by Equation 1. The temporal structure of the DID induces a partial ordering for the elimination of the variables in the rule above. In the context of Fig. 4(a), this ordering is: X_i^{t+1}, A_i^{t+1}, GC^{t+1}, X_i^t, A_i^t, GC^t.

We begin by eliminating X_i^{t+1} from the DID. Theorem 1 allows us to focus on the cover set of X_i^{t+1} only, which is shown in Fig. 4(b).
Figure 4: An illustration of variable elimination for DIDs. The incremental cover set for each variable is marked using a dashed line. In (a)-(f), the DID is progressively reduced following the elimination order: {X_i^{t+1}, A_i^{t+1}, GC^{t+1}, X_i^t}.
CS(X_i^{t+1}) = {X_i^{t+1}} ∪ MB(X_i^{t+1}) = {X_i^{t+1}, GC^{t+1}, X_i^t, A_i^t}

$$\psi_1(GC^{t+1}, X_i^t, A_i^t, A_i^{t+1}) = \sum_{X_i^{t+1}} P(CS(X_i^{t+1}))\, R_i^{t+1}(A_i^{t+1}, X_i^{t+1}) = \sum_{X_i^{t+1}} P(X_i^{t+1}, GC^{t+1} \mid X_i^t, A_i^t) \times R_i^{t+1}(A_i^{t+1}, X_i^{t+1})$$

The decision variable, A_i^t, in the probability factor is converted into a random variable with a uniform distribution over its states. We update the set of all utility factors as: Ψ ← {ψ_1(GC^{t+1}, X_i^t, A_i^t, A_i^{t+1})}.
Next, we eliminate A_i^{t+1} from the reduced DID. Figure 4(c) shows the incremental cover set of A_i^{t+1} with the dashed loop: A_i^{t+1} and its factors additionally need to be fetched into memory.

CS(A_i^{t+1}) = {A_i^{t+1}, GC^{t+1}, A_i^t}

$$\psi_2(GC^{t+1}, X_i^t, A_i^t) = \max_{A_i^{t+1}} \psi_1(GC^{t+1}, X_i^t, A_i^t, A_i^{t+1})$$

The set of utility factors updates to Ψ ← {ψ_2(GC^{t+1}, X_i^t, A_i^t)}.
The DID reduces to the one shown in Fig. 4(d), from which we now eliminate GC^{t+1}. The incremental cover set of this variable is empty, as all the variables in its cover set were utilized previously and preexist in memory.

CS(GC^{t+1}) = {GC^{t+1}, A_i^t}

$$\psi_3(X_i^t, A_i^t) = \sum_{GC^{t+1}} P(GC^{t+1} \mid A_i^t)\, \psi_2(GC^{t+1}, X_i^t, A_i^t)$$

The set of utility factors now becomes: Ψ ← {ψ_3(X_i^t, A_i^t)}.
Finally, we eliminate X_i^t and GC^t after fetching GC^t (and its factors) into memory.

CS(X_i^t) = {X_i^t, GC^t}

$$\psi_4(A_i^t, GC^t) = \sum_{X_i^t} P(GC^t \mid X_i^t) \big[ R_i^t(X_i^t, A_i^t) + \psi_3(X_i^t, A_i^t) \big]$$

The utility factor set becomes Ψ ← {ψ_4(A_i^t, GC^t)}. Maximizing over A_i^t and sum-marginalizing GC^t will yield an empty factor set and the decision that maximizes the expected utility of the DID.
4.1.2 Speeding Up Factor Operations using GPU
We perform the product operation between probability and utility factors in parallel on a GPU. The operation is a pointwise product of the entries in the factors. When there are common variables, only entries with the same values of the common variables are multiplied. For convenience, we denote R_i^t(X_i^t, A_i^t) + ψ_3(X_i^t, A_i^t) simply as ψ_3(X_i^t, A_i^t).
In order to parallelize the factor product, the indices of the entries to be multiplied in the factors are needed. Previous parallelization of inference in Bayesian networks sought to minimize the size of the index mapping table for GPUs (Jeon et al., 2010) due to the SM memory limitation. The entire mapping table was decomposed into smaller ones, each giving the mapped indices of the entries in the second factor for each non-common variable in the first factor. Our utility factor product follows a principle similar to that of message passing for belief propagation in junction trees.
Figure 5: The index mapping table for the product of Pr(X_i^t, GC^t) and ψ_3(X_i^t, A_i^t). Each entry's tag is the portion of its state vector over the common variables; |Pr(X_i^t, GC^t)| threads are spawned, and portions of ψ_3 are staged in shared memory (SMEM) over multiple iterations. We assume that all variables are boolean.
Entries in a factor are indexed according to the variables as index = ∑_{Q ∈ dom(ψ)} state_Q × stride_Q. The stride of a variable X_i in a factor, P(X_0, ..., X_n), is defined as stride_{X_0} = 1 and stride_{X_i} = stride_{X_{i−1}} · |dom(X_{i−1})|, for i ∈ [1, n]. We also define an entry's state vector as ⟨state_1, ..., state_n⟩. Here, n = |dom(X_i^t)| |dom(GC^t)|, and state_Q = ⌊index / stride_Q⌋ mod |dom(Q)|. A tag for an entry is the portion of the state vector pertaining to the common variables.
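To make this indexing concrete, the following device-function sketch (our illustration with hypothetical array names, not the authors' code) recovers each variable's state from a flat entry index and maps the common variables into a second factor's index space:

```cuda
// Sketch: stride-based index mapping between two factors.
// stride1/card1 describe factor 1's variables; pos2[v] gives variable v's
// position in factor 2 (-1 if it is not a common variable); stride2 holds
// factor 2's strides. Returns the part of factor 2's index fixed by the tag.
__device__ int mapIndex(int idx1, const int* stride1, const int* card1,
                        const int* pos2, const int* stride2, int nVars1)
{
    int idx2 = 0;
    for (int v = 0; v < nVars1; ++v) {
        int state = (idx1 / stride1[v]) % card1[v];  // state_Q = (index/stride_Q) mod |dom(Q)|
        if (pos2[v] >= 0)                            // common variable: reuse its state
            idx2 += state * stride2[pos2[v]];
    }
    return idx2;  // factor 2's non-common variables enumerate the remaining entries
}
```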
A thread in an SM is allocated to finding the entries of the second factor with which we may multiply a probability value in the first factor, as we show in Fig. 5. We allocate as many threads as the number of distinct entries in the first factor until no more threads are available, in which case multiple entries may be assigned to the same thread. The indices of the entries whose tags match the tag of the subject entry in the first factor are obtained, and the corresponding products are performed. Because the index is needed repeatedly, it is beneficial to investigate efficient ways of computing it. Notice that the index values can be computed as: index = ∑_{Q ∈ c.v.} state_Q × stride_Q + ∑_{Q ∈ dom(ψ) ∖ c.v.} state_Q × stride_Q, where c.v. stands for the common variables.

As a particular thread must find entries with the same tag, we compute the first summation in the above equation once, cache it, and then reuse it in finding the indices of the other entries. As illustrated in Fig. 5, each thread saves on computing the first summation two times because the non-common variable, A_i^t, has three states, thereby saving O(|X_i^t|) operations each time; the saving gets substantial in the context of factor products that have a large number of common variables.
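A kernel sketch of this cached-tag factor product, under the simplifying assumption of a single non-common variable laid out as the product factor's highest-stride dimension (all names hypothetical), could be:

```cuda
// Sketch: factor product with the common-variable (tag) offset cached per thread.
// Reuses mapIndex from the previous sketch. phi: probability factor, one entry
// per thread; psi: utility factor; out: product factor.
__global__ void factorProduct(const float* phi, const float* psi, float* out,
                              const int* stridePhi, const int* cardPhi,
                              const int* posPsi, const int* stridePsi,
                              int nVarsPhi, int ncCard, int ncStridePsi, int phiSize)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= phiSize) return;

    // First summation of the index equation: computed once and cached.
    int base = mapIndex(tid, stridePhi, cardPhi, posPsi, stridePsi, nVarsPhi);
    float p = phi[tid];

    // Second summation: enumerate the non-common variable's states, reusing
    // the cached offset for each product. We assume the product factor lays
    // out phi's variables with the same strides and the non-common variable
    // as its highest-stride dimension.
    for (int s = 0; s < ncCard; ++s)
        out[s * phiSize + tid] = p * psi[base + s * ncStridePsi];
}
```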
Factor products in the sum-max-sum rule are usually followed by sum-marginalization operations. For example, the last variable elimination shown in Fig. 4 marginalizes the set of variables in X_i^t, which includes TigerLocation^t, A_j^t and Mod[M_j^t], among others, from the factor P(X_i^t, GC^t) × ψ_3(X_i^t, A_i^t). Let us denote the resulting product factor as ψ_34(X_i^t, GC^t, A_i^t). For illustration purposes, let us focus on marginalizing a single variable, A_j^t ∈ X_i^t, from ψ_34(X_i^t, GC^t, A_i^t).
Figure 6: Four threads are used to produce the entries in the four rows of the resulting factor, ψ_4(GC^t, A_i^t), on the right; each thread sums the entries of ψ_34(A_j^t, GC^t, A_i^t) over the values of A_j^t while the states of GC^t and A_i^t stay fixed.
We parallelize and speed up sum-marginalization by allowing a separate thread to sum those entries in the factor that correspond to the different values of A_j^t while keeping the other variables' values fixed (Zheng et al., 2011) (Fig. 6).
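A minimal sketch of this per-thread marginalization (hypothetical names; it assumes the marginalized variable has stride mStride and cardinality mCard in the input factor) is:

```cuda
// Sketch: sum-marginalization of one variable from a factor.
// Each thread owns one entry of the output factor and sums over the
// marginalized variable's states, keeping all other variables fixed.
__global__ void sumMarginalize(const float* psi, float* out,
                               int mStride, int mCard, int outSize)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= outSize) return;

    // Split the output index around the marginalized dimension.
    int low  = tid % mStride;                        // lower-stride variables' states
    int high = (tid / mStride) * (mStride * mCard);  // higher-stride variables' states

    float sum = 0.0f;
    for (int s = 0; s < mCard; ++s)                  // sum over the marginalized variable
        sum += psi[high + s * mStride + low];
    out[tid] = sum;
}
```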
4.1.3 Parallelizing Message Passing in the BN
Probability factors utilized during variable elimination for computing the MEU of the flat DID often involve joint probability distributions. For example, the factor P(X_i^t, GC^t) utilized in the elimination of X_i^t is the joint distribution over the multiple variables in X_i^t and GC^t. We may efficiently compute the probability factor tables by forming a junction tree of the Bayesian network in each time slice, and computing the joints using message passing (Zheng et al., 2011).
Analogously to the operations involved in variable elimination, message passing in a junction tree involves sum-marginalization and factor products. However, the typical order of these operations in message passing is the reverse of that in the sum-max-sum rule: we perform the marginalizations first, followed by the factor products. These operations are part of the marginalization and scattering steps that constitute message passing.

We parallelize message passing in junction trees to efficiently compute the probability factors. Both the sum-marginalizations and the factor products are performed on a CPU-GPU heterogeneous system by utilizing multiple threads in an SM, each of which computes the relevant index mapping tables online and performs the products as we described previously in Figs. 5 and 6. This is similar to the approach of Zheng et al. (2011), which decomposes the whole index mapping table into smaller components that are relevant to each thread. However, the latter precomputes the tables while forming the junction trees and stores them in memory.
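Putting the two kernels together, one message pass from a clique to its neighbor could be orchestrated from the host roughly as follows (a sketch with hypothetical clique and separator structures, not the authors' implementation; it assumes a single variable is summed out):

```cuda
#include <cuda_runtime.h>

// Hypothetical clique: a potential table on the GPU plus its size.
struct Clique {
    float* d_potential;   // device pointer to the clique's table
    int    size;          // number of entries
};

// Sketch: pass a message from src to dst through their separator.
// 1) Marginalize src's potential down to the separator (sum-marginalization).
// 2) Scatter: multiply dst's potential by the separator message (factor product).
void passMessage(const Clique& src, Clique& dst, float* d_separator, int sepSize,
                 int mStride, int mCard /* layout of the variable summed out of src */)
{
    int threads = 512;  // block size found suitable in Section 5
    sumMarginalize<<<(sepSize + threads - 1) / threads, threads>>>(
        src.d_potential, d_separator, mStride, mCard, sepSize);
    // Scatter step: a pointwise product of dst.d_potential with d_separator,
    // using an index mapping as in Fig. 5 (omitted here for brevity).
    cudaDeviceSynchronize();
}
```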
5 DESIGN AND ALGORITHMS
The MEU of a flat DID is computed using the sum-max-sum rule. The factor product and sum-marginalization operations are parallelized by wrapping them in a GPU kernel function. This launches one or more blocks of threads for performing the products and sums of the probabilities and utilities.

For the message passing performed on the junction tree, a CPU routine selects the relevant cliques, which are nodes in the junction tree, for processing. It computes the required parameters for the cliques involved in the current communication, and asynchronously transmits them to the GPU. After all parameters are computed, a GPU block of threads is launched to compute and propagate the message to a recipient clique.
Figure 7: An abstract view of the parallelization of the MEU computation for solving an I-DID on a CPU-GPU system: the CPU drives the sum-max-sum rule for the MEU and the inference on the junction tree, asynchronously transferring data to thread blocks on the GPU that perform the factor products, sum-marginalizations and message passing.

Before running the algorithm, CUDA requires the
kernel to be appropriately configured in terms of grid size and shape, and shared memory and register utilization. We note three choices: 1) fixing the thread block size in order to utilize more registers; 2) minimizing the number of registers to possibly achieve high occupancy; and 3) finding a shared memory size per block that minimizes global memory accesses. Quick experimentation revealed that, for both the factor operations and the message passing algorithm, fixing the number of registers to 32 and using a shared memory chunk size of 512 were suitable. For effective allocation of memory, we allocate large chunks of memory at program start, and all GPU memory allocation requests use one of these chunks. If more memory is requested than is available in the chunks, a chunk is reallocated to possibly accommodate the request.
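A simple version of this preallocated-chunk scheme might look like the following sketch (a hypothetical structure; the paper does not detail its allocator):

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Sketch: a trivial GPU memory pool that hands out preallocated chunks.
struct ChunkPool {
    static const int N = 16;
    void*  chunk[N];
    size_t bytes[N];
    bool   used[N];

    void init(size_t chunkBytes) {             // allocate all chunks at program start
        for (int i = 0; i < N; ++i) {
            cudaMalloc(&chunk[i], chunkBytes);
            bytes[i] = chunkBytes;
            used[i]  = false;
        }
    }
    void* acquire(size_t req) {                // serve a request from a free chunk
        for (int i = 0; i < N; ++i)
            if (!used[i] && bytes[i] >= req) { used[i] = true; return chunk[i]; }
        for (int i = 0; i < N; ++i) {          // no chunk fits: grow one (reallocation)
            if (!used[i]) {
                cudaFree(chunk[i]);
                cudaMalloc(&chunk[i], req);
                bytes[i] = req; used[i] = true;
                return chunk[i];
            }
        }
        return nullptr;                        // pool exhausted
    }
    void release(void* p) {
        for (int i = 0; i < N; ++i)
            if (chunk[i] == p) used[i] = false;
    }
};
```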
Algorithms 1 and 2 provide the steps for performing the factor product and the sum-marginalization, respectively, on the GPU. In Algorithm 1, the utility factor is divided and loaded into shared memory. The input and output indices in both algorithms are computed following the discussion in Section 4.1.2. We show the abstract design of the algorithm in Fig. 7.
Algorithm 1: Factor Product on GPU.
Require: probability factor φ and utility factor ψ
Ensure: product factor ψ*
1: tid is the thread id
2: numIter is the number of iterations
3: workSize is the number of products per thread
4: for i ← 1 to numIter parallel do
5:   begLoadIdx ← begin offset
6:   endLoadIdx ← end offset
7:   SMEM ← ψ[begLoadIdx ... endLoadIdx]
8:   for j ← 1 to workSize parallel do
9:     iidx is the input index
10:    oidx is the output index
11:    ψ*[oidx] ← φ[tid] × SMEM[iidx]
12:  end for
13: end for
6 ANALYSIS OF SPEED UP
We theoretically analyze the speed up resulting from
parallelizing the factor product, sum-marginalization
ICAART2015-InternationalConferenceonAgentsandArtificialIntelligence
190
Algorithm 2: Sum-marginalization on GPU.
Require: factor ψ which needs to be marginalized
Ensure: the resulting factor ψ*
1: tid is the thread id
2: workSize is the number of additions per thread
3: sum ← 0
4: for j ← 1 to workSize parallel do
5:   iidx ← index into ψ
6:   sum ← sum + ψ[iidx]
7: end for
8: oidx ← output index into ψ*
9: ψ*[oidx] ← sum
and factor sum operations that are involved in computing the MEU. Let φ_Q and ψ_Q be some probability and utility factors involving chance variable Q, respectively, and let S_{φ_Q ψ_Q} denote the set of variables in common between the domains of the two factors. Then, dom(ψ_Q) ∖ S_{φ_Q ψ_Q} is the set of variables in ψ that are not in φ. In multiplying the two factors, the number of independent products is:

$$FP_{\phi_Q \psi_Q} = \begin{cases} |\phi_Q||\psi_Q| / |S_{\phi_Q \psi_Q}| & \text{if } |S_{\phi_Q \psi_Q}| > 0; \\ |\phi_Q||\psi_Q| & \text{otherwise.} \end{cases}$$

Our approach parallelizes the above factor product using |φ_Q| threads, with each thread performing |ψ_Q|/|S_{φ_Q ψ_Q}| products if |S_{φ_Q ψ_Q}| > 0, and |ψ_Q| otherwise. Analogously, the number of independent sums is:

$$FS_{\psi_Q \psi'_Q} = \begin{cases} |\psi_Q||\psi'_Q| / |S_{\psi_Q \psi'_Q}| & \text{if } |S_{\psi_Q \psi'_Q}| > 0; \\ |\psi_Q||\psi'_Q| & \text{otherwise.} \end{cases}$$

For the marginalization of a utility factor ψ_Q over a random variable Q in its domain, the number of independent maximizations is |ψ_Q|/|dom(Q)|, where |dom(Q)| gives the number of states of the variable Q. We assign a thread to each independent maximization.
Let C, D and U denote the sets of chance, decision and utility variables, respectively, in the DID. We begin by establishing the time complexity of evaluating the sum-max-sum rule serially on a flat DID. Overall, this requires summing the utility factors, whose complexity is ∑_{Q∈U} FS_{ψ_Q ψ′_Q} = O(|U| |ψ_Q||ψ′_Q| / |S*_{ψ_Q ψ′_Q}|); performing as many factor products as there are chance variables, whose time complexity is ∑_{Q∈C} FP_{φ_Q ψ_Q} = O(|C| |φ_Q||ψ_Q| / |S*_{φ_Q ψ_Q}|); the sum-marginalization of the chance variables in the probability factors, with complexity O(|C||φ_Q|); and the maximization over the decision variables, whose complexity is O(|D||ψ_D̄|). The total complexity for the serial computation is

$$O\Big(|U|\,\frac{|\psi_Q||\psi'_Q|}{|S^*_{\psi_Q \psi'_Q}|} + |C||\phi_Q|\Big(\frac{|\psi_Q|}{|S^*_{\phi_Q \psi_Q}|} + 1\Big) + |D||\psi_{\bar{D}}|\Big).$$

Here, ψ′ denotes an expected utility; S*_{φ_Q ψ_Q} and S*_{ψ_Q ψ′_Q} are the smallest sets of shared variables between probability and utility factors, respectively; and D̄ is the decision variable with the largest utility factor to maximize over.
Each parallelized utility sum operation has a theoretical time of FS_{ψ_Q ψ′_Q}/|ψ_Q|; the parallelized factor product requires a time of FP_{φ_Q ψ_Q}/|φ_Q|; the parallelized sum-marginalization requires a time of |φ_Q|/|dom(Q)|; and the parallelized max-marginalization requires a time of |ψ_{D*}|/|dom(D*)| units. Consequently, the total complexity for the parallel computation is:

$$O\Big(\kappa + |U|\,\frac{|\psi_Q|}{|S^*_{\psi_Q \psi'_Q}|} + |C|\Big(\frac{|\psi_Q|}{|S^*_{\phi_Q \psi_Q}|} + 1\Big) + \frac{|D||\psi_{D^*}|}{|\mathrm{dom}(D^*)|}\Big)$$

where D* is the decision variable with the smallest domain size and κ, which is a function of the size of the network, is the total cost of the kernel invocations and memory latency in the GPU.
Theorem 2 (Speed up). The speed up of evaluating the sum-max-sum rule for a flat DID with set C of chance variables, D of decision variables, and U of utility variables is upper bounded by:

$$\frac{|U|\,\frac{|\psi_Q||\psi'_Q|}{|S^*_{\psi_Q \psi'_Q}|} + |C||\phi_Q|\Big(\frac{|\psi_Q|}{|S^*_{\phi_Q \psi_Q}|} + 1\Big) + |D||\psi_{\bar{D}}|}{\kappa + |U|\,\frac{|\psi_Q|}{|S^*_{\psi_Q \psi'_Q}|} + |C|\Big(\frac{|\psi_Q|}{|S^*_{\phi_Q \psi_Q}|} + 1\Big) + \frac{|D||\psi_{D^*}|}{|\mathrm{dom}(D^*)|}}$$

where ψ′ denotes an expected utility; S*_{φ_Q ψ_Q} and S*_{ψ_Q ψ′_Q} are the smallest sets of shared variables between probability and utility factors, respectively; D̄ is the decision variable with the largest utility factor to maximize over; D* is the decision variable with the smallest domain size; and κ, which is a function of the size of the network, is the total cost of the kernel invocations and memory latency in the GPU.
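As a hedged numerical illustration of the bound (with made-up factor sizes, not values measured from the benchmark domains), suppose |U| = 2, |C| = 4, |D| = 2, |φ_Q| = 8, |ψ_Q| = |ψ′_Q| = 16, |S*_{ψ_Q ψ′_Q}| = 4, |S*_{φ_Q ψ_Q}| = 2, |ψ_D̄| = |ψ_D*| = 16 and |dom(D*)| = 4. Then the bound evaluates to:

$$\frac{2 \cdot \frac{16 \cdot 16}{4} + 4 \cdot 8 \cdot \big(\frac{16}{2} + 1\big) + 2 \cdot 16}{\kappa + 2 \cdot \frac{16}{4} + 4 \cdot \big(\frac{16}{2} + 1\big) + \frac{2 \cdot 16}{4}} = \frac{128 + 288 + 32}{\kappa + 8 + 36 + 8} = \frac{448}{\kappa + 52},$$

so the attainable speed up hinges on how small the kernel invocation and memory latency cost κ is relative to the serial work.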
7 EXPERIMENTS
In this section, we empirically evaluate the performance and scalability of Parallelized I-DID Exact on different networks against its serial implementation, I-DID Exact. Experiments were performed on a desktop with an Intel CPU (3.10 GHz), 16 GB RAM and an NVIDIA GeForce GTX 480 graphics card with 480 cores, 1.5 GB of global memory and 64 KB of shared memory per SM.
Besides the tiger problem (|S| = 2, |A_i| = |A_j| = 3, |Ω_i| = 6 and |Ω_j| = 2), we also evaluated the proposed approach on a larger problem domain: the two-agent unmanned aerial vehicle (UAV) interception problem (|S_i| = 25, |S_j| = 9, |A_i| = |A_j| = 5, |Ω_i| = |Ω_j| = 5). In this problem, there is a UAV and a fugitive, each with noisy sensors and unreliable actuators, located in a 3 × 3 grid. The fugitive, j, plans to reach the safe house while avoiding detection by the hostile UAV, i (Zeng and Doshi, 2012).
7.1 Performance Evaluation

For the tiger problem, different numbers (10, 50 and 100) of level 0 DIDs with planning horizons from 6 to 9 are solved and used to expand level 1 I-DIDs of 3 to 5 horizons. The average factor sizes increase with the number of horizons. The mean speed up ranges between 6 and slightly greater than 10, with I-DIDs of longer horizon demonstrating greater speed up in their solution. Due to the complexity of the UAV domain and the limited global memory, the current implementation solves the problem optimally up to horizon 3. However, Parallelized I-DID Exact still provides promising speedups. Problems with larger factors, which can contain more common variables, show greater speedups.

All experimental results are summarized in Tables 1 and 2. The I-DIDs for the different problem domains, unrolled to different look aheads (T_1) with different numbers of level 0 models (the column |M_j|) at different look aheads (T_0), were used to evaluate the performance of the proposed algorithm. The average sizes of the factors processed during variable elimination, including probability and utility factors, of the level 0 and level 1 models are listed in the columns titled Mean(|φ|) and Mean(|ψ|). The columns labeled CPU and GPU contain the total running times, which include the time for solving the level 0 models, the expansion, and solving the resulting level 1 model. The speedup is indicated in the last column, titled Speedup.

As suggested by Theorem 2, the theoretical speedups for these two domains, as lower bounds, are 70/(κ + 22) and 450/(κ + 28), respectively, where κ is the total cost of the kernel invocations and memory latency in the GPU. As the tiger problem is a small domain, the cost of data transmission is negligible, and the lower bound can be seen as approximately 4. However, for the larger UAV problem, a comparison with the reported empirical speedup shows that κ is not negligible.
Fig. 8 shows the speedup for the tiger and the UAV domains with different problem sizes. Overall, the speed up in planning optimally increases as the sizes of the level 1 and level 0 planning problems increase. Varying the number of candidate models (DIDs) ascribed to the other agent did not significantly impact the speed ups. This is expected, as the lower-level models are solved sequentially. Parallelization of their solutions seems to be an obvious avenue of future work, but is deceptively challenging.

Figure 8: The speedup for the multiagent tiger problem and the UAV problem given different amounts of level 0 models and numbers of decision horizons.
7.2 Optimizing Thread Block Size
By parallelizing the computation on the GPU, we observed around an order of magnitude of speedup across the performed experiments. As the computation tasks are organized as a set of thread blocks and executed on SMs, the number of thread blocks determines the overall performance. Generally speaking, more thread blocks will increase the degree of parallelization at a higher synchronization cost. Automatically calculating the optimal thread-block sizes (Sano et al., 2014), which is domain dependent, is beneficial but computationally expensive. The expense may be amortized over multiple runs. But, because we solve I-DIDs just once for a domain, this expense cannot be amortized and significantly adds to the run time. As a trade-off, we empirically search for a block size that optimizes the solution for many problem domains, following the CUDA optimization heuristics.
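Such a search can be as simple as timing a kernel across candidate block sizes. The following host-side sketch (hypothetical; it reuses the sumMarginalize kernel from Section 4.1.2) illustrates the idea:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Sketch: time a kernel across candidate block sizes (hypothetical arguments).
void sweepBlockSizes(const float* d_psi, float* d_out, int mStride, int mCard, int outSize)
{
    cudaEvent_t beg, end;
    cudaEventCreate(&beg); cudaEventCreate(&end);
    for (int threads = 64; threads <= 640; threads += 64) {
        int blocks = (outSize + threads - 1) / threads;
        cudaEventRecord(beg);
        sumMarginalize<<<blocks, threads>>>(d_psi, d_out, mStride, mCard, outSize);
        cudaEventRecord(end);
        cudaEventSynchronize(end);
        float ms = 0.f;
        cudaEventElapsedTime(&ms, beg, end);
        printf("block size %d: %.3f ms\n", threads, ms);
    }
    cudaEventDestroy(beg); cudaEventDestroy(end);
}
```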
We evaluated the performance of Parallelized I-DID Exact as the number of threads in each block is increased from 64 to 640, on a level 1 I-DID of horizon 3 with 10 lower-level DIDs as candidate models. The impact of different block sizes on the run time is shown in Fig. 9. As observed, the block size of 512 gives the best performance in terms of running
ICAART2015-InternationalConferenceonAgentsandArtificialIntelligence
192
Table 1: Run times, factor sizes and speed ups for the multiagent tiger problem. |M_j| denotes the number of level 0 models. Mean |φ| and Mean |ψ| are the average sizes of the probability and utility factors in the models, respectively. The columns titled CPU and GPU denote the running times of the two implementations. The speedups are listed in the last column.

|M_j|  T_1  Mean|φ| (L1)  Mean|ψ| (L1)  T_0  Mean|φ| (L0)  Mean|ψ| (L0)  CPU (s)  GPU (s)  Speedup
10     3    1959          2237          6    2192           1703           3.14     0.51     6.2
10     3    1959          2237          7    11126          8620           17.8     1.93     9.2
10     3    1959          2237          8    58835          45556          106      10.2     10.4
10     3    1959          2237          9    306284         237130         644      60.0     10.8
10     4    38376         44998         6    2192           1703           5.59     0.77     7.3
10     4    38376         44998         7    11126          8620           20.3     2.18     9.3
10     4    38376         44998         8    58835          45556          108      10.5     10.3
10     4    38376         44998         9    306284         237130         647      60.0     10.8
10     5    600493        655141        6    2192           1703           50.0     5.09     9.8
10     5    600493        655141        7    11126          8620           64.7     6.48     10.0
10     5    600493        655141        8    58835          45556          153      14.7     10.4
10     5    600493        655141        9    306284         237130         691      64.3     10.7
50     3    5449          5307          6    2192           1703           13.7     1.83     7.5
50     3    5449          5307          7    11126          8620           80.5     8.21     9.8
50     3    5449          5307          8    58835          45556          481      46.0     10.5
50     3    5449          5307          9    306284         237130         2930     272      10.8
50     4    63225         65249         6    2192           1703           16.1     2.08     7.7
50     4    63225         65249         7    11126          8620           83.0     8.45     9.8
50     4    63225         65249         8    58835          45556          484      45.9     10.5
50     4    63225         65249         9    306284         237130         2931     272      10.7
50     5    672794        683530        6    2192           1703           60.5     6.44     9.4
50     5    879830        910980        7    11126          8620           127      12.7     10.0
50     5    879830        910980        8    58835          45556          528      50.6     10.4
50     5    879830        910980        9    306284         237130         2972     277      10.7
100    3    5546          5322          6    2192           1703           27.4     3.56     7.7
100    3    5546          5322          7    11126          8620           162.5    16.3     9.9
100    3    5546          5322          8    58835          45556          971      92.8     10.4
100    3    5546          5322          9    306284         237130         5937     573      10.3
100    4    63294         65260         6    2192           1703           29.9     3.81     7.8
100    4    63294         65260         7    11126          8620           164.9    16.7     9.8
100    4    63294         65260         8    58835          45556          974      92.4     10.5
100    4    63294         65260         9    306284         237130         5937     569.6    10.7
100    5    672848        683538        6    2192           1703           74.3     8.11     9.2
100    5    879884        910989        7    11126          8620           209      21.0     9.9
100    5    879884        910989        8    58835          45556          1018     96.8     10.5
100    5    879884        910989        9    306284         237130         5975     575      10.4
Table 2: Run times, factor sizes and speed ups for the multiagent UAV problem. The columns have meanings similar to those in Table 1.

|M_j|  T_1  Mean|φ| (L1)  Mean|ψ| (L1)  T_0  Mean|φ| (L0)  Mean|ψ| (L0)  CPU (s)  GPU (s)  Speedup
10     3    104223        75120         3    1235           1029           16.56    2.22     7.5
10     3    104223        75120         4    20237          9467           24.38    3.17     7.7
10     3    104223        75120         5    392043         170405         239      27.6     8.7
25     3    106410        75270         3    1235           1029           16.9     2.27     7.4
25     3    106410        75270         4    20237          9467           32.6     4.23     7.7
25     3    106410        75270         5    392043         170405         462.4    55.6     8.8
50     3    209573        117520        3    1235           1029           17.51    2.41     7.3
50     3    212260        117695        4    20237          9467           46.61    6.02     7.7
50     3    153348        81195         5    392043         170405         845.1    99.3     8.5
time. The upside is that, as more threads are involved in the computation, there are fewer iterations of fetching global memory loads into shared memory. In contrast, the degradation in performance beyond this block size is expected because spawning more threads per block limits the number of blocks that can be scheduled to run concurrently due to limited resources; hence the observed fall in performance.

Figure 9: The running time of the multiagent tiger problem given different GPU thread block sizes.
8 CONCLUSION
We presented a method for optimal planning in multiagent settings under uncertainty that utilizes the parallelism provided by a heterogeneous CPU-GPU computing architecture. We focused on interactive dynamic influence diagrams, which are probabilistic graphical models whose solution involves transforming the I-DID into a flat DID and computing the policy with the maximum expected utility. The operations involving probability and utility factors during variable elimination are parallelized on GPUs. We demonstrate speed ups close to an order of magnitude on multiple problem domains, and run times that are less than 17 minutes for large numbers of models and long horizons. To the best of our knowledge, these are the fastest run times reported so far for exactly solving I-DIDs and other related frameworks such as I-POMDPs for multiagent planning, and they represent a significant step forward in making these complex frameworks practical.

As aforementioned, the lower-level models can be DIDs or I-DIDs with different initial beliefs. These candidate models are differing hypotheses of the other agent's behavior, and may therefore be solved independently in parallel. However, as solving I-DIDs requires a large amount of memory, we may not be able to solve these in parallel on a single GPU. Nevertheless, modern computing platforms may contain two or more GPU units linked together and programmable using CUDA.[2] Furthermore, multiple networked machines with GPUs may be utilized using CUDA-MPI. However, as the factor operation is not computationally intensive, whether the savings from the parallel computation on the GPU side can compensate for the cost of transporting data between the CPU and the GPU is still an open question. Comparisons based on different types of GPUs will be our immediate future work as well.

[2] NVIDIA promotes having multiple GPU units managed by its scalable link interface.
ACKNOWLEDGEMENTS
This research is supported in part by an ONR Grant,
#N000141310870, and in part by an NSF CAREER
Grant, #IIS-0845036. We thank Alex Kozlov for making his implementation of a parallel Bayesian network inference algorithm available to us for reference.
REFERENCES
Bernstein, D. S., Givan, R., Immerman, N., and Zilberstein,
S. (2002). The complexity of decentralized control of
Markov decision processes. Mathematics of Opera-
tions Research, 27(4):819–840.
Bernstein, D. S., Hansen, E. A., and Zilberstein, S. (2005). Bounded policy iteration for decentralized POMDPs. In IJCAI, pages 1287–1292.
Chandrasekaran, M., Doshi, P., Zeng, Y., and Chen, Y.
(2014). Team behavior in interactive dynamic influ-
ence diagrams with applications to ad hoc teams. In
AAMAS, pages 1559–1560.
Chen, Y., Hong, J., Liu, W., Godo, L., Sierra, C., and
Loughlin, M. (2013). Incorporating PGMs into a BDI
architecture. In PRIMA, pages 54–69.
Doshi, P., Zeng, Y., and Chen, Q. (2009). Graphical models
for interactive POMDPs: Representations and solu-
tions. JAAMAS, 18(3):376–416.
Gal, K. and Pfeffer, A. (2008). Networks of influence di-
agrams: A formalism for representing agents’ beliefs
and decision-making processes. JAIR, 33:109–147.
Gmytrasiewicz, P. J. and Doshi, P. (2005). A framework
for sequential planning in multiagent settings. JAIR,
24:49–79.
Howard, R. A. and Matheson, J. E. (1984). Influence dia-
grams. In Howard, R. A. and Matheson, J. E., editors,
The Principles and Applications of Decision Analysis.
Strategic Decisions Group, Menlo Park, CA 94025.
Jeon, H., Xia, Y., and Prasanna, K. V. (2010). Parallel exact inference on a CPU-GPGPU heterogeneous system. In ICPP, pages 61–70.
Koller, D. and Friedman, N. (2009). Probabilistic Graphi-
cal Models: Principles and Techniques. MIT Press.
Koller, D. and Milch, B. (2001). Multi-agent influence dia-
grams for representing and solving games. In IJCAI,
pages 1027–1034.
Luo, J., Yin, H., Li, B., and Wu, C. (2011). Path planning for automated guided vehicles system via I-DIDs with communication. In ICCA, pages 755–759.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA.
Sano, Y., Kadono, Y., and Fukuta, N. (2014). A performance optimization support framework for GPU-based traffic simulations with negotiating agents. In ACAN.
ICAART2015-InternationalConferenceonAgentsandArtificialIntelligence
194
Smallwood, R. and Sondik, E. (1973). The optimal con-
trol of partially observable Markov decision processes
over a finite horizon. Operations Research, 21:1071–
1088.
Søndberg-Jeppesen, N., Jensen, F. V., and Zeng, Y. (2013).
Opponent modeling in a PGM framework. In AAMAS,
pages 1149–1150.
Kozlov, A. V. and Pal Singh, J. (1994). A parallel Lauritzen-Spiegelhalter algorithm for probabilistic inference. In Supercomputing, pages 320–329.
Xia, Y. and Prasanna, K. V. (2008). Parallel exact inference on the Cell Broadband Engine processor. In SC, pages 1–12.
Zeng, Y. and Doshi, P. (2012). Exploiting model equiva-
lences for solving interactive dynamic influence dia-
grams. JAIR, 43:211–255.
Zheng, L., Mengshoel, O. J., and Chong, J. (2011). Belief propagation by message passing in junction trees: Computing each message faster using GPU parallelization. In UAI.
FastSolvingofInfluenceDiagramsforMultiagentPlanningonGPU-enabledArchitectures
195