into n clusters. First of all, we build an “empty” cluster, associated
with situations where agent i has no interaction with any other agent.
Such a cluster is $RC = (\emptyset, \emptyset, \emptyset)$, which means
we have no joint relative state, no joint transition and no joint reward.
In such a cluster, the agent can follow an individual and independent
policy with no need for coordination. The next step consists in
identifying sub-problems in order to classify the remaining joint
relative states. Good clusters are such that there are many transitions
between the joint relative states of a given cluster, but only a few
transitions between two different clusters (weakly coupled clusters, with
strongly coupled relative states inside each cluster). In ISR, we could
build a cluster associated with the corridor crossing sub-problem, and
solve it as an independent problem without introducing too many
approximations.
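The clustering criterion above can be illustrated with a small score. The following is only a sketch under stated assumptions: the function name `intra_cluster_ratio`, the weighted transition dictionary and the cluster assignment are all hypothetical and are not part of the DyLIM definition. It measures the share of transition weight that stays inside a single cluster, which should be close to 1 for weakly coupled clusters.

```python
def intra_cluster_ratio(transitions, cluster_of):
    """Return the fraction of transition weight that stays inside one cluster.

    transitions: dict mapping (source_state, target_state) -> weight
                 (hypothetical representation, e.g. counts or probabilities).
    cluster_of:  dict mapping each joint relative state -> cluster identifier.
    """
    inside = 0.0
    total = 0.0
    for (source, target), weight in transitions.items():
        total += weight
        if cluster_of[source] == cluster_of[target]:
            inside += weight
    # With no transition at all, nothing crosses a cluster boundary.
    return inside / total if total else 1.0


# Illustrative data only: three joint relative states, two clusters.
example_transitions = {
    ("front", "front"): 3,
    ("front", "left"): 2,
    ("left", "front"): 2,
    ("behind", "behind"): 4,
    ("left", "behind"): 1,
}
example_clusters = {"front": 0, "left": 0, "behind": 1}
example_ratio = intra_cluster_ratio(example_transitions, example_clusters)
```

A partition with a higher ratio keeps more transitions inside clusters, so solving each cluster independently should lose less information.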
4.6 Sub-optimality of the Approach
Using this model, each agent solves its own individual problem, taking
into account the existence of the neighborhood. Such an approach is
sub-optimal compared to standard approaches, which compute optimal joint
policies, but it gave good results in our experiments (section 6.2).
However, is it really a good idea to always seek the optimal joint
policy? DEC-POMDPs have been proved not to be approximable: finding
epsilon-approximations of the joint policy is NEXP-hard (Rabinovich et
al., 2003). Could we not, then, just compute “good enough” policies in
order to scale up to real-world problems?
Our work is based on this idea: quickly compute a good policy and avoid
the huge number of computation steps necessary to find the optimal one,
as long as ours is good enough. A difficult problem is then to determine
whether a policy is good enough. Our model gives everything the agent
needs to make its decisions: we described not only how the agent evolves
in its environment and receives rewards, but also how the other agents
(the interacting ones) impact these rewards. The next section introduces
a set of algorithms able to compute a policy using this model.
5 ALGORITHMS
We developed two algorithms able to solve a problem described with a
DyLIM. First, we describe how to find an upper bound for the
combinatorial complexity. Second, we give some details about how we
build the interaction problem. Third, we introduce our algorithms
solving the problems expressed with DyLIM.
5.1 Approximate Joint Relative States
The number of possible joint relative states grows exponentially with
the number of involved agents: with M the number of possible relations
and I the number of agents, we have (in the worst case) $M^I$ different
joint relative states (see §6.1 for a more detailed complexity
analysis). In order to bound this combinatorial explosion, we mimic the
behavior of a human moving through a crowd. In such a situation, the
human only considers a subset of the people surrounding him. For
example, he only tries to avoid colliding with the people in front of
him.
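To make the worst-case growth concrete, the short sketch below enumerates every tuple of relations for a few values of I. The relation names and the helper `worst_case_joint_relative_states` are illustrative assumptions, and the enumeration assumes one relation per agent counted in I.

```python
from itertools import product


def worst_case_joint_relative_states(relations, num_agents):
    """Enumerate every tuple assigning one relation to each counted agent."""
    return list(product(relations, repeat=num_agents))


# Hypothetical relation set with M = 3.
relations = ("front", "left", "behind")
for num_agents in range(1, 5):
    states = worst_case_joint_relative_states(relations, num_agents)
    # The enumeration size matches M ** I for every tested I.
    assert len(states) == len(relations) ** num_agents
    print(num_agents, len(states))
```

Even with only three relations, each additional agent triples the number of joint relative states, which motivates the reduction described next.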
We apply this idea in our algorithms with two assumptions. First, we
consider that one relation can involve several agents. For example, if
the agent has three agents in front of it and two on its left, we
consider that the induced joint relative state is (front, left) and not
(front, front, front, . . . ). Second, we define a preference order over
relations and consider at most N relations at the same time. For
example, with N = 2 and the order front > left > behind, the joint
relative state (front, behind, left) is reduced to (front, left). With
those assumptions, we are able to bound the combinatorial explosion: a
joint relative state contains at most N relations, whatever the number
of agents (see §6.1). If N is large enough, we compute good policies
(for example, in a navigation problem, N = 4 is enough to consider any
immediate danger).
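The two assumptions can be sketched as a small reduction function. This is a minimal illustration, not the authors' implementation: the name `reduce_joint_relative_state` and the tuple-based representation of relations are assumptions made for the example.

```python
def reduce_joint_relative_state(relations, preference, max_relations):
    """Reduce a raw tuple of relations to at most max_relations entries.

    relations:     one relation per neighbouring agent, duplicates allowed.
    preference:    relations listed from most to least preferred.
    max_relations: the bound N on relations considered at the same time.
    """
    # Rank of each relation in the preference order (lower is preferred).
    # A relation missing from `preference` raises KeyError.
    rank = {relation: index for index, relation in enumerate(preference)}
    # Assumption 1: one relation may involve several agents, so duplicates merge.
    distinct = set(relations)
    # Assumption 2: keep only the N most preferred relations.
    ordered = sorted(distinct, key=rank.__getitem__)
    return tuple(ordered[:max_relations])


preference = ("front", "left", "behind")
print(reduce_joint_relative_state(("front", "behind", "left"), preference, 2))
# prints ('front', 'left')
```

Applied to three agents in front and two on the left, the same call returns `('front', 'left')`, matching the first example in the text.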
5.2 Building the Interaction Problem
The individual part of the problem is fully described with the tuple
$\langle S, A, T, R, \Omega, O \rangle$, while the tuple
$\langle SR, \Omega R, OR, C \rangle$ used to describe the interaction
part needs preprocessing before being used. We already have the set of
joint relative states ($S_1, \dots, S_n$ from each relation cluster
$RC_n \in C$) and the observation function ($\Omega R$ and $OR$). In
order to completely define the model, we define the joint transition and
reward functions using C: we formalize each relation cluster as a nearly
independent MMDP.
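One possible in-memory representation of a relation cluster is sketched below. The class `RelationCluster`, its field layout and the helper `tuples_to_fill` are hypothetical names introduced only for illustration; the sketch assumes the transition and reward functions are stored as dictionaries keyed by (state, action, next state).

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class RelationCluster:
    """A relation cluster: joint relative states, transitions and rewards."""
    states: frozenset = frozenset()
    transitions: dict = field(default_factory=dict)  # (rs, a, rs_next) -> probability
    rewards: dict = field(default_factory=dict)      # (rs, a, rs_next) -> reward

    def is_empty(self):
        """True for the cluster with no joint state, transition or reward."""
        return not self.states and not self.transitions and not self.rewards


def tuples_to_fill(cluster, actions):
    """List every (rs, a, rs_next) entry the preprocessing has to define."""
    ordered_states = sorted(cluster.states)
    return list(product(ordered_states, actions, ordered_states))


# The "empty" cluster used when the agent interacts with no other agent.
EMPTY_CLUSTER = RelationCluster()
```

Under this layout, a cluster with |S| joint relative states and |A| actions requires |S|² · |A| transition entries and as many reward entries, while the empty cluster requires none.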
5.2.1 Computing Transitions and Rewards
We consider an MMDP for each relation cluster
$RC_n = (S_n, T_n, R_n)$, with $S_{MMDP} = S_n$ and $A_{MMDP} = A$. For
a given MMDP, we compute $T(rs, a, rs')$ for each tuple $(rs, a, rs')$
using algorithm 1, the idea for computing $R(rs, a, rs')$ being the
same. In this algorithm, we compute a transition between two joint
relative states, knowing that a joint relative state rs = (front, left)
could be associated with the joint state
$s_1 = ([x3, y0], [x3, y1], [x2, y0])$, or
$s_2 = ([x2, y5], [x2, y6], [x1, y5])$, etc. We call those joint
COLLECTIVE DECISION UNDER PARTIAL OBSERVABILITY - A Dynamic Local Interaction Model