Table 1: Empirical measures and the related information on the underlying planning problem.

Empirical measure | Information
OCS | Dependency of one agent on the other agents' policies.
Local plans studied | Dependency of an agent on the other agents' policies; complexity of the local decision problem.
Joint policies studied | Computational difficulty of finding the joint controller; dependencies among all agents.
Branches in the optimal joint policy | Observations of their local resource states by the agents.
Size of the optimal joint policy | Repartition of goals (rocks) among agents.
Discretization (number of pieces) | Complexity of the decision problem (local or global).
certain number of observations before deciding which goals to achieve. In the multiagent framework, the collaboration among agents and its possible penalties affect the repartition of goals, and thus the need for each agent to observe its resource state. As a consequence, they also affect the computational cost of finding an optimal policy for a team of agents. The rest of this paper reports on the results of a series of simulations and tests that provide empirical evidence of the relation between collaboration, observation and computation.
3 COMPUTATION AND COMPLEXITY
3.1 Solving RC-TI-DEC-HMDPs
Here we give some background on solving an m-agent RC-TI-DEC-HMDP. The Cover Set Algorithm (CSA) is an efficient algorithm that finds optimal policies (Becker et al., 2004; Petrik and Zilberstein, 2007). It is a two-step algorithm. The first step consists in finding, for each of (m − 1) agents, a set of policies called the optimal cover set (OCS). Each agent's OCS is such that, for any choice of the other agents' policies, it contains at least one policy that is optimal. In other words, the OCS of an agent is guaranteed to contain the policy of this agent that belongs to the optimal joint policy of the team. In computing the OCS for an agent, the CSA has to study a number of competing local policies for this agent. This number gives information on the dependency of the agent with respect to the other agents' policies. The second step iterates over all combinations of policies in the (m − 1) OCSs, computes an optimal policy for the m-th agent, and returns the combination of m policies that yields the maximal expected global value (2). Table 1 sums up the empirical measures and the related information on the underlying planning problem.
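To make the two-step structure concrete, the following Python sketch mirrors the description above. The helpers compute_ocs, best_response and joint_value are hypothetical placeholders for the problem-specific routines; this is a sketch of the CSA's overall control flow, not the implementation of (Becker et al., 2004) or (Petrik and Zilberstein, 2007).

from itertools import product

def solve_csa(agents, compute_ocs, best_response, joint_value):
    """Sketch of the two-step Cover Set Algorithm structure.

    agents:        list of the m agents; the last one plays the role of the m-th agent.
    compute_ocs:   compute_ocs(agent) -> list of local policies (the agent's OCS).
    best_response: best_response(agent, other_policies) -> optimal local policy of
                   `agent` given the fixed policies of the other agents.
    joint_value:   joint_value(policies) -> expected global value of the joint policy.
    """
    *others, last = agents

    # Step 1: one optimal cover set per agent among the first (m - 1) agents.
    cover_sets = [compute_ocs(agent) for agent in others]

    best_joint, best_value = None, float("-inf")
    # Step 2: enumerate every combination of policies drawn from the OCSs, compute
    # the m-th agent's best response to it, and keep the highest-valued joint policy.
    for combination in product(*cover_sets):
        policy_m = best_response(last, combination)
        joint = list(combination) + [policy_m]
        value = joint_value(joint)
        if value > best_value:
            best_joint, best_value = joint, value
    return best_joint, best_value

The number of calls to best_response in this sketch corresponds to the "joint policies studied" measure of Table 1, while the sizes of the cover_sets correspond to the OCS measure.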
Computationally, the challenging aspect of solving an HMDP is the handling of continuous variables, and in particular the computation of the so-called Bellman optimality equation.
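For reference, a generic single-agent form of this equation, with discrete state s, continuous resource level x and finite action set A (the notation here is generic and does not necessarily match the paper's definition of RC-TI-DEC-HMDPs), is

\[
V(s, x) = \max_{a \in A} \Big[ R(s, x, a) + \sum_{s'} \int \Pr(s', x' \mid s, x, a)\, V(s', x')\, dx' \Big].
\]

The integral over the successor resource level x' is the costly part that structured approaches aim to compute efficiently.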
At least two approaches, (Feng et al., 2004) and (Li and Littman, 2005), exploit the structure in the continuous value functions of HMDPs. Typically, these functions appear as a collection of humps and plateaus, where the plateaus correspond to regions of the continuous state space in which the policy pursues similar goals. The steepness of the slope between plateaus reflects the uncertainty in achieving the underlying goals. The algorithms used for producing the results analyzed in this paper exploit a problem structure in which the continuous state space can be partitioned into a finite set of regions. Taking advantage of this structure relies on grouping the states that belong to the same plateau, while dynamically refining the discretization in the regions of the state space where it is most useful, such as between plateaus. It follows that the dynamic discretization of the continuous state space reflects the complexity of the decision problem: the fewer the discretized pieces, the easier the decision (see Table 1).
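The toy one-dimensional sketch below illustrates the plateau-based grouping; it is not the algorithm of (Feng et al., 2004) or (Li and Littman, 2005). It merges contiguous resource levels whose values stay within a tolerance into one piece, so plateaus collapse into few pieces while the steep slopes between them remain finely discretized.

def partition_into_pieces(resource_levels, values, tol):
    """Group contiguous resource levels whose values stay within `tol` of the
    value opening the current piece; returns (start, end, value) tuples."""
    pieces, start = [], 0
    for i in range(1, len(values)):
        if abs(values[i] - values[start]) > tol:
            pieces.append((resource_levels[start], resource_levels[i - 1], values[start]))
            start = i
    pieces.append((resource_levels[start], resource_levels[-1], values[start]))
    return pieces

# Toy value function over a resource level in [0, 2]: a low plateau, a steep
# slope (uncertain goal achievement), then a high plateau.
levels = [i / 10 for i in range(21)]
values = [2.0 if x < 0.8 else 10.0 if x > 1.2 else 2.0 + 20.0 * (x - 0.8) for x in levels]
print(partition_into_pieces(levels, values, tol=0.5))
# -> two large pieces for the plateaus, several small pieces on the slope.

In this toy example, the number of pieces returned plays the role of the discretization measure of Table 1: easy regions (plateaus) need few pieces, uncertain regions need many.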
3.2 Empirical Evidence
This section reports on planning for our case study. It helps to understand the relation between collaboration and computation. Figure 3 reports on the computation of the optimal joint policy. Figure 3(a) shows the number of joint policies studied in selecting the optimal joint policy. This number jumps when the collaboration factor among agents, which is implicitly carried by the joint reward structure, is reduced. One hypothesis is that the problem becomes globally more computationally demanding when the amount of collaboration among agents is reduced. This hypothesis is confirmed by the results in Figure 3(b). The number of discretized regions in the optimal four-dimensional global value function reflects the discretization of the individual agents' optimal value functions. The finer the discretization, the more complex and thus the more