agent to make the right decision. However, in this case,
the error is also low. When the agent chooses a module
that is close to the optimal one, it does not make the
optimal decision, but the decision it makes is still a good one.
When the agent computes π̃₀ on a PRU of the second
type, there are few errors. For a given state, the Q-values
are well separated because the module qualities are well
separated. Then, most of the time π̃₀ is equal to π*₀ and
the robot makes the right decision.
To conclude, if the modules are clearly separated, the
robot makes the right decision. If the modules are close
to each other, the robot makes good decisions, with a low
Q-value error.
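To make this concrete, the following minimal sketch (the module names and Q-values are made up for illustration and are not taken from our experiments) shows the two regimes: when Q-values are well separated, a greedy choice over noisy approximate Q-values still picks the optimal module; when they are close, a wrong pick only costs a small Q-value gap.

```python
# Hedged sketch: why a greedy choice over approximate Q-values is safe.
# Module names and Q-values below are illustrative assumptions only.

def greedy(q):
    """Return the module with the highest Q-value."""
    return max(q, key=q.get)

def loss(q_true, chosen):
    """Q-value gap between the optimal module and the chosen one."""
    return max(q_true.values()) - q_true[chosen]

# Well-separated Q-values: a small approximation error cannot change the argmax.
q_true = {"m1": 10.0, "m2": 6.0, "m3": 2.0}
q_approx = {"m1": 9.4, "m2": 6.5, "m3": 2.3}    # noisy estimates
assert greedy(q_approx) == greedy(q_true)        # right decision

# Close Q-values: the argmax may flip, but the resulting loss stays small.
q_true = {"m1": 10.0, "m2": 9.8}
q_approx = {"m1": 9.7, "m2": 9.9}
print(round(loss(q_true, greedy(q_approx)), 2))  # 0.2: a "good" decision
```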
6.1 Limits of this Approximation Method
The resource consumption probability distribution
follows a normal law in all the modules we use for our
experiments. Most of the time, this represents modules
that can be found in a real application. We have also run
some experiments with modules in which the resource
consumption probability distribution is not normal. We
built a risky module m: its resource consumption is 4 or
16, with Pr(4|m) = 0.5 and Pr(16|m) = 0.5. In this case
the module can only consume 4 or 16 units, but never 8.
In this particular kind of case, V* is not smooth. As a
result, Ṽ is not a good approximation. Then π̃₀ and π*₀
differ, and the error is large. However, this case does not
represent a realistic resource consumption module.
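The small sketch below illustrates why the risky module breaks smoothness; the reward of 1.0 and the discretised near-normal distribution are illustrative assumptions, not figures from our experiments. The expected value as a function of the remaining resources rises gradually under a near-normal consumption law, but jumps abruptly at 4 and 16 units under the bimodal one.

```python
# Hedged sketch of why the bimodal "risky" module makes V* non-smooth.
# The reward value and the discretised normal-like law are assumptions.

def expected_value(remaining, consumption_dist, reward=1.0):
    """Expected reward of running the module with `remaining` resource
    units: the reward is obtained only if the consumption fits the budget."""
    return sum(p * reward for c, p in consumption_dist.items() if c <= remaining)

# Roughly normal consumption centred on 8 units.
normal_like = {6: 0.1, 7: 0.2, 8: 0.4, 9: 0.2, 10: 0.1}
# Risky module m: Pr(4|m) = Pr(16|m) = 0.5, nothing in between.
risky = {4: 0.5, 16: 0.5}

for r in range(3, 18):
    print(r, round(expected_value(r, normal_like), 2),
          expected_value(r, risky))
# The first column rises gradually (smooth, easy to approximate);
# the second jumps from 0 to 0.5 at r = 4 and from 0.5 to 1.0 at r = 16.
```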
7 CONCLUSIONS
Resource consumption is crucial for an autonomous
rover. Here, the rover has to cope with limited resources
to execute a mission composed of hierarchical tasks.
These tasks are progressive processing units (PRUs). It
is possible to compute an optimal resource control policy
for the entire mission by modelling it as an MDP. When
the mission changes at execution time, the rover has to
recompute a new global policy online. We propose a way
to quickly compute an approximate value function that
can be used to calculate a local policy on the current
PRU. MDP decomposition and value function
approximation techniques are used to calculate Ṽ. We
have shown in the last section that the agent makes good
decisions when it uses Ṽ to compute its local policy π̃₀.
In the near future, we intend to complete our
demonstration on real robots by considering dynamic
situations where missions can change online.
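As a hedged illustration of this online step, the sketch below derives a local policy for the current PRU by backward induction over its levels, using an approximate value function for the rest of the mission as the terminal value. The PRU representation (levels, module qualities, consumption distributions) and the value function used here are illustrative assumptions, not the exact data structures of our implementation.

```python
# Minimal sketch of computing a local policy on the current PRU from an
# approximate global value function.  Data structures are assumptions.

def local_policy(levels, v_tilde, resources):
    """levels: list of levels, each a dict {module: (quality, consumption_dist)}.
    v_tilde: maps remaining resource units to the approximate value of the
    rest of the mission.  Returns, for each level, a dict mapping the
    remaining resources to the module to run (None means run nothing)."""
    # Value after the last level is the approximate global value function.
    value = dict(v_tilde)
    policy = [None] * len(levels)
    for i in reversed(range(len(levels))):
        new_value, choice = {}, {}
        for r in range(resources + 1):
            best, best_m = 0.0, None
            for m, (quality, dist) in levels[i].items():
                # Consumptions larger than the budget yield no reward.
                q = sum(p * (quality + value.get(r - c, 0.0))
                        for c, p in dist.items() if c <= r)
                if q > best:
                    best, best_m = q, m
            new_value[r], choice[r] = best, best_m
        value, policy[i] = new_value, choice
    return policy

# Example with one level of two hypothetical modules and a made-up tail value.
lvl = [{"m_fast": (1.0, {2: 1.0}), "m_good": (3.0, {5: 0.6, 7: 0.4})}]
vt = {r: 0.5 * r for r in range(11)}
print(local_policy(lvl, vt, 10)[0][10])   # module chosen with 10 units left
```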