3.3 Convergence of the Algorithm
We demonstrate the convergence of our value iteration algorithm by showing that the dynamic programming operator H (Equation 2) has a unique fixed point which is reached by its iterative application. We obtain this by showing that H is a contraction mapping under the following max-norm and applying Banach's fixed point theorem (Ciesielski et al., 2007):
\[
\|v - v'\| = \max_{s_p \in V^N} \, \max_{b \in \Delta(V)} \left| v_{s_p}(b) - v'_{s_p}(b) \right| \tag{15}
\]
Lemma 6. The operator H is a contraction with contractivity factor γ < 1 under the max-norm (15).
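Spelled out in the norm of Equation (15), Lemma 6 asserts that for any two sets of value functions $v$ and $v'$,
\[
\|Hv - Hv'\| \le \gamma \, \|v - v'\|,
\]
i.e., a single application of $H$ shrinks the max-norm distance between any two candidate solutions by at least the factor $\gamma$. This is the property that drives the geometric convergence established below.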
Theorem 4. There is a unique set of value functions $v^*$ satisfying $v^* = Hv^*$, and the recursive application of H converges to $v^*$. The sequence $\{v^t\}_{t=0}^{\infty}$ thus converges to the value functions of the infinite-horizon game.
Proof. The operator H is a contraction mapping on the metric space of sets of bounded functions over the belief space. By Banach's fixed point theorem (Ciesielski et al., 2007), H has a unique fixed point $v^*$, and the recursive application of H converges to $v^*$.
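The fixed-point argument also yields a principled stopping rule for the algorithm. The following is a minimal sketch in Python of the resulting iteration scheme, not the paper's implementation; the operator H, the distance function max_norm_dist, and the discount factor gamma are assumed inputs standing in for the dynamic programming operator of Equation (2) and the norm of Equation (15):

    def value_iteration(H, v0, max_norm_dist, gamma, eps=1e-6):
        """Iterate a contraction mapping H from v0 until eps-accuracy.

        By the contraction property,
            ||v_{t+1} - v*|| <= gamma / (1 - gamma) * ||v_t - v_{t+1}||,
        so stopping once ||v_t - v_{t+1}|| <= eps * (1 - gamma) / gamma
        guarantees the returned iterate is within eps of the fixed
        point v* in max-norm.
        """
        v = v0
        while True:
            v_next = H(v)  # one dynamic programming backup
            if max_norm_dist(v, v_next) <= eps * (1 - gamma) / gamma:
                return v_next  # within eps of v* in max-norm
            v = v_next

Stopping on the distance between successive iterates, rather than on a fixed iteration count, exploits the contraction property: the residual between consecutive iterates bounds the true distance to $v^*$.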
Proposition 1. After $t$ iterations of the value iteration algorithm it holds that $\|v^t - v^*\| \le \gamma^t$.
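For illustration (and assuming, as Proposition 1 implicitly does, that the initial distance $\|v^0 - v^*\|$ is at most 1, e.g. when utilities are normalized to $[0,1]$), the bound tells us how many iterations suffice for a desired accuracy $\varepsilon$: requiring $\gamma^t \le \varepsilon$ gives
\[
t \ge \frac{\ln \varepsilon}{\ln \gamma}.
\]
For example, with $\gamma = 0.9$ and $\varepsilon = 10^{-3}$, approximately $\lceil \ln(10^{-3}) / \ln(0.9) \rceil = 66$ iterations suffice.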
4 CONCLUSIONS
We present the first algorithm for solving the class of two-player discounted pursuit-evasion games with infinite horizon and partial observability, where the evader is assumed to be perfectly informed about the current state of the game (i.e., the positions of the pursuer's units). This class of games has significant relevance in security domains, where a robust strategy that provides guarantees in the worst case is often desirable. Our algorithm is a modification of the well-known value iteration algorithm for solving Partially Observable Markov Decision Processes (POMDPs), or stochastic games with concurrent moves. We show that the strategies can be compactly represented using value functions that depend on the locations of the pursuing units and the belief about the position of the evader, but not explicitly on the history of moves. These value functions are piecewise linear and convex and allow us to design a dynamic programming operator for the value iteration algorithm.
Our work is a first step towards practical algorithms for solving discounted stochastic games with one-sided partial observability. These can be applied in many scenarios requiring robust strategies, and our work thus opens a new area of research in algorithmic and computational game theory. One natural continuation is the adaptation of point-based approximation algorithms for POMDPs to improve the scalability of the value iteration algorithm.
ACKNOWLEDGEMENTS
This research was supported by the Czech Science Foundation (grant no. 15-23235S) and by the Grant Agency of the Czech Technical University in Prague (grant no. SGS16/235/OHK3/3T/13).
REFERENCES
Chung, T. H., Hollinger, G. A., and Isler, V. (2011). Search and pursuit-evasion in mobile robotics. Autonomous Robots, 31(4):299–316.
Ciesielski, K. et al. (2007). On Stefan Banach and some of
his results. Banach Journal of Mathematical Analysis,
1(1):1–10.
Hansen, E. A., Bernstein, D. S., and Zilberstein, S.
(2004). Dynamic programming for partially observ-
able stochastic games. In AAAI, volume 4, pages 709–
715.
Koller, D., Megiddo, N., and von Stengel, B. (1996). Efficient computation of equilibria for extensive two-person games. Games and Economic Behavior, 14(2):247–259.
McEneaney, W. M. (2004). Some classes of imperfect infor-
mation finite state-space stochastic games with finite-
dimensional solutions. Applied Mathematics and Op-
timization, 50(2):87–118.
Monahan, G. E. (1982). State of the art - a survey of partially observable Markov decision processes: theory, models, and algorithms. Management Science, 28(1):1–16.
Pineau, J., Gordon, G., Thrun, S., et al. (2003). Point-based
value iteration: An anytime algorithm for POMDPs.
In IJCAI, volume 3, pages 1025–1032.
Shapley, L. S. (1953). Stochastic games. Proceedings of the
National Academy of Sciences, 39(10):1095–1100.
Smallwood, R. D. and Sondik, E. J. (1973). The optimal
control of partially observable Markov processes over
a finite horizon. Operations Research, 21(5):1071–
1088.
Smith, T. and Simmons, R. (2012). Point-based POMDP
algorithms: Improved analysis and implementation.
arXiv preprint arXiv:1207.1412.
Vanderbei, R. J. (2014). Linear programming. Springer.
Vidal, R., Shakernia, O., Kim, H. J., Shim, D. H., and Sastry, S. (2002). Probabilistic pursuit-evasion games: theory, implementation, and experimental evaluation. IEEE Transactions on Robotics and Automation, 18(5):662–669.