Dynamic Programming for One-sided Partially Observable Pursuit-evasion Games

Karel Horák; Branislav Bošanský

doi:10.5220/0006190605030510

2017

Abstract

Pursuit-evasion scenarios appear widely in robotics, security domains, and many other real-world situations. We focus on two-player pursuit-evasion games with concurrent moves, an infinite horizon, and discounted rewards. We assume that the players have partial observability; however, the evader has the advantage of knowing the current position of the pursuer’s units. This setting is particularly interesting for security domains, where a robust strategy that maximizes utility in the worst-case scenario is often desirable. We provide, to the best of our knowledge, the first algorithm that provably converges to the value of a partially observable pursuit-evasion game with an infinite horizon. Our algorithm extends the well-known value iteration algorithm by exploiting two facts: (1) the value functions of our game depend only on the position of the pursuer and the belief he has about the position of the evader, and (2) these functions are piecewise linear and convex in the belief space.
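
As an illustration of property (2), a piecewise linear and convex function over beliefs can be stored as a finite set of linear functions ("alpha-vectors", a term borrowed from the POMDP literature) and evaluated as their pointwise maximum. The sketch below is a minimal, assumed representation and not the authors' implementation; the name `pwlc_value` and the two-position example are hypothetical.

```python
from typing import Sequence


def pwlc_value(alphas: Sequence[Sequence[float]], belief: Sequence[float]) -> float:
    """Evaluate V(b) = max over alpha of <alpha, b>.

    `alphas` is a hypothetical set of linear functions for one fixed pursuer
    position; `belief` is a probability distribution over evader positions.
    """
    if not alphas:
        raise ValueError("at least one alpha-vector is required")
    # Each alpha-vector defines one linear piece; the maximum of linear
    # functions is piecewise linear and convex in the belief.
    return max(
        sum(a_i * b_i for a_i, b_i in zip(alpha, belief))
        for alpha in alphas
    )


# Assumed toy example with two possible evader positions.
alphas = [[1.0, 0.0], [0.0, 1.0]]
v_left = pwlc_value(alphas, [1.0, 0.0])
v_right = pwlc_value(alphas, [0.0, 1.0])
v_mid = pwlc_value(alphas, [0.5, 0.5])
# Convexity: the value at the midpoint does not exceed the average of the endpoints.
assert v_mid <= 0.5 * (v_left + v_right)
```

Under this representation, a value-iteration step would produce a new set of linear pieces for each pursuer position; how those pieces are computed is specific to the paper and is not reproduced here.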

Paper Citation

in Harvard Style

Horák K. and Bošanský B. (2017). Dynamic Programming for One-sided Partially Observable Pursuit-evasion Games. In Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, ISBN 978-989-758-220-2, pages 503-510. DOI: 10.5220/0006190605030510

in Bibtex Style

@conference{icaart17,
author={Karel Horák and Branislav Bošanský},
title={Dynamic Programming for One-sided Partially Observable Pursuit-evasion Games},
booktitle={Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART},
year={2017},
pages={503-510},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006190605030510},
isbn={978-989-758-220-2},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART
TI - Dynamic Programming for One-sided Partially Observable Pursuit-evasion Games
SN - 978-989-758-220-2
AU - Horák K.
AU - Bošanský B.
PY - 2017
SP - 503
EP - 510
DO - 10.5220/0006190605030510