Dealing With Groups of Actions in Multiagent Markov Decision Processes

Guillaume Debras, Abdel-Illah Mouaddib, Laurent Jean Pierre, Simon Le Gloannec

Abstract

Multiagent Markov Decision Processes (MMDPs) provide a useful framework for multiagent decision making. However, solving large-scale problems, or problems with many agents, has been proven computationally hard. In this paper, we adapt H-(PO)MDPs to multiagent settings by proposing a new approach that uses action groups to decompose an initial MMDP into a set of dependent sub-MMDPs, where each action group is assigned a corresponding sub-MMDP. The sub-MMDPs are then solved using a parallel Bellman backup to derive local policies, which are synchronized by propagating local results and updating the value functions locally and globally to take the dependencies into account. This decomposition also allows, for example, a specific aggregation for each sub-MMDP, which we support with a novel value function update. Experimental evaluations were conducted on real robotic platforms, showing promising results and validating our techniques.
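The decomposition the abstract describes can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the data layout, the fixed iteration count, and the max-based synchronization of shared states are assumptions standing in for the paper's actual value-function update.

```python
# Sketch: one sub-MMDP per action group, each solved by Bellman backups,
# with a simple synchronization step for states shared across sub-MMDPs.
# All structures and the sync rule are illustrative assumptions.

GAMMA = 0.9  # discount factor

def bellman_backup(V, states, actions, T, R):
    """One synchronous Bellman backup over a sub-MMDP's state space.

    T[s][a] maps successor states to probabilities; R[s][a] is the reward.
    """
    return {
        s: max(
            R[s][a] + GAMMA * sum(p * V[s2] for s2, p in T[s][a].items())
            for a in actions
        )
        for s in states
    }

def solve_sub_mmdps(sub_mmdps, iters=100):
    """Solve each sub-MMDP (one per action group), syncing shared states.

    sub_mmdps: list of dicts with keys 'states', 'actions', 'T', 'R'.
    Shared states are synchronized by keeping the largest local value,
    a crude stand-in for the paper's local/global value-function update.
    """
    Vs = [{s: 0.0 for s in m['states']} for m in sub_mmdps]
    for _ in range(iters):
        # Parallelizable step: an independent local backup per sub-MMDP.
        Vs = [bellman_backup(V, m['states'], m['actions'], m['T'], m['R'])
              for V, m in zip(Vs, sub_mmdps)]
        # Synchronization: propagate values of states shared between groups.
        shared = {}
        for V in Vs:
            for s, v in V.items():
                shared[s] = max(shared.get(s, v), v)
        for V in Vs:
            for s in V:
                V[s] = shared[s]
    return Vs
```

In this sketch, the local backups in each iteration are independent and could run in parallel, while the synchronization step is the point where dependencies between sub-MMDPs are reconciled.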

References

  1. Becker, R., Zilberstein, S., Lesser, V., and Goldman, C. V. (2004). Solving transition independent decentralized Markov decision processes. Journal of Artificial Intelligence Research, 22:423-455.
  2. Bellman, R. (1954). The theory of dynamic programming. Bull. Amer. Math. Soc. 60, no. 6, pages 503-515.
  3. Bernstein, D. S., Zilberstein, S., and Immerman, N. (2000). The complexity of decentralized control of Markov decision processes. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, pages 32-37. Morgan Kaufmann Publishers Inc.
  4. Boutilier, C. (1996). Planning, learning and coordination in multiagent decision processes. In Proceedings of the 6th conference on Theoretical aspects of rationality and knowledge, pages 195-210. Morgan Kaufmann Publishers Inc.
  5. Boutilier, C. (1999). Sequential optimality and coordination in multiagent systems. In IJCAI, volume 99, pages 478-485.
  6. Claes, D., Robbel, P., Oliehoek, F. A., Tuyls, K., Hennes, D., and van der Hoek, W. (2015). Effective approximations for multi-robot coordination in spatially distributed tasks. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, AAMAS '15, pages 881-890, Richland, SC. International Foundation for Autonomous Agents and Multiagent Systems.
  7. Goldman, C. V. and Zilberstein, S. (2004). Decentralized control of cooperative systems: Categorization and complexity analysis. J. Artif. Intell. Res.(JAIR), 22:143-174.
  8. Guestrin, C., Venkataraman, S., and Koller, D. (2002). Context-specific multiagent coordination and planning with factored MDPs. In AAAI/IAAI, pages 253- 259.
  9. Labbe, M. and Michaud, F. (2014). Online Global Loop Closure Detection for Large-Scale Multi-Session Graph-Based SLAM. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2661-2666.
  10. Littman, M., Dean, T., and Kaelbling, L. P. (1995). On the complexity of solving Markov decision problems. In Uncertainty in Artificial Intelligence: Proceedings of the 11th Conference, pages 394-402.
  11. Matignon, L., Jeanpierre, L., and Mouaddib, A.-I. (2012). Coordinated multi-robot exploration under communication constraints using decentralized Markov decision processes. In AAAI, pages 2017-2023.
  12. Melo, F. S. and Veloso, M. (2009). Learning of coordination: Exploiting sparse interactions in multiagent systems. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2, pages 773-780. International Foundation for Autonomous Agents and Multiagent Systems.
  13. Messias, J. V., Spaan, M. T., and Lima, P. U. (2013). GSMDPs for Multi-Robot Sequential Decision Making. In AAAI.
  14. Papadimitriou, C. H. and Tsitsiklis, J. N. (1987). The complexity of Markov decision processes. In Mathematics of Operations Research 12(3), pages 441-450.
  15. Parr, R. (1998). Flexible decomposition algorithms for weakly coupled Markov decision problems. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 422-430. Morgan Kaufmann Publishers Inc.
  16. Pineau, J., Roy, N., and Thrun, S. (2001). A hierarchical approach to POMDP planning and execution. Workshop on hierarchy and memory in reinforcement learning, 65(66):51-55.
  17. Quigley, M., Conley, K., Gerkey, B. P., Faust, J., Foote, T., Leibs, J., Wheeler, R., and Ng, A. Y. (2009). ROS: An open-source robot operating system. In ICRA Workshop on Open Source Software.
  18. Witwicki, S. J. and Durfee, E. H. (2010). Influence-based policy abstraction for weakly-coupled dec-POMDPs. In ICAPS, pages 185-192.
  19. Xuan, P., Lesser, V., and Zilberstein, S. (2001). Communication decisions in multi-agent cooperation: Model and experiments. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 616-623.


Paper Citation


in Harvard Style

Debras G., Mouaddib A., Jean Pierre L. and Le Gloannec S. (2016). Dealing With Groups of Actions in Multiagent Markov Decision Processes. In Proceedings of the 8th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2016) ISBN 978-989-758-201-1, pages 49-58. DOI: 10.5220/0006048000490058


in Bibtex Style

@conference{ecta16,
author={Guillaume Debras and Abdel-Illah Mouaddib and Laurent Jean Pierre and Simon Le Gloannec},
title={Dealing With Groups of Actions in Multiagent Markov Decision Processes},
booktitle={Proceedings of the 8th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2016)},
year={2016},
pages={49-58},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006048000490058},
isbn={978-989-758-201-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 8th International Joint Conference on Computational Intelligence - Volume 1: ECTA, (IJCCI 2016)
TI - Dealing With Groups of Actions in Multiagent Markov Decision Processes
SN - 978-989-758-201-1
AU - Debras G.
AU - Mouaddib A.
AU - Jean Pierre L.
AU - Le Gloannec S.
PY - 2016
SP - 49
EP - 58
DO - 10.5220/0006048000490058