THE PARALLELIZATION OF MONTE-CARLO PLANNING - Parallelization of MC-Planning

S. Gelly; J. B. Hoock; A. Rimmel; O. Teytaud; Y. Kalemkarian

doi:10.5220/0001502402440249

2008

Abstract

Since their impressive successes in various areas of large-scale parallelization, recent techniques such as UCT and other Monte-Carlo planning variants (Kocsis and Szepesvari, 2006a) have been extensively studied (Coquelin and Munos, 2007; Wang and Gelly, 2007). Here we propose and compare various forms of parallelization of bandit-based tree search, in particular for our computer-Go algorithm XYZ.

References

Agrawal, R. (1995). The continuum-armed bandit problem. SIAM J. Control Optim., 33(6):1926-1951.
Auer, P., Cesa-Bianchi, N., and Gentile, C. (2001). Adaptive and self-confident on-line learning algorithms. Machine Learning Journal.
Banks, J. S. and Sundaram, R. K. (1992). Denumerable-armed bandits. Econometrica, 60(5):1071-96. Available at http://ideas.repec.org/a/ecm/emetrp/v60y1992i5p1071-96.html.
Barto, A., Bradtke, S., and Singh, S. (1993). Learning to act using real-time dynamic programming. Technical Report UM-CS-1993-002.
Bellman, R. (1957). Dynamic Programming. Princeton Univ. Press.
Berry, D. A., Chen, R. W., Zame, A., Heath, D. C., and Shepp, L. A. (1997). Bandit problems with infinitely many arms. Ann. Statist., 25(5):2103-2116.
Bertsekas, D. (1995). Dynamic Programming and Optimal Control, vols I and II. Athena Scientific.
Bruegmann, B. (1993). Monte Carlo Go. Unpublished.
Cazenave, T. and Helmstetter, B. (2005). Combining tactical search and Monte-Carlo in the game of Go. IEEE CIG 2005, pages 171-175.
Coquelin, P.-A. and Munos, R. (2007). Bandit algorithms for tree search. In Proceedings of UAI'07.
Coulom, R. (2006). Efficient selectivity and backup operators in Monte-Carlo tree search. In P. Ciancarini and H. J. van den Herik, editors, Proceedings of the 5th International Conference on Computers and Games, Turin, Italy.
Coulom, R. (2007). Computing Elo ratings of move patterns in the game of Go. In van den Herik, H. J., Uiterwijk, J. W. H. M., Winands, M., and Schadd, M., editors, Computer Games Workshop, Amsterdam.
Dani, V. and Hayes, T. P. (2006). Robbing the bandit: less regret in online geometric optimization against an adaptive adversary. In SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, pages 937-943, New York, NY, USA. ACM Press.
Gelly, S. and Silver, D. (2007). Combining online and offline knowledge in UCT. In ICML '07: Proceedings of the 24th international conference on Machine learning, pages 273-280, New York, NY, USA. ACM Press.
Hussain, Z., Auer, P., Cesa-Bianchi, N., Newnham, L., and Shawe-Taylor, J. (2006). Exploration vs. exploitation challenge. Pascal Network of Excellence.
Kocsis, L. and Szepesvari, C. (2005). Reduced-variance payoff estimation in adversarial bandit problems. In Proceedings of the ECML-2005 Workshop on Reinforcement Learning in Non-Stationary Environments.
Kocsis, L. and Szepesvari, C. (2006a). Bandit-based Monte-Carlo planning. ECML'06.
Kocsis, L. and Szepesvari, C. (2006b). Discounted-UCB. In 2nd Pascal-Challenge Workshop.
Lai, T. and Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22.
Powell, W.-B. (2007). Approximate Dynamic Programming. Wiley.
Wang, Y. and Gelly, S. (2007). Modifications of UCT and sequence-like simulations for Monte-Carlo Go. In IEEE Symposium on Computational Intelligence and Games, Honolulu, Hawaii, pages 175-182.

Paper Citation

in Harvard Style

Gelly S., Hoock J. B., Rimmel A., Teytaud O. and Kalemkarian Y. (2008). THE PARALLELIZATION OF MONTE-CARLO PLANNING - Parallelization of MC-Planning. In Proceedings of the Fifth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, ISBN 978-989-8111-30-2, pages 244-249. DOI: 10.5220/0001502402440249

in Bibtex Style

@conference{icinco08,
author={S. Gelly and J. B. Hoock and A. Rimmel and O. Teytaud and Y. Kalemkarian},
title={THE PARALLELIZATION OF MONTE-CARLO PLANNING - Parallelization of MC-Planning},
booktitle={Proceedings of the Fifth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},
year={2008},
pages={244-249},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001502402440249},
isbn={978-989-8111-30-2},
}

in EndNote Style

TY - CONF
JO - Proceedings of the Fifth International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO
TI - THE PARALLELIZATION OF MONTE-CARLO PLANNING - Parallelization of MC-Planning
SN - 978-989-8111-30-2
AU - Gelly S.
AU - Hoock J. B.
AU - Rimmel A.
AU - Teytaud O.
AU - Kalemkarian Y.
PY - 2008
SP - 244
EP - 249
DO - 10.5220/0001502402440249