PARAMETER TUNING BY SIMPLE REGRET ALGORITHMS AND MULTIPLE SIMULTANEOUS HYPOTHESIS TESTING

Amine Bourki, Matthieu Coulm, Philippe Rolet, Olivier Teytaud, Paul Vayssière

Abstract

“Simple regret” algorithms are designed for noisy optimization in unstructured domains. In particular, this literature has shown that the uniform algorithm is asymptotically optimal but nonasymptotically suboptimal. We investigate, theoretically and experimentally, the application of these algorithms to automatic parameter tuning, in particular from the point of view of the number of samples required for the uniform algorithm to be relevant and from the point of view of statistical guarantees. We show that for moderate numbers of arms, the possible improvement in the computational power required for statistical validation cannot be more than linear in the number of arms, and we provide a simple rule for checking whether the uniform algorithm (which is trivially parallel) is relevant. Our experiments are performed on the tuning of a Monte-Carlo Tree Search algorithm, a recent tool for high-dimensional planning with particularly impressive results on difficult games, in particular the game of Go.
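The procedure the abstract describes can be sketched in a few lines: uniform allocation of the evaluation budget over the arms, recommendation of the empirically best arm, and Holm's step-down procedure (reference 8 below) to validate the winner under multiple simultaneous comparisons. This is a minimal illustrative sketch under assumed Bernoulli win/loss arms, not the authors' implementation; the function names are ours.

```python
import random

def uniform_simple_regret(arms, budget, rng=None):
    """Uniform allocation: pull each arm budget // len(arms) times,
    then recommend the arm with the best empirical mean reward."""
    rng = rng or random.Random(0)
    pulls = budget // len(arms)
    means = []
    for arm in arms:
        rewards = [arm(rng) for _ in range(pulls)]
        means.append(sum(rewards) / pulls)
    best = max(range(len(arms)), key=lambda i: means[i])
    return best, means

def holm_correction(p_values, alpha=0.05):
    """Holm's step-down procedure: sort the p-values and reject
    hypotheses while the i-th smallest is below alpha / (m - i).
    Controls the family-wise error rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            rejected[i] = True
        else:
            break  # step-down: stop at the first non-rejection
    return rejected
```

Note that the uniform phase is trivially parallel, as the abstract emphasizes: each arm's pulls are independent of every other arm's, so the per-arm loops can run on separate machines and only the empirical means need to be gathered.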

References

  1. Audibert, J.-Y., Munos, R., and Szepesvári, C. (2006). Use of variance estimation in the multi-armed bandit problem. In NIPS 2006 Workshop on On-line Trading of Exploration and Exploitation.
  2. Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2/3):235-256.
  3. Bernstein, S. (1924). On a modification of Chebyshev's inequality and of the error formula of Laplace. Original publication: Ann. Sci. Inst. Sav. Ukraine, Sect. Math. 1, 3(1):38-49.
  4. Bubeck, S., Munos, R., and Stoltz, G. (2009). Pure exploration in multi-armed bandits problems. In ALT, pages 23-37.
  5. Chaslot, G., Hoock, J.-B., Teytaud, F., and Teytaud, O. (2009). On the huge benefit of quasi-random mutations for multimodal optimization with application to grid-based tuning of neurocontrollers. In ESANN, Bruges, Belgium.
  6. Chaslot, G., Saito, J.-T., Bouzy, B., Uiterwijk, J. W. H. M., and van den Herik, H. J. (2006). Monte-Carlo Strategies for Computer Go. In Schobbens, P.-Y., Vanhoof, W., and Schwanen, G., editors, Proceedings of the 18th BeNeLux Conference on Artificial Intelligence, Namur, Belgium, pages 83-91.
  7. Coulom, R. (2006). Efficient selectivity and backup operators in Monte-Carlo tree search. In P. Ciancarini and H. J. van den Herik, editors, Proceedings of the 5th International Conference on Computers and Games, Turin, Italy.
  8. Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6:65-70.
  9. Hoock, J.-B. and Teytaud, O. (2010). Bandit-based genetic programming. In EuroGP 2010, LNCS. Springer.
  10. Hsu, J. (1996). Multiple Comparisons: Theory and Methods. Chapman & Hall/CRC.
  11. Kocsis, L. and Szepesvári, C. (2006). Bandit based Monte-Carlo planning. In 15th European Conference on Machine Learning (ECML), pages 282-293.
  12. Lee, C.-S., Wang, M.-H., Chaslot, G., Hoock, J.-B., Rimmel, A., Teytaud, O., Tsai, S.-R., Hsu, S.-C., and Hong, T.-P. (2009). The Computational Intelligence of MoGo Revealed in Taiwan's Computer Go Tournaments. IEEE Transactions on Computational Intelligence and AI in games.
  13. Mnih, V., Szepesvári, C., and Audibert, J.-Y. (2008). Empirical Bernstein stopping. In ICML '08: Proceedings of the 25th International Conference on Machine Learning, pages 672-679, New York, NY, USA. ACM.
  14. Nannen, V. and Eiben, A. E. (2007a). Relevance estimation and value calibration of evolutionary algorithm parameters. In International Joint Conference on Artificial Intelligence (IJCAI'07), pages 975-980.
  15. Nannen, V. and Eiben, A. E. (2007b). Variance reduction in Meta-EDA. In GECCO '07: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, page 627, New York, NY, USA. ACM.
  16. Pantazis, D., Nichols, T. E., Baillet, S., and Leahy, R. (2005). A comparison of random field theory and permutation methods for the statistical analysis of MEG data. Neuroimage, 25:355-368.
  17. Wang, Y. and Gelly, S. (2007). Modifications of UCT and sequence-like simulations for Monte-Carlo Go. In IEEE Symposium on Computational Intelligence and Games, Honolulu, Hawaii, pages 175-182.
  18. Wolpert, D. and Macready, W. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67-82.


Paper Citation


in Harvard Style

Bourki A., Coulm M., Rolet P., Teytaud O. and Vayssière P. (2010). PARAMETER TUNING BY SIMPLE REGRET ALGORITHMS AND MULTIPLE SIMULTANEOUS HYPOTHESIS TESTING. In Proceedings of the 7th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, ISBN 978-989-8425-00-3, pages 169-173. DOI: 10.5220/0002949901690173


in Bibtex Style

@conference{icinco10,
author={Amine Bourki and Matthieu Coulm and Philippe Rolet and Olivier Teytaud and Paul Vayssière},
title={PARAMETER TUNING BY SIMPLE REGRET ALGORITHMS AND MULTIPLE SIMULTANEOUS HYPOTHESIS TESTING},
booktitle={Proceedings of the 7th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO},
year={2010},
pages={169-173},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002949901690173},
isbn={978-989-8425-00-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 7th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO
TI - PARAMETER TUNING BY SIMPLE REGRET ALGORITHMS AND MULTIPLE SIMULTANEOUS HYPOTHESIS TESTING
SN - 978-989-8425-00-3
AU - Bourki A.
AU - Coulm M.
AU - Rolet P.
AU - Teytaud O.
AU - Vayssière P.
PY - 2010
SP - 169
EP - 173
DO - 10.5220/0002949901690173