ments once they are accumulated.
• Comparing Recommendation Techniques: Most Played Arm is Better. In UCB, the empirically best arm and the most played arm are usually identical (this is not the case for various other bandit algorithms), and both are much better than the "empirical distribution of play" technique. The most played arm and the empirical distribution of play obviously make no sense for Uniform. Note that the most played arm is already known to be the better recommendation in other settings (Wang and Gelly, 2007). MPA is seemingly a reliable tool in many settings; the three recommendation rules are sketched in code below.
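To make the comparison concrete, here is a minimal sketch in Python of the three recommendation rules on simulated Bernoulli arms; the function names, the toy arm means, and the budget are illustrative and not taken from the paper's experiments.

import math
import random

def run_ucb1(means, budget, c=0.1):
    """Run UCB1 on simulated Bernoulli arms; returns pull counts and
    empirical means. c is an exploration constant (0.1 as in the text)."""
    K = len(means)
    counts, sums = [0] * K, [0.0] * K
    for t in range(1, budget + 1):
        if t <= K:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(range(K), key=lambda i: sums[i] / counts[i]
                      + c * math.sqrt(math.log(t) / counts[i]))
        counts[arm] += 1
        sums[arm] += float(random.random() < means[arm])
    return counts, [s / n for s, n in zip(sums, counts)]

def recommend(counts, emp_means, rule):
    """The three recommendation techniques compared in the text."""
    K = len(counts)
    if rule == "empirically_best":      # arm with highest empirical mean
        return max(range(K), key=lambda i: emp_means[i])
    if rule == "most_played":           # MPA: arm pulled most often
        return max(range(K), key=lambda i: counts[i])
    if rule == "distribution_of_play":  # sample an arm proportionally to its pulls
        return random.choices(range(K), weights=counts)[0]
    raise ValueError(rule)

counts, emp_means = run_ucb1([0.45, 0.5, 0.55], budget=1000)
for rule in ("empirically_best", "most_played", "distribution_of_play"):
    print(rule, recommend(counts, emp_means, rule))

On a run like this, "empirically_best" and "most_played" typically agree, while "distribution_of_play" is randomized and therefore noisier, which matches the observation above.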
A next experimental step is the automatic application of the algorithm to more parameters, e.g. by automatically extending the neural network used in the Monte-Carlo Tree Search so that it takes more inputs into account: instead of performing one big modification, apply several modifications one after the other, and tune them sequentially, so that each modification can be visualized and checked independently (a toy sketch of this protocol follows).
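The sketch below, in Python, tunes invented modifications one after the other, keeping each only when it improves a simulated, noisy success-rate estimate; the evaluator and the modification names are hypothetical stand-ins for real games, and the paper's statistical validation machinery (e.g. multiple-testing corrections) is deliberately omitted.

import random

def evaluate(config, n=500):
    """Hypothetical noisy evaluator: stands in for n games of the real
    program under `config`; the 'true' effects below are invented."""
    strength = 0.5 + 0.05 * config.get("mod_a", 0) - 0.02 * config.get("mod_b", 0)
    return sum(random.random() < strength for _ in range(n)) / n

def tune_sequentially(baseline, modifications, n=500):
    """Apply and validate candidate modifications one after the other,
    so that each change can be visualized and checked independently."""
    config = dict(baseline)
    for name, values in modifications:
        # Tune this single modification over its candidate values.
        best = max(values, key=lambda v: evaluate({**config, name: v}, n))
        candidate = {**config, name: best}
        # Keep the modification only if it beats the current configuration.
        if evaluate(candidate, n) > evaluate(config, n):
            config = candidate
    return config

print(tune_sequentially({}, [("mod_a", [0, 1]), ("mod_b", [0, 1])]))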
The fact that the small constant 0.1 was better in UCB is consistent with the known fact that the tuned version of UCB (with the exploration term related to the variance) provides better results; using tuned UCB might provide further improvements (Audibert et al., 2006).
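For reference, a minimal sketch of the two indices in Python: plain UCB1 with an explicit exploration constant, and a UCB-V style index in one common form from Audibert et al. (2006), in which the exploration term scales with the empirical variance (rewards assumed in [0, 1]; zeta is the exploration parameter).

import math

def ucb1_index(mean, n, t, c=0.1):
    """Plain UCB1 index; c is the exploration constant (0.1 in the experiments)."""
    return mean + c * math.sqrt(math.log(t) / n)

def ucbv_index(mean, var, n, t, zeta=1.0):
    """UCB-V style index: the exploration term depends on the arm's
    empirical variance `var` (one common form, for rewards in [0, 1])."""
    return (mean
            + math.sqrt(2.0 * var * zeta * math.log(t) / n)
            + 3.0 * zeta * math.log(t) / n)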
ACKNOWLEDGEMENTS
This work has been supported by the French National Research Agency (ANR) through the COSINUS program (project EXPLO-RA, No. ANR-08-COSI-004) and grant No. ANR-08-COSI-007-12 (OMD project). It benefited from the help of Grid'5000 for parallel experiments.
REFERENCES
Audibert, J.-Y., Munos, R., and Szepesvári, C. (2006). Use of variance estimation in the multi-armed bandit problem. In NIPS 2006 Workshop on On-line Trading of Exploration and Exploitation.
Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2/3):235–256.
Bernstein, S. (1924). On a modification of Chebyshev's inequality and of the error formula of Laplace. Original publication: Ann. Sci. Inst. Sav. Ukraine, Sect. Math. 1, 3(1):38–49.
Bubeck, S., Munos, R., and Stoltz, G. (2009). Pure explo-
ration in multi-armed bandits problems. In ALT, pages
23–37.
Chaslot, G., Hoock, J.-B., Teytaud, F., and Teytaud, O. (2009). On the huge benefit of quasi-random mutations for multimodal optimization with application to grid-based tuning of neurocontrollers. In ESANN, Bruges, Belgium.
Chaslot, G., Saito, J.-T., Bouzy, B., Uiterwijk, J. W. H. M.,
and van den Herik, H. J. (2006). Monte-Carlo Strate-
gies for Computer Go. In Schobbens, P.-Y., Vanhoof,
W., and Schwanen, G., editors, Proceedings of the
18th BeNeLux Conference on Artificial Intelligence,
Namur, Belgium, pages 83–91.
Coulom, R. (2006). Efficient selectivity and backup operators in Monte-Carlo tree search. In Ciancarini, P. and van den Herik, H. J., editors, Proceedings of the 5th International Conference on Computers and Games, Turin, Italy.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6:65–70.
Hoock, J.-B. and Teytaud, O. (2010). Bandit-based genetic programming. In EuroGP 2010, LNCS. Springer.
Hsu, J. (1996). Multiple Comparisons: Theory and Methods. Chapman & Hall/CRC.
Kocsis, L. and Szepesvári, C. (2006). Bandit based Monte-Carlo planning. In 15th European Conference on Machine Learning (ECML), pages 282–293.
Lee, C.-S., Wang, M.-H., Chaslot, G., Hoock, J.-B., Rim-
mel, A., Teytaud, O., Tsai, S.-R., Hsu, S.-C., and
Hong, T.-P. (2009). The Computational Intelligence
of MoGo Revealed in Taiwan’s Computer Go Tourna-
ments. IEEE Transactions on Computational Intelli-
gence and AI in games.
Mnih, V., Szepesvári, C., and Audibert, J.-Y. (2008). Empirical Bernstein stopping. In ICML '08: Proceedings of the 25th International Conference on Machine Learning, pages 672–679, New York, NY, USA. ACM.
Nannen, V. and Eiben, A. E. (2007a). Relevance estimation and value calibration of evolutionary algorithm parameters. In International Joint Conference on Artificial Intelligence (IJCAI'07), pages 975–980.
Nannen, V. and Eiben, A. E. (2007b). Variance reduction in meta-EDA. In GECCO '07: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pages 627–627, New York, NY, USA. ACM.
Pantazis, D., Nichols, T. E., Baillet, S., and Leahy, R.
(2005). A comparison of random field theory and per-
mutation methods for the statistical analysis of MEG
data. Neuroimage, 25:355–368.
Wang, Y. and Gelly, S. (2007). Modifications of UCT and
sequence-like simulations for Monte-Carlo Go. In
IEEE Symposium on Computational Intelligence and
Games, Honolulu, Hawaii, pages 175–182.
Wolpert, D. and Macready, W. (1997). No Free Lunch The-
orems for Optimization. IEEE Transactions on Evo-
lutionary Computation, 1(1):67–82.