A GENERIC SOLUTION TO MULTI-ARMED BERNOULLI BANDIT PROBLEMS BASED ON RANDOM SAMPLING FROM SIBLING CONJUGATE PRIORS

Thomas Norheim, Terje Brådland, Ole-Christoffer Granmo, B. John Oommen

2010

Abstract

The Multi-Armed Bernoulli Bandit (MABB) problem is a classical optimization problem where an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus one must balance exploiting existing knowledge about the arms against obtaining new information. Although posed in an abstract framework, the MABB has numerous applications (Gelly and Wang, 2006; Kocsis and Szepesvari, 2006; Granmo et al., 2007; Granmo and Bouhmala, 2007). Bayesian methods, while generally computationally intractable, have been shown to provide a standard for optimal decision making. This paper proposes a novel MABB solution scheme that is inherently Bayesian in nature, yet avoids the computational intractability by relying simply on updating the hyper-parameters of the sibling conjugate distributions and on simultaneously sampling randomly from the respective posteriors. Although our solution is generic in principle, for conciseness we present here the strategy for Bernoulli distributed rewards. Extensive experiments demonstrate that our scheme outperforms recently proposed bandit playing algorithms. We thus believe that our methodology opens avenues for obtaining improved novel solutions.
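
The idea of updating conjugate hyper-parameters and sampling from the posteriors can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: a uniform Beta(1, 1) prior per arm, one posterior sample per arm per round, and pulling the arm with the largest sample. The function name `play_bandit` and its parameters are hypothetical, and this is not the authors' implementation. Posterior sampling of this kind is commonly known as Thompson sampling.

```python
import random


def play_bandit(true_probs, n_pulls, seed=0):
    """Play a Bernoulli bandit by sampling from Beta posteriors.

    Returns (total_reward, pulls), where pulls[i] counts pulls of arm i.
    Illustrative sketch only; priors and selection rule are assumptions.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)

    # Assumed prior: Beta(1, 1) for every arm (uniform over [0, 1]).
    # alpha[i] - 1 counts observed rewards, beta[i] - 1 observed penalties.
    alpha = [1] * n_arms
    beta = [1] * n_arms

    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_pulls):
        # Draw one random sample from each arm's current posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]

        # Pull the arm whose sampled reward probability is largest.
        arm = max(range(n_arms), key=lambda i: samples[i])

        # Simulate the Bernoulli reward of the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Conjugate update: only the two hyper-parameters change.
        alpha[arm] += reward
        beta[arm] += 1 - reward

        pulls[arm] += 1
        total_reward += reward

    return total_reward, pulls


if __name__ == "__main__":
    # Hypothetical two-armed machine with reward probabilities 0.1 and 0.9.
    total, pulls = play_bandit([0.1, 0.9], n_pulls=2000, seed=1)
    print("total reward:", total)
    print("pulls per arm:", pulls)
```

Because the Beta distribution is conjugate to the Bernoulli likelihood, each update is two integer increments, so no numerical integration is required. Arms with uncertain posteriors occasionally produce large samples and get explored, while arms with consistently high rewards are pulled most often.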


Paper Citation


in Harvard Style

Norheim T., Brådland T., Granmo O. and John Oommen B. (2010). A GENERIC SOLUTION TO MULTI-ARMED BERNOULLI BANDIT PROBLEMS BASED ON RANDOM SAMPLING FROM SIBLING CONJUGATE PRIORS. In Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-674-021-4, pages 36-44. DOI: 10.5220/0002712500360044


in Bibtex Style

@conference{icaart10,
author={Thomas Norheim and Terje Brådland and Ole-Christoffer Granmo and B. John Oommen},
title={A GENERIC SOLUTION TO MULTI-ARMED BERNOULLI BANDIT PROBLEMS BASED ON RANDOM SAMPLING FROM SIBLING CONJUGATE PRIORS},
booktitle={Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2010},
pages={36-44},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002712500360044},
isbn={978-989-674-021-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 2nd International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - A GENERIC SOLUTION TO MULTI-ARMED BERNOULLI BANDIT PROBLEMS BASED ON RANDOM SAMPLING FROM SIBLING CONJUGATE PRIORS
SN - 978-989-674-021-4
AU - Norheim T.
AU - Brådland T.
AU - Granmo O.
AU - John Oommen B.
PY - 2010
SP - 36
EP - 44
DO - 10.5220/0002712500360044