A Study on Multi-Arm Bandit Problem with UCB and Thompson Sampling Algorithm

Haowen Yang

2024

Abstract

The Multi-Armed Bandit (MAB) problem is a sequential decision-making process with wide influence across many fields, from medical to commercial applications. In the MAB problem, the reward distributions are initially unknown and must be learned from observations made during the process. In MAB applications, the Upper Confidence Bound (UCB) algorithm and the Thompson Sampling (TS) algorithm are widely used for their strong performance. This work briefly reviews the basic concepts of the MAB problem and the formulations of the UCB and TS algorithms. It shows that the regret of the UCB algorithm grows logarithmically, and reviews TS as a Bayesian solution to the MAB problem. A brief experiment compares the cumulative regret of the UCB and TS algorithms. The results show that TS achieves lower cumulative regret than UCB under the same scenario; they also show that when the probability gap between arms is small and the number of arms is large, TS performs similarly to UCB.
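The cumulative-regret comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual experimental setup: the Bernoulli arm probabilities, horizon, seed, and Beta(1, 1) priors below are all illustrative assumptions.

```python
import math
import random

def simulate(probs, horizon, algo, seed=0):
    """Run one Bernoulli bandit simulation; return the expected cumulative regret."""
    rng = random.Random(seed)
    k = len(probs)
    best = max(probs)
    counts = [0] * k        # number of pulls per arm
    rewards = [0.0] * k     # total observed reward (successes) per arm
    regret = 0.0
    for t in range(1, horizon + 1):
        if algo == "ucb":
            # UCB1: pull each arm once, then maximize empirical mean + exploration bonus
            if t <= k:
                arm = t - 1
            else:
                arm = max(range(k), key=lambda i: rewards[i] / counts[i]
                          + math.sqrt(2 * math.log(t) / counts[i]))
        else:
            # Thompson Sampling with Beta(1, 1) priors:
            # draw from each arm's Beta posterior and pull the argmax
            arm = max(range(k), key=lambda i: rng.betavariate(
                1 + rewards[i], 1 + counts[i] - rewards[i]))
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward
        regret += best - probs[arm]  # expected (pseudo-)regret of this pull
    return regret

probs = [0.3, 0.5, 0.7]  # hypothetical arm success probabilities
print(f"UCB cumulative regret: {simulate(probs, 5000, 'ucb'):.1f}")
print(f"TS  cumulative regret: {simulate(probs, 5000, 'ts'):.1f}")
```

A single seed and horizon only illustrate the mechanics; a faithful comparison would average regret over many independent runs, as typical MAB evaluations do.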



Paper Citation


in Harvard Style

Yang H. (2024). A Study on Multi-Arm Bandit Problem with UCB and Thompson Sampling Algorithm. In Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI; ISBN 978-989-758-713-9, SciTePress, pages 375-379. DOI: 10.5220/0012938400004508


in Bibtex Style

@conference{emiti24,
author={Haowen Yang},
title={A Study on Multi-Arm Bandit Problem with UCB and Thompson Sampling Algorithm},
booktitle={Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI},
year={2024},
pages={375-379},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012938400004508},
isbn={978-989-758-713-9},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI
TI - A Study on Multi-Arm Bandit Problem with UCB and Thompson Sampling Algorithm
SN - 978-989-758-713-9
AU - Yang H.
PY - 2024
SP - 375
EP - 379
DO - 10.5220/0012938400004508
PB - SciTePress