Thompson Sampling on Asymmetric α-stable Bandits

Zhendong Shi, Ercan Kuruoglu, Xiaoli Wei

2023

Abstract

In reinforcement learning, handling the exploration-exploitation dilemma is central to algorithm optimization. The multi-armed bandit problem can be designed to realize a dynamic balance between exploration and exploitation by changing the reward distribution. Thompson Sampling has been proposed in the literature as a solution to the multi-armed bandit problem by sampling rewards from posterior distributions. Recently, it has been used to process non-Gaussian data with heavy-tailed distributions. Real-life data such as social network data and financial data are commonly observed to exhibit not only impulsive but also asymmetric characteristics. In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem in which rewards follow an asymmetric α-stable distribution with unknown parameters, and we explore its applications in modelling financial and recommendation system data.
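As a rough illustration of the posterior-sampling idea behind Thompson Sampling, the sketch below uses the textbook Beta-Bernoulli setting. This is an assumption made for brevity: it is not the α-stable reward model or the algorithm studied in the paper, and the function name `thompson_sampling_bernoulli` and the arm means are hypothetical.

```python
import random


def thompson_sampling_bernoulli(true_means, rounds, seed=0):
    """Run Thompson Sampling on Bernoulli arms and return pulls per arm.

    Assumption: rewards are 0/1 with a Beta(1, 1) prior on each arm's mean.
    This is a simplified stand-in, not the alpha-stable setting of the paper.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(rounds):
        # Draw one plausible mean per arm from its Beta posterior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


# Hypothetical arm means, chosen only for illustration.
pulls = thompson_sampling_bernoulli([0.2, 0.5, 0.8], rounds=2000)
print(pulls)
```

Exploration arises because uncertain arms produce widely spread posterior samples, so they are still occasionally selected; as evidence accumulates, the posteriors concentrate and the arm with the highest mean is played most often.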

Paper Citation


in Harvard Style

Shi Z., Kuruoglu E. and Wei X. (2023). Thompson Sampling on Asymmetric α-stable Bandits. In Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART, ISBN 978-989-758-623-1, pages 434-441. DOI: 10.5220/0011684200003393


in Bibtex Style

@conference{icaart23,
author={Zhendong Shi and Ercan Kuruoglu and Xiaoli Wei},
title={Thompson Sampling on Asymmetric $\alpha$-stable Bandits},
booktitle={Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2023},
pages={434-441},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011684200003393},
isbn={978-989-758-623-1},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Thompson Sampling on Asymmetric α-stable Bandits
SN - 978-989-758-623-1
AU - Shi Z.
AU - Kuruoglu E.
AU - Wei X.
PY - 2023
SP - 434
EP - 441
DO - 10.5220/0011684200003393