Thompson Sampling on Asymmetric α-stable Bandits
Zhendong Shi, Ercan Kuruoglu, Xiaoli Wei
2023
Abstract
In reinforcement learning, handling the exploration-exploitation dilemma is central to algorithm optimization. A multi-armed bandit problem can be designed to realize a dynamic balance between exploration and exploitation by changing the reward distribution. Thompson Sampling has been proposed in the literature as a solution to the multi-armed bandit problem by sampling rewards from posterior distributions. Recently, it has been used to process non-Gaussian data with heavy-tailed distributions. It is commonly observed that real-life data, such as social network data and financial data, exhibit not only impulsive but also asymmetric characteristics. In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem in which rewards follow an asymmetric α-stable distribution with unknown parameters, and we explore its applications in modelling financial and recommendation system data.
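As a rough illustration of the setting described in the abstract, the following Python sketch draws rewards from an asymmetric α-stable distribution and runs a basic Thompson Sampling loop over several arms. It is a minimal sketch under stated assumptions, not the method of the paper: the sampler uses the Chambers-Mallows-Stuck formula (valid for α ≠ 1), and the posterior is a simple Gaussian conjugate model on each arm's mean, swapped in for illustration because the abstract does not specify the posterior used for asymmetric α-stable rewards. The function names `sample_alpha_stable` and `thompson_sampling` and the arm tuples `(alpha, beta, scale, loc)` are hypothetical.

```python
import math
import random


def sample_alpha_stable(alpha, beta, scale=1.0, loc=0.0, rng=random):
    """Draw one alpha-stable variate via the Chambers-Mallows-Stuck method.

    alpha in (0, 2], alpha != 1, is the tail index; beta in [-1, 1] is the
    skewness (beta != 0 gives an asymmetric distribution).
    """
    if not (0 < alpha <= 2) or alpha == 1 or not (-1 <= beta <= 1):
        raise ValueError("need 0 < alpha <= 2, alpha != 1, and -1 <= beta <= 1")
    u = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                    # unit-rate exponential
    t = beta * math.tan(math.pi * alpha / 2)
    b = math.atan(t) / alpha
    s = (1 + t * t) ** (1 / (2 * alpha))
    x = (s * math.sin(alpha * (u + b)) / math.cos(u) ** (1 / alpha)
         * (math.cos(u - alpha * (u + b)) / w) ** ((1 - alpha) / alpha))
    return scale * x + loc


def thompson_sampling(arms, horizon, seed=0):
    """Thompson Sampling with a Gaussian posterior on each arm's mean.

    ASSUMPTION: a N(0, 1) prior and unit-variance Gaussian likelihood are
    used as a stand-in model; heavy-tailed rewards violate this model, so
    this loop only illustrates the sampling-and-update pattern.
    """
    rng = random.Random(seed)
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for _ in range(horizon):
        # Sample a plausible mean for every arm from its current posterior.
        draws = [
            rng.gauss(sums[k] / (counts[k] + 1), 1 / math.sqrt(counts[k] + 1))
            for k in range(len(arms))
        ]
        k = max(range(len(arms)), key=draws.__getitem__)
        alpha, beta, scale, loc = arms[k]
        reward = sample_alpha_stable(alpha, beta, scale, loc, rng)
        counts[k] += 1
        sums[k] += reward
    posterior_means = [sums[k] / (counts[k] + 1) for k in range(len(arms))]
    return counts, posterior_means


if __name__ == "__main__":
    # Two hypothetical arms with the same tail index but different skewness
    # and location.
    arms = [(1.5, 0.8, 1.0, 0.0), (1.5, -0.5, 1.0, 1.0)]
    print(thompson_sampling(arms, horizon=500, seed=42))
```

Because α-stable rewards with α < 2 have infinite variance, the running sums in this Gaussian stand-in can be dominated by a few extreme draws; a posterior tailored to α-stable rewards, as the paper considers, is meant to address that.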
Paper Citation
in Harvard Style
Shi Z., Kuruoglu E. and Wei X. (2023). Thompson Sampling on Asymmetric α-stable Bandits. In Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART, ISBN 978-989-758-623-1, pages 434-441. DOI: 10.5220/0011684200003393
in Bibtex Style
@conference{icaart23,
author={Zhendong Shi and Ercan Kuruoglu and Xiaoli Wei},
title={Thompson Sampling on Asymmetric α-stable Bandits},
booktitle={Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2023},
pages={434-441},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011684200003393},
isbn={978-989-758-623-1},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 15th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Thompson Sampling on Asymmetric α-stable Bandits
SN - 978-989-758-623-1
AU - Shi Z.
AU - Kuruoglu E.
AU - Wei X.
PY - 2023
SP - 434
EP - 441
DO - 10.5220/0011684200003393