Exploration Versus Exploitation Trade-off in Infinite Horizon Pareto Multi-armed Bandits Algorithms

Authors: Madalina Drugan and Bernard Manderick

Affiliation: Vrije Universiteit Brussel, Belgium

ISBN: 978-989-758-074-1

Keyword(s): Multi-armed Bandits, Multi-objective Optimisation, Pareto Dominance Relation, Infinite Horizon Policies.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Computational Intelligence ; Evolutionary Computing ; Knowledge Discovery and Information Retrieval ; Knowledge-Based Systems ; Machine Learning ; Soft Computing ; Symbolic Systems

Abstract: Multi-objective multi-armed bandits (MOMAB) are multi-armed bandits (MAB) extended to reward vectors. We use the Pareto dominance relation, as opposed to scalarization functions, to assess the quality of reward vectors. In this paper, we study the exploration vs exploitation trade-off in infinite horizon MOMAB algorithms. Single-objective MABs explore the suboptimal arms and exploit a single optimal arm. MOMABs also explore the suboptimal arms, but in addition they need to exploit all optimal arms fairly. We study the exploration vs exploitation trade-off of the Pareto UCB1 algorithm, and we extend UCB2, another popular infinite horizon MAB algorithm, to reward vectors using the Pareto dominance relation. We analyse the properties of the proposed MOMAB algorithms in terms of upper regret bounds. We experimentally compare the exploration vs exploitation trade-off of the proposed MOMAB algorithms on a bi-objective Bernoulli environment coming from control theory.
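The Pareto dominance relation mentioned in the abstract can be sketched in a few lines. This is an illustrative example, not the paper's implementation: a reward vector u dominates v if u is at least as good in every objective and strictly better in at least one, and the Pareto-optimal arms are those whose mean reward vectors no other arm dominates. The example arm means below are hypothetical.

```python
def dominates(u, v):
    """True if reward vector u Pareto-dominates v: u is at least as good
    in every objective and strictly better in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(mean_rewards):
    """Indices of arms whose mean reward vectors are not dominated
    by any other arm (the Pareto-optimal arms)."""
    return [i for i, u in enumerate(mean_rewards)
            if not any(dominates(v, u)
                       for j, v in enumerate(mean_rewards) if j != i)]

# Hypothetical bi-objective Bernoulli means for four arms:
arms = [(0.9, 0.2), (0.2, 0.9), (0.6, 0.6), (0.3, 0.3)]
print(pareto_front(arms))  # [0, 1, 2]: arm 3 is dominated by arm 2
```

Because several arms can be mutually non-dominated, a MOMAB algorithm cannot concentrate on a single best arm; it must spread its exploitation over the whole Pareto front, which is the fairness requirement the abstract refers to.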

License: CC BY-NC-ND 4.0


Paper citation in several formats:
Drugan, M. and Manderick, B. (2015). Exploration Versus Exploitation Trade-off in Infinite Horizon Pareto Multi-armed Bandits Algorithms. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-074-1, pages 66-77. DOI: 10.5220/0005195500660077

@conference{icaart15,
author={Madalina Drugan and Bernard Manderick},
title={Exploration Versus Exploitation Trade-off in Infinite Horizon Pareto Multi-armed Bandits Algorithms},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2015},
pages={66-77},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005195500660077},
isbn={978-989-758-074-1},
}

TY - CONF

JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Exploration Versus Exploitation Trade-off in Infinite Horizon Pareto Multi-armed Bandits Algorithms
SN - 978-989-758-074-1
AU - Drugan, M.
AU - Manderick, B.
PY - 2015
SP - 66
EP - 77
DO - 10.5220/0005195500660077
ER -
