Exploration Versus Exploitation Trade-off in Infinite Horizon Pareto Multi-armed Bandits Algorithms

Authors: Madalina Drugan and Bernard Manderick

Affiliation: Vrije Universiteit Brussel, Belgium

ISBN: 978-989-758-074-1

Keyword(s): Multi-armed Bandits, Multi-objective Optimisation, Pareto Dominance Relation, Infinite Horizon Policies.

Related Ontology Subjects/Areas/Topics: Artificial Intelligence ; Computational Intelligence ; Evolutionary Computing ; Knowledge Discovery and Information Retrieval ; Knowledge-Based Systems ; Machine Learning ; Soft Computing ; Symbolic Systems

Abstract: Multi-objective multi-armed bandits (MOMAB) are multi-armed bandits (MAB) extended to reward vectors. We use the Pareto dominance relation, as opposed to scalarization functions, to assess the quality of reward vectors. In this paper, we study the exploration vs exploitation trade-off in infinite horizon MOMAB algorithms. Single-objective MABs explore the suboptimal arms and exploit a single optimal arm. MOMABs also explore the suboptimal arms, but in addition they need to exploit all optimal arms fairly. We study the exploration vs exploitation trade-off of the Pareto UCB1 algorithm, and we extend UCB2, another popular infinite horizon MAB algorithm, to reward vectors using the Pareto dominance relation. We analyse the properties of the proposed MOMAB algorithms in terms of upper regret bounds. We experimentally compare the exploration vs exploitation trade-off of the proposed MOMAB algorithms on a bi-objective Bernoulli environment coming from control theory.
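The Pareto dominance relation mentioned in the abstract can be sketched in a few lines. This is an illustrative example, not the paper's implementation: a reward vector u dominates v if u is at least as good in every objective and strictly better in at least one, and the Pareto-optimal arms are those whose mean reward vectors no other arm dominates. The example arm means below are hypothetical.

```python
def dominates(u, v):
    """True if reward vector u Pareto-dominates v: u is at least as good
    in every objective and strictly better in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(mean_rewards):
    """Indices of arms whose mean reward vectors are not dominated
    by any other arm (the Pareto-optimal arms)."""
    return [i for i, u in enumerate(mean_rewards)
            if not any(dominates(v, u)
                       for j, v in enumerate(mean_rewards) if j != i)]

# Hypothetical bi-objective Bernoulli means for four arms:
arms = [(0.9, 0.2), (0.2, 0.9), (0.6, 0.6), (0.3, 0.3)]
print(pareto_front(arms))  # [0, 1, 2]: arm 3 is dominated by arm 2
```

Because several arms can be mutually non-dominated, a MOMAB algorithm cannot concentrate on a single best arm; it must spread its exploitation over the whole Pareto front, which is the fairness requirement the abstract refers to.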

License: CC BY-NC-ND 4.0


Paper citation in several formats:
Drugan, M. and Manderick, B. (2015). Exploration Versus Exploitation Trade-off in Infinite Horizon Pareto Multi-armed Bandits Algorithms. In Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART, ISBN 978-989-758-074-1, pages 66-77. DOI: 10.5220/0005195500660077

@conference{icaart15,
author={Madalina Drugan and Bernard Manderick},
title={Exploration Versus Exploitation Trade-off in Infinite Horizon Pareto Multi-armed Bandits Algorithms},
booktitle={Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART},
year={2015},
pages={66-77},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005195500660077},
isbn={978-989-758-074-1},
}

TY - CONF

JO - Proceedings of the International Conference on Agents and Artificial Intelligence - Volume 1: ICAART
TI - Exploration Versus Exploitation Trade-off in Infinite Horizon Pareto Multi-armed Bandits Algorithms
SN - 978-989-758-074-1
AU - Drugan, M.
AU - Manderick, B.
PY - 2015
SP - 66
EP - 77
DO - 10.5220/0005195500660077
ER -
