Authors:
Saba Yahyaa, Madalina Drugan and Bernard Manderick
Affiliation:
Vrije Universiteit Brussel, Belgium
Keyword(s):
Multi-armed Bandit Problems, Multi-objective Optimization, Linear Scalarized Function, Scalarized Function Set, Thompson Sampling Policy.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Computational Intelligence; Evolutionary Computing; Knowledge Discovery and Information Retrieval; Knowledge Representation and Reasoning; Knowledge-Based Systems; Machine Learning; Soft Computing; Symbolic Systems
Abstract:
In the stochastic multi-objective multi-armed bandit (MOMAB), each arm generates a vector of stochastic normally distributed rewards, one per objective, instead of a single scalar reward. As a result, there is not a single optimal arm but a set of optimal arms (the Pareto front) under the Pareto dominance relation. The goal of an agent is to find the Pareto front. To find the optimal arms, the agent can use a linear scalarization function that transforms the multi-objective problem into a single-objective problem by summing the weighted objectives. Selecting the weights is crucial, since different weights result in selecting different optimal arms from the Pareto front. Usually, a predefined set of weights is used, which can be computationally inefficient when different weights optimize the same Pareto optimal arm and some arms in the Pareto front are never identified. In this paper, we propose a
number of techniques that adapt the weights on the fly in order to improve the performance of the scalarized
MOMAB.
We use genetic and adaptive scalarization functions from multi-objective optimization to generate
new weights. We propose to use a Thompson sampling policy to select more frequently the weights that identify new
arms on the Pareto front. We experimentally show that Thompson sampling improves the performance of the
genetic and adaptive scalarization functions. All the proposed techniques improve the performance of the
standard scalarized MOMAB with a fixed set of weights.
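
For reference, the linear scalarization mentioned in the abstract can be written as follows; the symbols f_w, w^d and mu_i^d are our own notation for illustration, not necessarily the paper's:

    f_w(mu_i) = sum_{d=1}^{D} w^d * mu_i^d,   with w^d >= 0 and sum_{d=1}^{D} w^d = 1,

where mu_i = (mu_i^1, ..., mu_i^D) is the mean reward vector of arm i over the D objectives. The scalarized bandit then pulls the arm maximizing f_w(mu_i); different weight vectors generally single out different arms on the Pareto front, which is why the choice of weights matters.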
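
The following is a minimal sketch of how a Thompson sampling policy could prefer weight vectors that keep identifying new Pareto arms, assuming a Beta-Bernoulli model over "this weight found a new Pareto arm"; the function name, the specific weight vectors and the success/failure bookkeeping are illustrative assumptions, not the paper's exact algorithm.

    import random

    def thompson_select_weight(weight_stats):
        """Pick a weight-vector index by Thompson sampling.

        weight_stats: list of (successes, failures) pairs, where a 'success'
        means that playing the scalarized bandit under that weight vector
        identified a new Pareto-optimal arm (Beta-Bernoulli assumption).
        """
        samples = [random.betavariate(s + 1, f + 1) for s, f in weight_stats]
        return max(range(len(samples)), key=lambda i: samples[i])

    # Hypothetical usage: three candidate weight vectors for a 2-objective MOMAB.
    weights = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
    stats = [(0, 0)] * len(weights)   # (new-arm successes, failures) per weight

    for t in range(100):
        j = thompson_select_weight(stats)
        w = weights[j]
        # ... play the scalarized bandit under w and check whether a new
        # Pareto arm was identified (placeholder; depends on the environment) ...
        found_new_arm = False
        s, f = stats[j]
        stats[j] = (s + 1, f) if found_new_arm else (s, f + 1)

Weights whose Beta posterior concentrates on high success probability are sampled more often, so the policy spends its pulls on weight vectors that still reveal new arms of the Pareto front rather than on weights that repeatedly optimize the same arm.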