ization function is simple and intuitive but cannot
find all the optimal arms in a non-convex Pareto front
set. In contrast, the Chebyshev scalarization function
has an extra parameter to be tuned, but it can find
all the optimal arms in a non-convex Pareto front
set. Recently, Drugan and Nowe (2013) used
a multi-objective version of the Upper Confidence
Bound (UCB1) policy to find the Pareto optimal arm
set (exploring) and to select the optimal arms fairly
(exploiting), i.e. to solve the exploration/exploitation
trade-off in the Multi-Objective, Multi-Armed Bandits
(MOMABs) problem. We compare the KG policy and UCB1
on the MOMABs problem.
The rest of the paper is organized as follows. In
Section 2 we present background information on the
algorithms and the notation used. In Section 3 we
introduce the multi-objective, multi-armed bandits
framework and the upper confidence bound policy UCB1
for multi-objective bandits with normal distributions.
In Section 4 we introduce the knowledge gradient (KG)
policy and propose the Pareto knowledge gradient
algorithm, the linear scalarized knowledge gradient
across arms algorithm, the linear scalarized knowledge
gradient across dimensions algorithm, and the Chebyshev
scalarized knowledge gradient algorithm. In Section 5
we present scalarized multi-objective bandits. In
Section 6 we describe the experimental setup, followed
by the experimental results. Finally, we conclude and
discuss future work.
2 BACKGROUND
In this section, we introduce the Pareto partial or-
der relationship, order relationships for scalarization
functions and regret performance measures of the
multi-objective, multi-armed bandits problem.
Let us consider the multi-objective, multi-armed
bandits (MOMABs) problem with |A| ≥ 2 arms
and with D objectives (or dimensions). Each objective
has a specific value and the objectives conflict
with each other. This means that the value of arm i can
be better than the value of arm j in one dimension and
worse than the value of arm j in another dimension.
2.1 The Pareto Partial Order
Relationship
The Pareto partial order finds the Pareto optimal arm
set directly in the multi-objective space (Zitzler et al.,
2002). It uses the following relationships between the
mean vectors of two arms. We use i and j to refer to
the mean vector (estimated mean vector or true mean
vector) of arms i and j, respectively:
1. Arm i dominates or is better than j, $i \succ j$, if there
exists at least one dimension d for which $i^d \succ j^d$
and for all other dimensions o we have $i^o \succeq j^o$.
2. Arm i weakly-dominates j, $i \succeq j$, if and only if
for all dimensions d, i.e. $d = 1, \cdots, D$, we have
$i^d \succeq j^d$.
3. Arm i is incomparable with j, $i \parallel j$, if and only
if there exists at least one dimension d for which
$i^d \succ j^d$ and there exists another dimension o for
which $i^o \prec j^o$.
4. Arm i is not dominated by j, $j \nsucc i$, if and only
if there exists at least one dimension d for which
$j^d \prec i^d$. This means that either $i \succ j$ or $i \parallel j$.
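The four relations above can be sketched in Python as follows. This is a minimal illustration, assuming that larger values are better in every dimension and that the component-wise comparison $\succ$ is the ordinary `>` on real numbers. The function names and the example vectors are hypothetical and are not taken from the paper.

```python
from typing import Sequence

Vector = Sequence[float]


def weakly_dominates(i: Vector, j: Vector) -> bool:
    """Relation 2: i is at least as good as j in every dimension."""
    return all(a >= b for a, b in zip(i, j))


def dominates(i: Vector, j: Vector) -> bool:
    """Relation 1: i is strictly better in at least one dimension
    and no worse in all the others."""
    return weakly_dominates(i, j) and any(a > b for a, b in zip(i, j))


def incomparable(i: Vector, j: Vector) -> bool:
    """Relation 3: i is strictly better in one dimension and
    strictly worse in another."""
    better = any(a > b for a, b in zip(i, j))
    worse = any(a < b for a, b in zip(i, j))
    return better and worse


def not_dominated_by(i: Vector, j: Vector) -> bool:
    """Relation 4 as stated in the text: there is at least one
    dimension in which i is strictly better than j."""
    return any(a > b for a, b in zip(i, j))


# Hypothetical two-objective mean vectors.
print(dominates((0.6, 0.5), (0.4, 0.5)))
print(incomparable((0.6, 0.2), (0.4, 0.5)))
```

Under these assumptions, `not_dominated_by(i, j)` holds exactly when `dominates(i, j)` or `incomparable(i, j)` holds, which matches the remark at the end of relation 4.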
Using the above relationships, let the Pareto optimal arm
set $A^* \subset A$ be the set of arms that are not
dominated by any other arm. Then:
$\forall a^* \in A^*$ and $\forall o \notin A^*$ ($o \in A$), we have $o \nsucc a^*$
Moreover, the Pareto optimal arms in $A^*$ are
incomparable with each other. Then:
$\forall a^*, b^* \in A^*$, we have $a^* \parallel b^*$
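As an illustration, the Pareto optimal arm set can be computed by brute force from a list of mean vectors. The sketch below assumes that larger values are better and uses the standard reading of the definition: an arm is kept if no other arm dominates it in the sense of relation 1. The helper names and the mean vectors are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dominates(i: Vector, j: Vector) -> bool:
    """i is no worse than j everywhere and strictly better somewhere."""
    return (all(a >= b for a, b in zip(i, j))
            and any(a > b for a, b in zip(i, j)))


def pareto_optimal_arms(means: Sequence[Vector]) -> List[int]:
    """Return the indices of the arms that no other arm dominates."""
    return [
        a for a, mu_a in enumerate(means)
        if not any(dominates(mu_o, mu_a)
                   for o, mu_o in enumerate(means) if o != a)
    ]


# Hypothetical mean vectors of six arms with two objectives.
means = [(0.55, 0.50), (0.53, 0.51), (0.52, 0.54),
         (0.50, 0.57), (0.51, 0.51), (0.50, 0.50)]

front = pareto_optimal_arms(means)
print(front)  # → [0, 1, 2, 3]
```

In this example, arm 4 is dominated by arm 1 and arm 5 is dominated by arm 0, while the four remaining arms are pairwise incomparable. The pairwise scan costs on the order of $|A|^2 D$ comparisons, which is acceptable for small arm sets.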
2.2 The Scalarized Functions Partial
Order Relationships
In general, scalarization functions convert a
multi-objective optimization problem into a
single-objective one (Eichfelder, 2008). However,
solving a multi-objective optimization problem means
finding the Pareto front set. Thus, we need a set of
scalarized functions S to generate a variety of elements
belonging to the Pareto front set. There are two types
of scalarization functions that weigh the mean vector:
linear and non-linear (Chebyshev) scalarization functions.
The linear scalarization assigns a weight $w^d$ to each
value of the mean vector of an arm i, and the result
is the sum of these weighted mean values. The linear
scalarized function across the mean vector is:
$$f^j(\mu_i) = w^1 \mu_i^1 + \cdots + w^D \mu_i^D \tag{2}$$
where $(w^1, \cdots, w^D)$ is a set of predefined weights
for the linear scalarized function j, $j \in S$, such that
$\sum_{d=1}^{D} w^d = 1$, and $\mu_i$ is the mean vector of
arm i. The linear scalarization is very popular because
of its simplicity. However, it cannot find all the arms
in the Pareto optimal set $A^*$ if the corresponding mean
set is a non-convex set.
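The limitation on non-convex sets can be seen in a small numerical sketch of Equation (2). The code below is illustrative only: the function name, the three mean vectors, and the grid of eleven weight vectors are assumptions made for this example, and larger values are assumed to be better.

```python
from typing import Sequence


def linear_scalarization(weights: Sequence[float],
                         mu: Sequence[float]) -> float:
    """Weighted sum of the components of a mean vector (Equation 2)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * m for w, m in zip(weights, mu))


# Hypothetical mean vectors. All three arms are Pareto optimal
# (pairwise incomparable), but arm 2 lies below the line that
# joins arms 0 and 1, i.e. in a non-convex part of the front.
means = [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4)]

selected = set()
for k in range(11):
    w = (k / 10, 1 - k / 10)          # one linear scalarized function j
    scores = [linear_scalarization(w, mu) for mu in means]
    selected.add(scores.index(max(scores)))

print(sorted(selected))  # → [0, 1]
```

For any weights summing to 1, arm 2 scores 0.4, while the better of arms 0 and 1 scores at least 0.5, so no linear scalarized function selects arm 2 even though it belongs to $A^*$.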
Besides weights, the Chebyshev scalarization has a
D-dimensional reference point, i.e.
$z = [z^1, \cdots, z^D]^T$. The Chebyshev