The results of all algorithms compared in this
section provide valuable insights to improve the
Q-learning approach. They suggest that incorporating
some of the functionality of the moRBC climber into
the transitions allowed for the Q-learning approach
could improve its effectiveness. In addition, ways
to include selection principles that are more robust in
objective spaces of larger dimensions should be
considered. This implies different ways to compute the
rewards and the selection of the solution to restart an episode.

In this work, we studied distributed and centralized
approaches of Q-learning for multi-objective
optimization of binary epistatic problems using
MNK-landscapes. We showed that the Q-learning based
approaches scale up better than moRBC, NSGA-II
and MOEA/D as we increase the number of objectives
on problems with large epistasis. Also, we identified
their weaknesses, particularly in low epistatic
landscapes. In addition, we analyzed results of other
MOEAs taking into account their selection method
and operators of variation together with properties of
MNK-landscapes to better understand the Q-learning
based approaches and suggested ways to improve
them. Our conclusions regarding the parameters of
the Q-learning based approaches are as follows. The
action that flips any bit is overall slightly superior to
the action that flips the left or right neighboring bits.
The centralized approach, using a reward based on
Pareto dominance, does not scale up well with the
dimension of the objective space.
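To make the two conclusions above concrete, the following is a minimal sketch, not the implementation used in this work. It assumes maximization, a hypothetical reward in {+1, 0, -1} derived from Pareto dominance, and a hypothetical "last flipped position" with wrap-around to define the left and right neighbors. All function names are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if objective vector a Pareto-dominates b (maximization assumed)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def dominance_reward(new_f: Sequence[float], old_f: Sequence[float]) -> int:
    """Hypothetical reward: +1 if the new state dominates the old one,
    -1 if it is dominated, 0 if the two are equal or mutually non-dominated."""
    if dominates(new_f, old_f):
        return 1
    if dominates(old_f, new_f):
        return -1
    return 0


def flip(bits: Tuple[int, ...], i: int) -> Tuple[int, ...]:
    """Return a copy of the binary solution with bit i flipped."""
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]


def any_bit_actions(n: int) -> List[int]:
    """Action set in which any of the n bits may be flipped."""
    return list(range(n))


def neighbor_actions(n: int, last: int) -> List[int]:
    """Action set restricted to the left and right neighbors of position `last`
    (wrap-around at the ends is an assumption of this sketch)."""
    return [(last - 1) % n, (last + 1) % n]


def zero_reward_fraction(m: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of random pairs of m-objective vectors that receive reward 0."""
    rng = random.Random(seed)
    zero = 0
    for _ in range(trials):
        a = [rng.random() for _ in range(m)]
        b = [rng.random() for _ in range(m)]
        if dominance_reward(a, b) == 0:
            zero += 1
    return zero / trials
```

For independent, uniformly random vectors, the probability that one of two vectors dominates the other is 2(1/2)^m, so the share of zero rewards returned by `zero_reward_fraction` grows quickly with the number of objectives m. Random vectors are not MNK-landscape objective values, so this only illustrates why a dominance-based reward becomes less informative in larger objective spaces.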
In the future, we would like to study other ways to
assign rewards for the centralized approach, enhance
the selection of solutions for the initial state of an
episode, and constrain transitions to non-improving
states. We would also like to study the Q-learning
approaches for many-objective optimization and analyze
the optimization history obtained by Q-learning.
