The results of all algorithms compared in this section provide valuable insights for improving the Q-learning approach. They suggest that incorporating some of the functionality of the moRBC climber into the transitions allowed by the Q-learning approach could improve its effectiveness. In addition, selection principles that are more robust in objective spaces of larger dimension should be considered. This implies different ways to compute the rewards and to select the solution from which an episode is restarted.
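To make the last point concrete, the Python sketch below illustrates two possible rules for selecting the solution that restarts an episode from an external archive of nondominated solutions: uniformly at random, or biased toward sparsely covered regions of the objective space. The archive, the function names, and both rules are illustrative assumptions of this sketch, not the mechanism evaluated in this paper.

```python
import random

def restart_uniform(archive, rng=random):
    # archive: list of (solution, objective_vector) pairs, assumed to hold
    # only mutually nondominated solutions found so far (hypothetical structure).
    # Rule 1: restart the episode from an archive member chosen uniformly at random.
    solution, _ = rng.choice(archive)
    return solution

def restart_least_crowded(archive):
    # Rule 2: favor the archive member whose objective vector is, on average,
    # farthest from the others, i.e. an isolated region of the objective space.
    def avg_distance(idx):
        _, f = archive[idx]
        others = [g for j, (_, g) in enumerate(archive) if j != idx]
        dists = [sum((a - b) ** 2 for a, b in zip(f, g)) ** 0.5 for g in others]
        return sum(dists) / max(len(dists), 1)

    best = max(range(len(archive)), key=avg_distance)
    return archive[best][0]
```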
5 CONCLUSION
In this work, we studied distributed and centralized Q-learning approaches for multi-objective optimization of binary epistatic problems using MNK-landscapes. We showed that the Q-learning based approaches scale up better than moRBC, NSGA-II, and MOEA/D as the number of objectives increases on problems with large epistasis. We also identified their weaknesses, particularly on landscapes with low epistasis. In addition, we analyzed the results of other MOEAs, taking into account their selection methods and variation operators together with properties of MNK-landscapes, to better understand the Q-learning based approaches, and we suggested ways to improve them. Our conclusions regarding the parameters of the Q-learning based approaches are as follows. The action that flips any bit is overall slightly superior to the action that flips the left or right neighboring bits. The centralized approach, using a reward based on Pareto dominance, does not scale up well with the dimension of the objective space.
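For reference, the sketch below spells out in Python the two kinds of actions compared above and a simple Pareto-dominance test of the sort a dominance-based reward relies on. The wrap-around indexing in the neighbor action and the exact reward values are assumptions of this sketch and only illustrative.

```python
def flip_bit(solution, i):
    # "Flip any bit" action: flip position i of the binary string.
    s = list(solution)
    s[i] ^= 1
    return s

def flip_neighbor(solution, i, direction):
    # "Flip the left or right neighboring bit" action: flip the left
    # (direction = -1) or right (direction = +1) neighbor of position i;
    # the wrap-around at the ends of the string is an assumption of this sketch.
    return flip_bit(solution, (i + direction) % len(solution))

def dominates(f_a, f_b):
    # Pareto dominance for maximization: f_a is no worse in every objective
    # and strictly better in at least one.
    return all(a >= b for a, b in zip(f_a, f_b)) and any(
        a > b for a, b in zip(f_a, f_b)
    )

def dominance_reward(f_new, f_old):
    # Illustrative dominance-based reward: +1 if the move produced a
    # dominating objective vector, -1 if it produced a dominated one, 0 otherwise.
    if dominates(f_new, f_old):
        return 1.0
    if dominates(f_old, f_new):
        return -1.0
    return 0.0
```

Note that as the number of objectives grows, most pairs of solutions become mutually nondominated, so a reward of this kind is zero most of the time, which is consistent with the scaling issue noted above.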
In the future, we would like to study other ways to assign rewards in the centralized approach, enhance the selection of the solution used as the initial state of an episode, and constrain transitions to non-improving states. We would also like to study the Q-learning approaches for many-objective optimization and analyze the optimization history obtained by Q-learning.
REFERENCES
Aguirre, H. and Tanaka, K. (2005). Random Bit Climbers on Multiobjective MNK-Landscapes: Effects of Memory and Population Climbing. IEICE Transactions, 88-A:334–345.
Aguirre, H. and Tanaka, K. (2007). Working Principles, Behavior, and Performance of MOEAs on MNK-landscapes. European Journal of Operational Research, 181:1670–1690.
Barrett, L. and Narayanan, S. (2008). Learning All Optimal Policies with Multiple Criteria. International Conference on Machine Learning, pages 41–47.
Deb, K. (2001). Multi-Objective Optimization using Evolu-
tionary Algorithms. John Wiley & Sons.
Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002). A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197.
Drugan, M. (2019). Reinforcement Learning Versus Evolutionary Computation: A Survey on Hybrid Algorithms. Swarm and Evolutionary Computation, 44:228–246.
Gábor, Z., Kalmár, Z., and Szepesvári, C. (1998). Multi-Criteria Reinforcement Learning. International Conference on Machine Learning, 98:197–205.
Hayes, C., Rădulescu, R., Bargiacchi, E., Källström, J., Macfarlane, M., Reymond, M., Verstraeten, T., et al. (2022). A Practical Guide to Multi-Objective Reinforcement Learning and Planning. Autonomous Agents and Multi-Agent Systems, 32(1):26.
Jalalimanesh, A., Haghighi, H. S., Ahmadi, A., Hejazian,
H., and Soltani, M. (2017). Multi-Objective Op-
timization of Radiotherapy: Distributed Q-Learning
and Agent-Based Simulation. Journal of Experimen-
tal & Theoretical Artificial Intelligence, 29(5):1071–
86.
Lizotte, D., Bowling, M., and Murphy, S. (2010). Efficient Reinforcement Learning with Multiple Reward Functions for Randomized Controlled Trial Analysis. International Conference on Machine Learning (ICML), 10:695–702.
Mariano, C. and Morales, E. (2000). Distributed Reinforcement Learning for Multiple Objective Optimization Problems. In Proc. of Congress on Evolutionary Computation (CEC-2000), pages 188–195.
Moffaert, K. V., Drugan, M., and Nowé, A. (2013a). Hypervolume-Based Multi-Objective Reinforcement Learning. Evolutionary Multi-Criterion Optimization, pages 352–366.
Moffaert, K. V., Drugan, M., and Nowé, A. (2013b). Scalarized Multi-Objective Reinforcement Learning: Novel Design Techniques. IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pages 191–199.
Shen, X., Minku, L., Marturi, N., Guo, Y., and Han, Y.
(2018). A Q-Learning-Based Memetic Algorithm for
Multi-Objective Dynamic Software Project Schedul-
ing. Information Sciences, 428:1–29.
Sutton, R. and Barto, A. (1998). Reinforcement Learning: An Introduction. The MIT Press.
Watkins, C. and Dayan, P. (1992). Q-learning. Machine Learning, 8:279–292.
Zhang, Q. and Li, H. (2008). MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition. IEEE Transactions on Evolutionary Computation, 11(6):712–731.
Zitzler, E. (1999). Evolutionary Algorithms for Multiobjec-
tive Optimization: Methods and Applications. PhD
thesis, ETH Zurich, Switzerland.