Authors: Florian Felten¹; Grégoire Danoy²,¹; El-Ghazali Talbi³ and Pascal Bouvry²,¹
Affiliations:
¹ SnT, University of Luxembourg, Esch-sur-Alzette, Luxembourg
² FSTM/DCS, University of Luxembourg, Esch-sur-Alzette, Luxembourg
³ University of Lille, CNRS/CRIStAL, Inria Lille, France
Keyword(s):
Reinforcement Learning, Multi-objective, Metaheuristics, Pareto Sets.
Abstract:
The fields of Reinforcement Learning (RL) and Optimization aim at finding an optimal solution to a problem, characterized by an objective function. The exploration-exploitation dilemma (EED) is a well-known subject in these fields: a considerable body of literature has already addressed it and shown that it must be handled carefully to achieve good performance. Yet, many real-life problems involve the optimization of multiple objectives. Multi-Policy Multi-Objective Reinforcement Learning (MPMORL) offers a way to learn various optimized behaviours for the agent in such problems. This work introduces a modular framework for the learning phase of such algorithms, which eases the study of the EED in inner-loop MPMORL algorithms. We present three new exploration strategies inspired by the metaheuristics domain. To assess the performance of our methods on various environments, we use a classical benchmark, the Deep Sea Treasure (DST), and also propose a harder version of it. Our experiments show that all of the proposed strategies outperform the current state-of-the-art ε-greedy based methods on the studied benchmarks.
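The abstract does not detail the proposed framework or the three metaheuristic-inspired strategies, but it refers to ε-greedy based exploration as the state-of-the-art baseline. As context only, below is a minimal sketch of ε-greedy action selection over multi-objective Q-values, assuming a simple linear scalarization of the objectives; the function name, array shapes, and the use of scalarization are illustrative assumptions, not the authors' method.

```python
import numpy as np

def epsilon_greedy(q_values: np.ndarray, weights: np.ndarray,
                   epsilon: float, rng: np.random.Generator) -> int:
    """Select an action from multi-objective Q-values (illustrative sketch).

    q_values: shape (n_actions, n_objectives), one value vector per action
    weights:  shape (n_objectives,), a preference over the objectives
    """
    if rng.random() < epsilon:
        # Explore: pick a uniformly random action.
        return int(rng.integers(q_values.shape[0]))
    # Exploit: pick the action maximizing the linearly scalarized value.
    scalarized = q_values @ weights
    return int(np.argmax(scalarized))

# Example usage: 4 actions, 2 objectives (e.g., treasure value vs. time in DST).
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 2))
w = np.array([0.7, 0.3])
action = epsilon_greedy(q, w, epsilon=0.1, rng=rng)
```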