Table 2: Hyperparameters for the Deep Sea Environment.
Hyperparameter    Value      Hyperparameter     Value      Hyperparameter     Value
γ                 0.9        α                  1 × 10⁻³   β                  0.4
ε^Q_start         0.95       ε^Q_decay          0.995      ε^Q_min            0.25
ε^W_start         0.99       ε^W_decay          0.9995     ε^W_min            0.01
ζ                 0.6        τ_Q                1 × 10⁻³   τ_W                1 × 10⁻³
Batch size K      1024       Memory size M      1 × 10⁵    Q Optimizer        RMSprop
W Optimizer       RMSprop    Q learning rate    1 × 10⁻³   W learning rate    1 × 10⁻³
solves competition in multi-objective scenarios. We have demonstrated the proposed method's efficiency and its superiority over a baseline solution in two environments: deep sea treasure and multi-objective mountain car. In both environments, the proposed DWN is capable of finding the Pareto front. Furthermore, we have demonstrated the advantage of DWN's modularity by showing that using a pre-trained policy can aid in finding the Pareto front in the deep sea treasure environment. In future work, we will focus on improving computational performance and on evaluating DWN in more complex environments, e.g., SuperMarioBros (Kauten, 2018).
The proposed DWN algorithm can be employed in any system with multiple objectives, such as traffic control, telecommunication networks, or finance, provided that each objective is represented by a separate reward function. The main advantage of DWN is its ability to train multiple policies simultaneously. Furthermore, the policies are not required to share a state space; for example, a policy for the mountain car environment could need access only to the velocity component of the state. Because each policy stores its experiences in a separate replay buffer, DWN can train policies that operate on different state representations.
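To make the last point concrete, the following is a minimal sketch (not the authors' implementation; names such as PolicyModule and state_fn are illustrative assumptions) of how per-policy replay buffers let each policy store its own view of the state together with its own objective-specific reward.

import random
from collections import deque

class PolicyModule:
    """One policy with its own experience buffer (illustrative sketch only)."""

    def __init__(self, state_fn, buffer_size=100_000):
        # state_fn extracts this policy's view of the full environment state,
        # e.g. only the velocity component for a mountain car policy.
        self.state_fn = state_fn
        self.buffer = deque(maxlen=buffer_size)  # separate buffer per policy

    def store(self, state, action, reward, next_state, done):
        # Each policy records the transition using its own state view and its
        # own objective-specific reward.
        self.buffer.append((self.state_fn(state), action, reward,
                            self.state_fn(next_state), done))

    def sample(self, batch_size=1024):
        # Mini-batch used to update this policy's Q- and W-networks.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

# Two policies sharing one environment but observing different state slices,
# assuming the full state is (position, velocity) as in the Gym mountain car.
full_state_policy = PolicyModule(state_fn=lambda s: s)      # full state
velocity_policy = PolicyModule(state_fn=lambda s: s[1:])    # velocity only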
ACKNOWLEDGEMENTS
This work was funded in part by the SFI-NSFC Partnership Programme Grant Number 17/NSFC/5224 and SFI under Frontiers for the Future project 21/FFP-A/8957.
REFERENCES
Abels, A., Roijers, D., Lenaerts, T., Nowé, A., and Steckelmacher, D. (2019). Dynamic weights in multi-objective deep reinforcement learning. In International Conference on Machine Learning, pages 11–20. PMLR.
Cardozo, N. and Dusparic, I. (2020). Learning run-time compositions of interacting adaptations. SEAMS '20, pages 108–114, New York, NY, USA. Association for Computing Machinery.
Dusparic, I., Taylor, A., Marinescu, A., Cahill, V., and Clarke, S. (2015). Maximizing renewable energy use with decentralized residential demand response. In 2015 IEEE First International Smart Cities Conference (ISC2), pages 1–6.
Giupponi, L., Agusti, R., Pérez-Romero, J., and Sallent, O. (2005). A novel joint radio resource management approach with reinforcement learning mechanisms. In IEEE International Performance, Computing, and Communications Conference (IPCCC), pages 621–626. Phoenix, AZ, USA.
Hribar, J., Marinescu, A., Chiumento, A., and DaSilva, L. A. (2022). Energy Aware Deep Reinforcement Learning Scheduling for Sensors Correlated in Time and Space. IEEE Internet of Things Journal, 9(9):6732–6744.
Humphrys, M. (1995). W-learning: Competition among selfish Q-learners.
Jin, Y. and Sendhoff, B. (2008). Pareto-Based Multiobjective Machine Learning: An Overview and Case Studies. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(3):397–415.
Karlsson, J. (1997). Learning to solve multiple goals. University of Rochester.
Kauten, C. (2018). Super Mario Bros for OpenAI Gym. GitHub: github.com/Kautenja/gym-super-mario-bros.
Kusic, K., Ivanjko, E., Vrbanic, F., Greguric, M., and Dusparic, I. (2021). Spatial-temporal traffic flow control on motorways using distributed multi-agent reinforcement learning. Mathematics - Special Issue Advances in Artificial Intelligence: Models, Optimization, and Machine Learning, 9(23).
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436–444.
Liu, C., Xu, X., and Hu, D. (2015). Multiobjective Reinforcement Learning: A Comprehensive Overview. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(3):385–398.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari With Deep Reinforcement Learning. arXiv preprint arXiv:1312.5602.