stances up to 12 characters inclusive (see Figure 9).
Figure 9: Success rate of Reinforcement Learning controllers trained on the Password Cracker problem of size 8 and tested on instances up to 12.
The performance degradation of the Reinforcement Learning controllers (RQ4) reinforces the idea that even small changes in the search space substantially alter the dynamics of the environment, which are usually problem-dependent. It also shows that policy gradient algorithms relying on function approximation do not readily transfer their learned policies to new optimization landscapes.
4 CONCLUSIONS
Reinforcement Learning controllers have proven to be effective methods for boosting the rate at which Genetic Algorithms find the optimal solution to a given problem, resulting in hybrid optimizers with reduced running times and faster convergence. This is particularly noticeable in our novel contribution, the DDPG and PPO continuous policy gradient controllers, which outperformed the discrete value-based Q-Learning and SARSA approaches in the vast majority of test environments despite the learning overhead of querying a neural network. Moreover, this work also suggested that even a fine-tuned genetic algorithm with appropriate crossover and mutation rates may not perform optimally if those rates remain fixed throughout the generations.
Lastly, this study highlighted the fact that Reinforcement Learning agents do not generalize well to larger instances of the problem they were initially trained on. Conversely, a genetic algorithm with default parameter values performed comparatively better on larger search spaces of the same task. Nevertheless, compared with the traditional approach to parameter control, the successful application of continuous policy gradient methods opens the door to a family of hybrid optimization algorithms that can adapt to the dynamics of a stochastic process with little manual effort, bringing together two fields of Artificial Intelligence that could shape the next generation of Evolutionary Algorithms.
REFERENCES
Aleti, A. and Moser, I. (2016). A systematic literature re-
view of adaptive parameter control methods for evolu-
tionary algorithms. ACM Computing Surveys (CSUR),
49(3):1–35.
Dayan, P. and Watkins, C. (1992). Q-learning. Machine Learning, 8(3):279–292.
Drugan, M. M. (2019). Reinforcement learning versus evolutionary computation: A survey on hybrid algorithms. Swarm and Evolutionary Computation, 44:228–246.
Dulac-Arnold, G., Evans, R., van Hasselt, H., Sunehag,
P., Lillicrap, T., Hunt, J., Mann, T., Weber, T., De-
gris, T., and Coppin, B. (2015). Deep reinforcement
learning in large discrete action spaces. arXiv preprint
arXiv:1512.07679.
Borel, E. (1913). Mécanique statistique et irréversibilité. pages 189–196.
Eiben, Á. E., Hinterding, R., and Michalewicz, Z. (1999). Parameter control in evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 3(2):124–141.
Höhn, C. and Reeves, C. (1996). The crossover landscape for the onemax problem.
Karafotias, G., Eiben, A. E., and Hoogendoorn, M. (2014a).
Generic parameter control with reinforcement learn-
ing. Proceedings of the 2014 Annual Conference on
Genetic and Evolutionary Computation.
Karafotias, G., Hoogendoorn, M., and Eiben, Á. E. (2014b). Parameter control in evolutionary algorithms: Trends and challenges. IEEE Transactions on Evolutionary Computation, 19(2):167–187.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T.,
Tassa, Y., Silver, D., and Wierstra, D. (2015). Contin-
uous control with deep reinforcement learning. arXiv
preprint arXiv:1509.02971.
Olson, A. T. (1993). The eight queens problem. Journal
of Computers in Mathematics and Science Teaching,
12(1):93–102.
Pygad (2021). pygad Module — PyGAD 2.13.0 documen-
tation.
Rummery, G. A. and Niranjan, M. (1994). On-line Q-learning using connectionist systems, volume 37. University of Cambridge, Department of Engineering, Cambridge, UK.
Sakurai, Y., Takada, K., Kawabe, T., and Tsuruta, S. (2010).
A method to control parameters of evolutionary algo-
rithms by using reinforcement learning. pages 74–79.
IEEE.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and
Klimov, O. (2017). Proximal policy optimization al-
gorithms. arXiv preprint arXiv:1707.06347.