learning. However, in the half cheetah environment, the approach to online learning of policies made a significant difference, improving performance for both the base-case strategy and the best strategy, DC-ED-DC. Online learning and the use of a single environment are advantageous for the applicability of the system to real-world robotic applications.
For future work, we would like to test the method in other environments. A limitation of this work is that it only uses the policies' actions; it would be interesting to also consider other variables, such as their value functions, either as part of the ensemble or by incorporating them into the framework.
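As a hypothetical illustration of this extension (not the implementation used in this paper), the Python sketch below contrasts the action-only aggregation this work relies on with one possible value-weighted variant; the LinearPolicy and LinearCritic placeholders, the softmax weighting, and all names are assumptions introduced purely for illustration.

# Minimal sketch of combining ensemble members' actions, optionally
# weighted by each member's own critic estimate Q_i(s, a_i).
# All classes and names are illustrative placeholders.
import numpy as np

class LinearPolicy:
    def __init__(self, rng, state_dim, action_dim):
        self.W = rng.standard_normal((action_dim, state_dim))
    def act(self, state):
        return np.tanh(self.W @ state)

class LinearCritic:
    def __init__(self, rng, state_dim, action_dim):
        self.w = rng.standard_normal(state_dim + action_dim)
    def value(self, state, action):
        return float(self.w @ np.concatenate([state, action]))

def mean_action(policies, state):
    # Action-only aggregation: average the members' proposed actions.
    return np.stack([p.act(state) for p in policies]).mean(axis=0)

def value_weighted_action(policies, critics, state, temperature=1.0):
    # Hypothetical extension: softmax-weight each member's action by its
    # own critic's estimate, so members that judge their proposal more
    # favourably contribute more to the combined action.
    actions = np.stack([p.act(state) for p in policies])
    q = np.array([c.value(state, a) for c, a in zip(critics, actions)])
    w = np.exp((q - q.max()) / temperature)
    w /= w.sum()
    return (w[:, None] * actions).sum(axis=0)

rng = np.random.default_rng(0)
policies = [LinearPolicy(rng, 17, 6) for _ in range(3)]  # half-cheetah-like dims
critics = [LinearCritic(rng, 17, 6) for _ in range(3)]
s = rng.standard_normal(17)
print(mean_action(policies, s))
print(value_weighted_action(policies, critics, s))

Because the weights form a convex combination, the value-weighted action stays within the hull of the members' proposals, letting value estimates influence the ensemble without changing the underlying learners.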