Doicin, B., Popescu, M., and Patrascioiu, C. (2016). PID controller optimal tuning. In 2016 8th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), pages 1–4.
Ghorbel, F., Fitzmorris, A., and Spong, M. W. (1991). Robustness of adaptive control of robots: theory and experiment. In Advanced Robot Control, pages 1–29. Springer.
Golemo, F., Taiga, A. A., Courville, A., and Oudeyer, P.-Y. (2018). Sim-to-real transfer with neural-augmented robot simulation. In Billard, A., Dragan, A., Peters, J., and Morimoto, J., editors, Proceedings of The 2nd Conference on Robot Learning, volume 87 of Proceedings of Machine Learning Research, pages 817–828. PMLR.
Guo, B., Liu, H., Luo, Z., and Chen, W. (2009). Adaptive PID controller based on BP neural network. In 2009 International Joint Conference on Artificial Intelligence, pages 148–150.
Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. CoRR, abs/1801.01290.
Hang, C., Åström, K., and Wang, Q. (2002). Relay feedback auto-tuning of process controllers—a tutorial review. Journal of Process Control, 12(1):143–162.
Hansen, N. (2016). The CMA evolution strategy: A tutorial. CoRR, abs/1604.00772.
Hansen, N., Auger, A., Ros, R., Finck, S., and Pošík, P. (2010). Comparing results of 31 algorithms from the black-box optimization benchmarking BBOB-2009. In Proceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO ’10, pages 1689–1696, New York, NY, USA. ACM.
Hill, A., Raffin, A., Ernestus, M., Traore, R., Dhariwal, P., Hesse, C., Klimov, O., Nichol, A., Plappert, M., Radford, A., Schulman, J., Sidor, S., and Wu, Y. (2018). Stable baselines. https://github.com/hill-a/stable-baselines.
Ho, W. K., Gan, O., Tay, E. B., and Ang, E. (1996). Performance and gain and phase margins of well-known PID tuning formulas. IEEE Transactions on Control Systems Technology, 4(4):473–477.
Hornik, K., Stinchcombe, M., and White, H. (1990). Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks, 3(5):551–560.
Jalali, L. and Ghafarian, H. (2009). Maintenance of robot’s equilibrium in a noisy environment with fuzzy controller. In 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems, volume 2, pages 761–766.
Jaulin, L. (2015). Mobile Robotics.
Jiang, L., Deng, M., and Inoue, A. (2008). Support vector machine-based two-wheeled mobile robot motion control in a noisy environment. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, 222(7):733–743.
Lenain, R., Deremetz, M., Braconnier, J.-B., Thuilot, B., and Rousseau, V. (2017). Robust sideslip angles observer for accurate off-road path tracking control. Advanced Robotics, 31(9):453–467.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. CoRR, abs/1509.02971.
Marova, K. (2016). Using CMA-ES for tuning coupled PID controllers within models of combustion engines. CoRR, abs/1609.06741.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. CoRR, abs/1602.01783.
OpenAI, Andrychowicz, M., Baker, B., Chociej, M., Józefowicz, R., McGrew, B., Pachocki, J. W., Pachocki, J., Petron, A., Plappert, M., Powell, G., Ray, A., Schneider, J., Sidor, S., Tobin, J., Welinder, P., Weng, L., and Zaremba, W. (2018). Learning dexterous in-hand manipulation. CoRR, abs/1808.00177.
Risi, S. and Togelius, J. (2014). Neuroevolution in games: State of the art and open challenges. CoRR, abs/1410.7326.
Salimans, T., Ho, J., Chen, X., Sidor, S., and Sutskever, I. (2017). Evolution strategies as a scalable alternative to reinforcement learning. CoRR, abs/1703.03864.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. CoRR, abs/1707.06347.
Shu, H., Wang, X., and Huang, Z. (2015). Identification of multivariate system based on PID neural network. In 2015 Sixth International Conference on Intelligent Control and Information Processing (ICICIP), pages 199–202.
Sivananaithaperumal, S. and Baskar, S. (2014). Design of multivariable fractional order PID controller using covariance matrix adaptation evolution strategy. Archives of Control Sciences, 24(2):235–251.
Such, F. P., Madhavan, V., Conti, E., Lehman, J., Stanley, K. O., and Clune, J. (2017). Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR, abs/1712.06567.
Tan, J., Zhang, T., Coumans, E., Iscen, A., Bai, Y., Hafner, D., Bohez, S., and Vanhoucke, V. (2018). Sim-to-real: Learning agile locomotion for quadruped robots. CoRR, abs/1804.10332.
Tyreus, B. D. and Luyben, W. L. (1992). Tuning PI controllers for integrator/dead time processes. Industrial & Engineering Chemistry Research, 31(11):2625–2628.
Wakasa, Y., Kanagawa, S., Tanaka, K., and Nishimura, Y. (2010). PID controller tuning based on the covariance matrix adaptation evolution strategy. IEEJ Transactions on Electronics, Information and Systems, 130:737–742.
Welch, B. L. (1947). The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika, 34(1-2):28–35.