Figure 8: Snapshots of the real-world experiment for the 1 vs 2 situation. After about 15 training episodes, the robot found the optimal position from which to observe both enemies simultaneously. In the episode with the highest reward, the robot started from the initial position and navigated to the optimal place by the end of the first iteration, then remained there for the rest of the episode to collect the maximal reward.