Another approach is to let the decision maker adaptively learn the exploration policy in DDPG (Xu, 2018). The advantage of this approach is that it is scalable and yields better global exploration; the disadvantage is that it consumes more memory than OU or EGC.
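The following sketch illustrates, under simplifying assumptions, what such an adaptively learned exploration could look like around a DDPG actor. The class name, the adaptation rule, and all parameters are illustrative and do not reproduce the method of (Xu, 2018); the extra state kept per decision maker hints at why such schemes need more memory than OU or EGC.

import numpy as np

class AdaptiveGaussianExploration:
    """Illustrative sketch (not the exact method of Xu, 2018): the noise
    scale added to the DDPG actor output is adapted from episode returns
    instead of being fixed as in OU or EGC."""

    def __init__(self, action_dim, sigma=0.2, adapt_rate=0.05):
        self.action_dim = action_dim
        self.sigma = sigma            # current exploration scale, adapted online
        self.adapt_rate = adapt_rate  # step size of the adaptation rule
        self.prev_return = None       # extra state that OU/EGC do not need

    def perturb(self, action, low=-1.0, high=1.0):
        # Add Gaussian noise to the deterministic actor output and clip
        # the result to the action bounds of the environment.
        noisy = action + np.random.normal(0.0, self.sigma, self.action_dim)
        return np.clip(noisy, low, high)

    def update(self, episode_return):
        # Toy adaptation rule (assumption for illustration): shrink the
        # noise when returns improve, grow it when learning stagnates.
        if self.prev_return is not None:
            if episode_return > self.prev_return:
                self.sigma *= (1.0 - self.adapt_rate)
            else:
                self.sigma *= (1.0 + self.adapt_rate)
        self.prev_return = episode_return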
Finally, deep reinforcement learning in continuous state and time spaces is still not robust to small environmental changes or to the choice of hyperparameters. For DDPG, the effect of every individual action vanishes as the discretization timestep becomes infinitesimal (Tallec, 2019). For the pendulum environment, algorithm parameters could therefore be tuned through a continuous-time analysis to obtain better performance.
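This vanishing effect can be reproduced with a toy one-step calculation on a hypothetical one-dimensional integrator (not the pendulum dynamics used in the experiments): the Q-value gap between two opposite actions shrinks roughly linearly with the timestep, so the greedy choice loses its meaning as the timestep approaches zero.

import numpy as np

# With a one-step look-ahead, reward r(s, a) * dt and discount gamma**dt,
# Q(s, a) behaves like V(s) + dt * A(s, a): the action-dependent part
# scales with dt and vanishes as the discretization timestep goes to zero.
def q_gap(dt, gamma=0.99):
    s = 1.0                        # toy one-dimensional state
    def reward(s, a):              # toy reward, penalizing distance and effort
        return -(s ** 2) - 0.1 * (a ** 2)
    def value(s):                  # toy value estimate of the follow-up state
        return -(s ** 2)
    def q(a):
        s_next = s + a * dt        # Euler step of a simple integrator
        return reward(s, a) * dt + (gamma ** dt) * value(s_next)
    return abs(q(+1.0) - q(-1.0))  # Q-value gap between opposite actions

for dt in (1.0, 0.1, 0.01, 0.001):
    # The printed gap shrinks roughly proportionally to dt.
    print(f"dt = {dt:6.3f}  ->  |Q(s,+1) - Q(s,-1)| = {q_gap(dt):.6f}")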
6 CONCLUSION
The main objective of this paper was to analyze which combinations of reinforcement learning algorithms, exploration methods, and replay memories are most suitable for discrete and continuous state and action spaces. Tests were performed in a simulated discrete bit-flip environment and a continuous pendulum environment.
This research extended state-of-the-art methods with new techniques, namely Hindsight Experience Replay with Goal Discovery (HERGD), ε-greedy Continuous (EGC), and Ornstein-Uhlenbeck Annealed (OUA). While Ornstein-Uhlenbeck Annealed did not improve performance, Hindsight Experience Replay with Goal Discovery and ε-greedy Continuous proved to perform well.
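As an illustration of the kind of exploration EGC stands for, the following is a minimal sketch assuming that ε-greedy Continuous replaces the actor's output with a uniform random action with probability ε; the exact variant evaluated in this paper may differ in its details.

import numpy as np

def egc_action(policy_action, action_low, action_high, epsilon):
    """Hypothetical sketch of epsilon-greedy exploration carried over to a
    continuous action space: with probability epsilon the agent ignores the
    actor output and samples a uniform random action within the bounds."""
    if np.random.rand() < epsilon:
        return np.random.uniform(action_low, action_high)
    return policy_action

# Usage: exploratory pendulum torque in [-2, 2] with epsilon = 0.1.
torque = egc_action(policy_action=np.array([0.5]),
                    action_low=np.array([-2.0]),
                    action_high=np.array([2.0]),
                    epsilon=0.1)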
Equipped with a suitable combination of algorithms, the next step is to transfer it to a self-learning robot based on embedded hardware. The robot is supposed to start without any knowledge of its sensors, actuators, and environment and to gradually learn to survive. On embedded hardware, resource constraints will be an important challenge to handle.
Enabling robots to learn complex tasks through experience allows us to take a big step into the future. The applications for such self-learning robots are numerous. The effort of writing complex control algorithms for these robots is greatly reduced because they learn to control themselves, and through repetition they are able to further optimize their behavior. Changes in the environment affect them less because they can adapt to them automatically.
ACKNOWLEDGEMENTS
We gratefully acknowledge the financial support
provided to us by the BMVIT and FFG (Austrian
Research Promotion Agency) program Production of
the Future in the SAVE project (864883).
REFERENCES
Andrychowicz, M., Wolski, F., Ray, A., Schneider, J.,
Fong, R., Welinder, P., McGrew, B., Tobin, J., Abbeel,
P., Zaremba, W.: Hindsight Experience Replay. In:
CoRR abs/1707.01495 (2017)
Dayan, P.: The Convergence of TD(λ) for General λ. In: Machine Learning 8 (1992), May, No. 3, pp. 341–362. – ISSN 1573–0565
van Hasselt, H., Guez, A., Silver, D.: Deep Reinforcement Learning with Double Q-learning. In: CoRR abs/1509.06461 (2015)
Hwangbo, J., Sa, I., Siegwart, R., Hutter, M.: Control of a Quadrotor With Reinforcement Learning. In: IEEE Robotics and Automation Letters 2 (2017), Oct, No. 4, pp. 2096–2103. – ISSN 2377–3766
Kingma, D. P., Ba, J.: Adam: A Method for Stochastic
Optimization. In: CoRR abs/1412.6980 (2014)
Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., Hassabis, D., Clopath, C., Kumaran, D., Hadsell, R.: Overcoming catastrophic forgetting in neural networks. In: Proceedings of the National Academy of Sciences 114 (2017), No. 13, pp. 3521–3526. – ISSN 0027–8424
Kober, J., Peters, J.: Reinforcement Learning in Robotics: A Survey. Vol. 12. Berlin, Germany: Springer, 2012, pp. 579–610
Leuenberger, G., Wiering, M.: Actor-Critic Reinforcement Learning with Neural Networks in Continuous Games. In: Proceedings of the 10th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART (2018), pp. 53–60. – ISBN 978-989-758-275-2, DOI: 10.5220/0006556500530060
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., Wierstra, D.: Continuous control with deep reinforcement learning. In: CoRR abs/1509.02971 (2015)
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M. A.: Playing Atari with Deep Reinforcement Learning. In: CoRR abs/1312.5602 (2013)
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. In: Nature 518 (2015), February, No. 7540, pp. 529–533. – ISSN 0028–0836