Bootstrapping a DQN Replay Memory with Synthetic Experiences

Wenzel Baron Pilar von Pilchau¹ (https://orcid.org/0000-0001-9307-855X), Anthony Stein² and Jörg Hähner¹

¹Organic Computing Group, University of Augsburg, Eichleitnerstr. 30, Augsburg, Germany
²Artificial Intelligence in Agricultural Engineering, University of Hohenheim, Garbenstraße 9, Hohenheim, Germany
Keywords:
Experience Replay, Deep Q-Network, Deep Reinforcement Learning, Interpolation, Machine Learning.
Abstract:
An important component of many Deep Reinforcement Learning algorithms is the Experience Replay, which serves as a storage mechanism or memory of experienced transitions. These experiences are used for training and help the agent to find the optimal trajectory through the problem space in a stable manner. The classic Experience Replay, however, makes use only of the experiences it actually made, yet the stored transitions bear great potential in the form of knowledge about the problem that can be extracted. The gathered knowledge contains state transitions and received rewards that can be utilized to approximate a model of the environment. We present an algorithm that creates synthetic experiences in a nondeterministic discrete environment to assist the learner with augmented training data. The Interpolated Experience Replay is evaluated on the FrozenLake environment, and we show that it can achieve a 17% higher mean reward compared to the classic version.
1 INTRODUCTION
The concept known as Experience Replay (ER) started as an extension to Q-Learning and AHC-Learning (Lin, 1992) and developed into a standard component of many Deep Reinforcement Learning (RL) algorithms (Schaul et al., 2015; Mnih et al., 2015; Andrychowicz et al., 2017). One major advantage is its ability to increase sample efficiency. Another important aspect is that algorithms like the Deep Q-Network (DQN) are not even able to learn in a stable manner without this extension (Tsitsiklis and Van Roy, 1997). This effect is caused by correlations in the observation sequence and the fact that small updates may significantly change the policy and in turn alter the distribution of the data. By uniformly sampling over the stored transitions, ER is able to remove these correlations as well as to smooth over changes in the data distribution (Mnih et al., 2015).
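The following minimal sketch illustrates this mechanism; class and parameter names are purely illustrative and not taken from any of the cited implementations. Transitions are appended in the order they occur, and training batches are drawn uniformly at random, which decorrelates the samples from the temporal order of the observation sequence.

import random
from collections import deque

class ExperienceReplay:
    """Minimal vanilla Experience Replay buffer (illustrative sketch)."""

    def __init__(self, capacity=50_000):
        # Bounded FIFO buffer: the oldest transitions are discarded
        # once the capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        # Store the real transition exactly as experienced.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform sampling breaks up correlations in the observation
        # sequence and smooths over changes in the data distribution.
        return random.sample(self.buffer, batch_size)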
Most versions of ER store the real, actually experienced transitions. For instance, the authors of (Mnih et al., 2015) used vanilla ER in their combination with DQN, and (Schaul et al., 2015) extended vanilla ER to Prioritized Experience Replay, which is able to favour experiences from which the learner can benefit most. But there are also approaches that fill the replay memory with some kind of synthetic experiences to support the learning process.
One example is Hindsight Experience Replay (Andrychowicz et al., 2017), which takes a trajectory of states and actions associated with a goal and replaces the goal with the last state of the trajectory to create a synthetic experience. Both the actually experienced trajectory and the synthetic one are then stored in the ER. This helps the learner to understand how it is able to reach different goals. The approach was implemented in a multi-objective problem space, and after reaching some synthetic goals the agent is able to learn how to reach the intended one.
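As an illustration of this relabelling idea, the hedged sketch below rewrites a finished trajectory so that the state actually reached at the end becomes the goal; the Transition structure and the sparse reward scheme are assumptions for illustration, not the original implementation.

from collections import namedtuple

# Illustrative transition structure; field names are assumptions.
Transition = namedtuple("Transition", "state action reward next_state goal")

def hindsight_relabel(trajectory):
    # Treat the last state that was actually reached as the new goal.
    achieved_goal = trajectory[-1].next_state
    relabelled = []
    for t in trajectory:
        # Recompute a sparse reward with respect to the substituted goal.
        reward = 1.0 if t.next_state == achieved_goal else 0.0
        relabelled.append(t._replace(goal=achieved_goal, reward=reward))
    return relabelled

Both the original and the relabelled trajectory would then be stored in the ER.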
Our contribution is an algorithm targeted at improving (Deep) RL algorithms that make use of an ER, such as DQN, DDPG or classic Q-Learning (Zhang and Sutton, 2017), in nondeterministic and discrete environments by creating synthetic experiences from stored real transitions. We can increase sample efficiency, as stored transitions are reused to generate more, and even better, experiences. To this end, the algorithm computes an average of the rewards received in a situation and combines this value with observed follow-up states to create so-called interpolated experiences that assist the learner in its exploration phase.
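A minimal sketch of this averaging step is given below, assuming a discrete, nondeterministic environment such as FrozenLake; the function name and the exact transition format are assumptions for illustration and not the authors' implementation.

from collections import defaultdict

def interpolate_experiences(real_transitions):
    """Create synthetic (interpolated) transitions from stored real ones:
    average the rewards observed for each (state, action) pair and combine
    that average with every follow-up state seen for this pair."""
    rewards = defaultdict(list)
    next_states = defaultdict(set)
    for state, action, reward, next_state, done in real_transitions:
        rewards[(state, action)].append(reward)
        next_states[(state, action)].add((next_state, done))

    synthetic = []
    for (state, action), observed_rewards in rewards.items():
        avg_reward = sum(observed_rewards) / len(observed_rewards)
        for next_state, done in next_states[(state, action)]:
            synthetic.append((state, action, avg_reward, next_state, done))
    return synthetic

The resulting interpolated experiences could then be stored in the replay memory alongside the real transitions and sampled in the usual way.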
The evaluation is performed on the FrozenLake
environment from the OpenAI Gym (Brockman et al.,
2016).
This work investigates only discrete and nondeterministic environments, and the averaging is a rather simple method as well, but the intention is