ness, consistency and locality), but also provides in-
formation that can be used to decide about the actions
of an agent.
The advantage of reflexive reinforcement learning is
that an agent can learn even in the absence of an
evaluative signal (reward and punishment): it can
bootstrap elementary actions (as in homeokinesis
(Der and Martius, 2012)), learn about options in the
environment (as in empowerment (Klyubin et al., 2005b)),
and obtain more meaningful and generalisable
representations (see Smith and Herrmann, 2019).
The unavoidable difficulty in reflexive reinforcement
learning is that using quantities that are themselves
ultimately based on the reward as a reward introduces
a feedback loop, which can lead to instabilities or
divergences. This is not unknown in RL, where, e.g.,
an often-visited source of low reward can dominate a
better source of reward that is rarely found, or where
correlations among basis functions lead to divergences,
as already noticed by Baird (1995).
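As a minimal illustration of such a feedback loop, consider a toy setting that is an assumption made here for exposition and not part of the present study: a single self-looping state whose reward is taken to be a multiple c of the current value estimate. The function name, the coupling c and all parameter values are hypothetical. Under a standard TD(0) update the estimate is multiplied by 1 + alpha (c + gamma - 1) in every step, so it decays if c + gamma < 1 and diverges if c + gamma > 1.

```python
def simulate_reflexive_td(c, gamma=0.9, alpha=0.1, v0=1.0, steps=200):
    """Toy single-state TD(0) in which the reward is c times the value estimate.

    This is a hypothetical illustration of a reward that depends on the
    learned value, not the algorithm used in the paper.
    """
    v = v0
    trajectory = [v]
    for _ in range(steps):
        reward = c * v                      # reflexive reward derived from the estimate itself
        td_error = reward + gamma * v - v   # TD(0) error for a state that loops onto itself
        v += alpha * td_error               # equivalent to v *= 1 + alpha * (c + gamma - 1)
        trajectory.append(v)
    return trajectory


stable = simulate_reflexive_td(c=0.05)    # c + gamma < 1: the estimate shrinks
unstable = simulate_reflexive_td(c=0.2)   # c + gamma > 1: the estimate keeps growing
print(stable[-1] < stable[0], unstable[-1] > unstable[0])
```

In this sketch the boundary c + gamma = 1 separates the contracting from the expanding regime, which is the kind of stability condition that a dynamical systems treatment of the full learning loop would have to make explicit.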
In RRL such feedback is even more typical, but it can
also be used to introduce structure into the state
space by self-organised pattern formation or to
identify hierarchical relationships, as will be
studied in future work.
In order to keep the effects of self-referentiality
under control and to make use of their potential, a
dynamical systems theory of reinforcement learning is
required that considers not only the agent as a
dynamical system, but the full interactive system
formed by the agent, its environment and its internal
representations.
This research was funded by EPSRC through the CDT
RAS at Edinburgh Centre for Robotics. Discussions
with Calum Imrie and Simon Smith are gratefully ac-
