proach that is based on the OU process to realize active guidance of an agent through state space by sampling velocities instead of displacements. We have assumed zero knowledge of the transition model, as is generally the case in model-based RL settings. The OU process evolves the agent's velocity according to a Langevin equation, and in the large-time limit the sampled velocities follow a Gaussian distribution. Additionally, the model allows the agent to influence the action sampling scheme (and thus its motion pattern) by means of a self-induced potential function. One key advantage of our approach is that we can derive closed-form analytical expressions.
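To make the sampling scheme concrete, the following Python sketch discretizes such a Langevin equation with an Euler-Maruyama step. The parameter values and the optional potential-gradient callable grad_potential are illustrative assumptions, not the exact formulation used in this paper.

import numpy as np

def ou_velocity_rollout(theta=0.5, mu=0.0, sigma=0.3, dt=0.01, n_steps=1000,
                        grad_potential=None, seed=0):
    # Minimal sketch: evolve the agent's velocity with a discretized
    # OU/Langevin step and integrate it into a 1-D position.
    # grad_potential is a hypothetical callable returning dU/dx that stands
    # in for the self-induced potential; the exact coupling may differ.
    rng = np.random.default_rng(seed)
    x, v = 0.0, 0.0
    xs, vs = [x], [v]
    for _ in range(n_steps):
        drift = theta * (mu - v)              # mean-reverting drift towards mu
        if grad_potential is not None:
            drift -= grad_potential(x)        # bias the motion with the potential
        v += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        x += v * dt                           # displacement follows the sampled velocity
        xs.append(x)
        vs.append(v)
    return np.array(xs), np.array(vs)

In the large-time limit, velocities produced this way are Gaussian with mean µ and variance σ²/(2θ), consistent with the stationary OU distribution.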
In this paper, we have assumed that the transition model remains unknown, even after the agent has explored the environment for some time. However, when model-based learning is considered, the agent often builds its knowledge in an incremental, iterative fashion. To account for this, in future work we will study the effects of making the strength θ time-dependent, as well as changing the intrinsic drift term µ in reaction to encountered novelty. This yields a framework in which the intrinsic drive originates from extrinsic sources or observations, resembling an intuitive implementation of a curious agent.
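As a purely illustrative sketch of this future direction, a single OU update step could couple a decaying strength θ and a novelty-driven drift µ as below; the decay schedule, the gain, and the novelty callable are hypothetical placeholders rather than results of this paper.

import numpy as np

def curious_ou_step(x, v, theta_t, mu_t, novelty, dt=0.01, sigma=0.3,
                    theta_decay=0.99, mu_gain=0.1, rng=None):
    # Hypothetical single step: theta_t is annealed over time and mu_t is
    # nudged by an assumed novelty signal evaluated at the current state.
    rng = rng if rng is not None else np.random.default_rng()
    theta_t *= theta_decay                    # time-dependent strength
    mu_t += mu_gain * novelty(x)              # drift reacts to encountered novelty
    v += theta_t * (mu_t - v) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    x += v * dt
    return x, v, theta_t, mu_t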
Furthermore, acquiring similar analytical expressions for different types of random walks is highly desirable. In particular, we wish to focus on the Lévy walk (Zaburdaev et al., 2015). In a Lévy walk, displacements are sampled from a power law, interleaving local displacements with long, time-correlated displacements within the environment. Using different potentials, one can most likely replicate Lévy-like behavior through the process described in this work. Alternatively, one could use a different formulation of the underlying noise scheme, i.e., sample directly from the desired distribution. This could give rise to Lévy walks and might further enhance exploration of an environment (Bartumeus et al., 2005; Ferreira et al., 2012).
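For comparison, a Lévy walk can be simulated directly by sampling heavy-tailed step lengths. The short Python sketch below, which is not the OU-based scheme of this paper, draws step lengths from a power law with exponent alpha and a uniform heading; the time spent on a step grows with its length.

import numpy as np

def levy_walk_2d(n_steps=100, alpha=1.5, l_min=1.0, speed=1.0, seed=0):
    # Illustrative 2-D Levy walk: power-law step lengths p(l) ~ l**(-1-alpha),
    # isotropic headings, constant speed (step duration = length / speed).
    rng = np.random.default_rng(seed)
    pos = np.zeros(2)
    positions, durations = [pos.copy()], []
    for _ in range(n_steps):
        length = l_min * (rng.pareto(alpha) + 1.0)     # heavy-tailed step length
        angle = rng.uniform(0.0, 2.0 * np.pi)          # random direction
        pos = pos + length * np.array([np.cos(angle), np.sin(angle)])
        positions.append(pos.copy())
        durations.append(length / speed)               # long steps take long times
    return np.array(positions), np.array(durations)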
This work constitutes a stepping stone towards simulating random walks for exploration. Enabling random walks in the absence of a transition model might prove beneficial for model-based RL, and may even open the door to more efficient sampling schemes that improve learning in continuous state spaces.
REFERENCES
Bartumeus, F., da Luz, M. G. E., Viswanathan, G. M., and
Catalan, J. (2005). Animal search strategies: a quan-
titative random-walk analysis. Ecology, 86(11):3078–
3087.
Basu, U., Majumdar, S. N., Rosso, A., and Schehr, G. (2018). Active Brownian motion in two dimensions. arXiv preprint arXiv:1804.09027.
Einstein, A. (1905). Investigations on the theory of the Brownian movement. Ann. der Physik.
Ferreira, A., Raposo, E., Viswanathan, G., and da Luz, M. (2012). The influence of the environment on Lévy random search efficiency: Fractality and memory effects. Physica A: Statistical Mechanics and its Applications, 391(11):3234–3246.
Hafez, M. B., Weber, C., and Wermter, S. (2017). Curiosity-
driven exploration enhances motor skills of continu-
ous actor-critic learner. In 2017 Joint IEEE Interna-
tional Conference on Development and Learning and
Epigenetic Robotics (ICDL-EpiRob), pages 39–46.
Ibe, O. C. (2013). Elements of Random Walk and Diffusion
Processes. Wiley Publishing, 1st edition.
James, A., Pitchford, J. W., and Plank, M. (2010). Efficient
or inaccurate? analytical and numerical modelling of
random search strategies. Bulletin of Mathematical
Biology, 72(4):896–913.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T.,
Tassa, Y., Silver, D., and Wierstra, D. (2015). Contin-
uous control with deep reinforcement learning. CoRR,
abs/1509.02971.
Palyulin, V. V., Chechkin, A. V., and Metzler, R. (2014). Lévy flights do not always optimize random blind search for sparse targets. Proceedings of the National Academy of Sciences, 111(8):2931–2936.
Romanczuk, P., Bär, M., Ebeling, W., Lindner, B., and Schimansky-Geier, L. (2012). Active Brownian particles. The European Physical Journal Special Topics, 202(1):1–162.
Uhlenbeck, G. E. and Ornstein, L. S. (1930). On the theory of the Brownian motion. Phys. Rev., 36:823–841.
Viswanathan, G. M., Buldyrev, S. V., Havlin, S., Da Luz,
M., Raposo, E., and Stanley, H. E. (1999). Op-
timizing the success of random searches. Nature,
401(6756):911.
Volpe, G., Gigan, S., and Volpe, G. (2014). Simulation of the active Brownian motion of a microswimmer. American Journal of Physics, 82(7):659–664.
Volpe, G. and Volpe, G. (2017). The topography of the en-
vironment alters the optimal search strategy for active
particles. Proceedings of the National Academy of
Sciences, 114(43):11350–11355.
Wilson, S. W. et al. (1996). Explore/exploit strategies in
autonomy. In Proc. of the Fourth International Con-
ference on Simulation of Adaptive Behavior: From
Animals to Animats, volume 4, pages 325–332.
Zaburdaev, V., Denisov, S., and Klafter, J. (2015). Lévy walks. Reviews of Modern Physics, 87(2):483.