APPENDIX
Handcrafted Mazes
The following notation is used in the schematic views of the environments: "-" denotes an empty cell, "#" a wall, "@" the agent, and "X" the reward.
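As a concrete illustration, the minimal Python sketch below shows how a layout written in this notation could be parsed into walls, an agent start cell, and a reward cell. The layout constant and function name are illustrative assumptions, not the implementation used in the experiments.

# Minimal illustrative sketch (not the paper's code): parse a maze layout
# written in the notation above into walls, agent start, and reward cell.
LAYOUT_V0 = (
    "#####\n"
    "#@--#\n"
    "#-#-#\n"
    "#--X#\n"
    "#####"
)

def parse_maze(layout):
    walls, agent, reward = set(), None, None
    for r, row in enumerate(layout.splitlines()):
        for c, ch in enumerate(row):
            if ch == "#":
                walls.add((r, c))      # impassable cell
            elif ch == "@":
                agent = (r, c)         # agent's starting cell
            elif ch == "X":
                reward = (r, c)        # rewarding cell
    return walls, agent, reward

walls, agent, reward = parse_maze(LAYOUT_V0)
print(agent, reward)  # (1, 1) (3, 3)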
multi way v0. Each experiment had 4 different reward locations, changed sequentially every 100 episodes:
##### ##### ##### #####
#@--# #@-X# #@--# #@--#
#-#-# #-#-# #-#-# #-#X#
#--X# #---# #X--# #---#
##### ##### ##### #####
multi way v2. Each experiment had 6 reward locations, changed every 200 episodes (a sketch of this schedule follows the layouts below):
######### ######### #########
#---##### #-X-##### #---#####
#-#-##### #-#-##### #-#-#####
#--@---## #--@---## #--@--X##
###--#### ###--#### ###--####
###-#--X# ###-#---# ###-#---#
###---### ###---### ###---###
###-##### ###-##### ###-#####
######### ######### #########
######### ######### #########
#---##### #---##### #---#####
#-#-##### #-#-##### #-#-#####
#--@---## #X-@---## #--@---##
###--#### ###--#### ###--####
###-#---# ###-#---# ###-#--X#
###---### ###---### ###---###
###X##### ###-##### ###-#####
######### ######### #########
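The reward schedule used in both environments (the active rewarding cell switches after a fixed number of episodes) can be sketched as follows. The helper name and the hard-coded coordinates, read off the multi way v0 schematics above, are illustrative assumptions rather than the actual implementation.

# Illustrative sketch (an assumption, not the paper's code) of the reward
# schedule: the active reward cell cycles through a fixed list,
# switching after every switch_every episodes.
def active_reward(reward_cells, episode, switch_every):
    return reward_cells[(episode // switch_every) % len(reward_cells)]

# multi way v0: the four 'X' cells from the schematics above, in order.
REWARDS_V0 = [(3, 3), (1, 3), (3, 1), (2, 3)]
print(active_reward(REWARDS_V0, episode=250, switch_every=100))  # (3, 1)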