will provide a shared testing ground for researchers
interested in this topic. In future work, we plan to
release algorithms, developed using this environment,
that explore the state space by proposing subgoals
to themselves.
ACKNOWLEDGEMENT
This scientific article is part of the RICAIP project
that has received funding from the European Union’s
Horizon 2020 research and innovation programme
under grant agreement No 857306. The results were
supported by the Ministry of Education, Youth and
Sports within the dedicated program ERC CZ under
the project POSTMAN no. LL1902.