
Khazatsky, A., Nair, A., Jing, D., and Levine, S. (2021).
What can i do here? learning new skills by imagin-
ing visual affordances. In International Conference
on Robotics and Automation (ICRA), pages 14291–
14297. IEEE.
Kocsis, L. and Szepesvari, C. (2006). Bandit based monte-
carlo planning. In European Conference on Machine
Learning.
Kulkarni, T. D., Narasimhan, K. R., Saeedi, A., and Tenen-
baum, J. B. (2016). Hierarchical deep reinforcement
learning: Integrating temporal abstraction and intrin-
sic motivation. In 30th Conference on Neural Infor-
mation Processing Systems (NIPS 2016).
Lehman, J. and Stanley, K. O. (2011). Novelty search and
the problem with objectives. In Genetic Programming
Theory and Practice IX (GPTP 2011).
Li, J., Tang, C., Tomizuka, M., and Zhan, W. (2022). Hi-
erarchical planning through goal-conditioned offline
reinforcement learning. In Robotics and Automation
Letters, volume 7. IEEE.
Li, W., Wang, X., Jin, B., and Zha, H. (2023). Hierarchi-
cal diffusion for offline decision making. In Proceed-
ings of the 40th International Conference on Machine
Learning, volume 202, pages 20035–20064. PMLR.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T.,
Tassa, Y., Silver, D., and Wierstra, D. (2016). Contin-
uous control with deep reinforcement learning. In In-
ternational Conference on Learning Representations
(ICLR).
Lin, L.-H. (1992). Self-improving reactive agents based on
reinforcement learning, planning and teaching. Ma-
chine learning, 8(3/4):69–97.
Liu, M., Zhu, M., and Zhangy, W. (2022). Goal-conditioned
reinforcement learning: Problems and solutions. In
Proceedings of the Thirty-First International Joint
Conference on Artificial Intelligence (IJCAI-22).
Lu, L., Zhang, W., Gu, X., Ji, X., and Chen, J. (2020).
Hmcts-op: Hierarchical mcts based online planning
in the asymmetric adversarial environment. Symme-
try, 12(5):719.
Mezghani, L., Sukhbaatar, S., Bojanowski, P., Lazaric, A.,
and Alahari, K. (2022). Learning goal-conditioned
policies offline with self-supervised reward shaping.
In 6th Conference on Robot Learning (CoRL 2022).
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness,
J., Bellemare, M. G., Graves, A., and Riedmiller, M.
(2015). Human-level control through deep reinforce-
ment learning. Nature, 518(7540):529–533.
Pertsch, K., Rybkin, O., Ebert, F., Finn, C., Jayaraman,
D., and Levine, S. (2020). Long-horizon visual plan-
ning with goal-conditioned hierarchical predictors. In
34th Conference on Neural Information Processing
Systems (NeurIPS 2020).
Pinto, I. P. and Coutinho, L. R. (2019). Hierarchical re-
inforcement learning with monte carlo tree search
in computer fighting game. IEEE Transactions on
Games, 11(3):290–295.
Plaat, A. (2023). Deep Reinforcement Learning. Springer
Nature.
Schaul, T., Horgan, D., Gregor, K., and Silver, D. (2015).
Universal value function approximators. In Proceed-
ings of ICML-15, volume 37, pages 1312–1320.
Schmidhuber, J. (1991). Curious model-building control
systems. In Proceedings of Neural Networks, 1991
IEEE International Joint Conference, pages 1458–
1463.
Schmidhuber, J. (2006). Developmental robotics, optimal
artificial curiosity, creativity, music, and the fine arts.
Connect. Sci., 18:173–187.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and
Klimov, O. (2017). Proximal policy optimization al-
gorithms. CoRR, abs/1707.06347.
Shin, W. and Kim, Y. (2023). Guide to control: Of-
fline hierarchical reinforcement learning using sub-
goal generation for long-horizon and sparse-reward
tasks. In Proceedings of the Thirty-Second Inter-
national Joint Conference on Artificial Intelligence
(IJCAI-23), pages 4217–4225.
Sutton, R. S., Precup, D., and Singh, S. P. (1999). Be-
tween mdps and semi-mdps: A framework for tem-
poral anstraction in reinforcement learning. Artificial
Intelligence, 112(1-2):181–211.
Vassilvitskii, S. and Arthur, D. (2006). k-means++: The
advantages of careful seeding. In Proceedings of the
eighteenth annual ACM-SIAM symposium on discrete
algorithms, page 1027–1035.
Wang, V. H., Wang, T., Yang, W., K am ar ainen, J.-K., and
Pajarinen, J. (2024). Probabilistic subgoal representa-
tions for hierarchical reinforcement learning. In Pro-
ceedings of the 41st International Conference on Ma-
chine Learning, volume 235. PMLR.
Yang, X., Ji, Z., Wu, J., Lai, Y.-K., Wei, C., Liu, G., and
Setchi, R. (2022). Hierarchical reinforcement learning
with universal policies for multi-step robotic manipu-
lation. IEEE Transactions on Neural Networks and
Learning Systems, 33(9):4727–4741.
Zadem, M., Mover, S., and Nguyen, S. M. (2023). Goal
space abstraction in hierarchical reinforcement learn-
ing via set-based reachability analysis. In 22nd IEEE
International Conference on Development and Learn-
ing (ICDL 2023), pages 423–428.
Zhang, S. and Sutton, R. S. (2017). A deeper look at expe-
rience replay. In 31st Conference on Neural Informa-
tion Processing Systems (NIPS 2017).
ICAART 2025 - 17th International Conference on Agents and Artificial Intelligence
514