
hinder the learning process. Our adaptive approach improves training robustness and method performance while simplifying the hyperparameter tuning procedure. Our findings also suggest that our method yields higher control precision and faster convergence in most cases. Overall, our work contributes to the ongoing effort to improve HRL methods, in particular by addressing subgoal reachability issues in sparse-reward GCHRL.
ACKNOWLEDGEMENTS
This research was partially funded by the Warsaw University of Technology within the Excellence Initiative: Research University (IDUB) programme LAB-TECH of Excellence (grant no. 504/04496/1032/45.010013), the research project "Bio-inspired artificial neural network" (grant no. POIR.04.04.00-00-14DE/18-00) within the Team-Net program of the Foundation for Polish Science co-financed by the European Union under the European Regional Development Fund, and the National Science Centre, Poland (grant no. 2020/39/B/ST6/01511). This research was supported in part by PLGrid Infrastructure.