
learning via unsupervised task representation learn-
ing. In Proceedings of the 40th International Con-
ference on Machine Learning, volume 202 of PMLR,
Honolulu, Hawaii, USA. PMLR.
Baranes, A. and Oudeyer, P.-Y. (2009). R-IAC: Robust intrinsically motivated exploration and active learning. IEEE Transactions on Autonomous Mental Development, 1(3):155–169.
Bengio, Y., Louradour, J., Collobert, R., and Weston, J. (2009). Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41–48. ACM.
Benjamins, C., Eimer, T., Schubert, F., Biedenkapp, A., Rosenhahn, B., Hutter, F., and Lindauer, M. (2021). CARL: A benchmark for contextual and adaptive reinforcement learning. In Thirty-Fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
Cobbe, K., Hesse, C., Hilton, J., and Schulman, J. (2019a).
Leveraging procedural generation to benchmark rein-
forcement learning. arXiv preprint arXiv:1912.01588.
Cobbe, K., Klimov, O., Hesse, C., Kim, T., and Schulman,
J. (2019b). Quantifying generalization in reinforce-
ment learning. In International Conference on Ma-
chine Learning, pages 1282–1289. PMLR.
Dennis, M., Jaques, N., Vinitsky, E., Bayen, A., Russell,
S., Critch, A., and Levine, S. (2020). Emergent com-
plexity and zero-shot transfer via unsupervised envi-
ronment design. In Advances in Neural Information
Processing Systems, volume 33.
Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227–303.
Florensa, C., Held, D., Geng, X., and Abbeel, P. (2018).
Automatic goal generation for reinforcement learning
agents. In Proceedings of the 35th International Con-
ference on Machine Learning, pages 1515–1528.
Florensa, C., Held, D., Wulfmeier, M., Zhang, M., and
Abbeel, P. (2017). Reverse curriculum generation
for reinforcement learning. In Conference on Robot
Learning, pages 482–495.
Graves, A., Bellemare, M. G., Menick, J., Munos, R., and Kavukcuoglu, K. (2017). Automated curriculum learning for neural networks. In International Conference on Machine Learning, pages 1311–1320. PMLR.
Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pages 1861–1870. PMLR.
Held, D., Geng, X., Florensa, C., and Abbeel, P. (2018).
Automatic goal generation for reinforcement learn-
ing agents. In International Conference on Machine
Learning, pages 1515–1528. PMLR.
Jiang, M., Dennis, M., Parker-Holder, J., Foerster, J., Grefenstette, E., and Rocktäschel, T. (2021a). Prioritized level replay.
Jiang, M., Dennis, M., Parker-Holder, J., Foerster, J., Grefenstette, E., and Rocktäschel, T. (2021b). Replay-guided adversarial environment design. In Advances in Neural Information Processing Systems, volume 34, pages 1884–1897.
Juliani, A., Khalifa, A., Berges, V.-P., Harper, J., Teng, E.,
Henry, H., Crespi, A., Togelius, J., and Lange, D.
(2019). Obstacle tower: A generalization challenge
in vision, control, and planning. In Proceedings of
the Twenty-Eighth International Joint Conference on
Artificial Intelligence, pages 2684–2691. IJCAI.
Kingma, D. P. and Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
Kirk, R., Zhang, A., Grefenstette, E., and Rocktäschel, T. (2021). A survey of generalisation in deep reinforcement learning. arXiv preprint arXiv:2111.09794.
Klink, P., D’Eramo, C., Peters, J., and Pajarinen, J.
(2020). Self-paced deep reinforcement learning. In
Advances in Neural Information Processing Systems,
volume 33. Curran Associates, Inc.
Klink, P., Yang, H., D’Eramo, C., Pajarinen, J., and Pe-
ters, J. (2022). Curriculum reinforcement learning via
constrained optimal transport. In Proceedings of the
39th International Conference on Machine Learning,
pages 11325–11344. PMLR.
Kulkarni, T. D., Narasimhan, K., Saeedi, A., and Tenenbaum, J. (2016). Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. Advances in Neural Information Processing Systems, 29.
Li, Y. (2018). Deep reinforcement learning: An overview.
arXiv preprint arXiv:1701.07274.
Machado, M. C., Bellemare, M. G., Talvitie, E., Veness, J.,
Hausknecht, M., and Bowling, M. (2018). Revisiting
the arcade learning environment: Evaluation protocols
and open problems for general agents. Journal of Ar-
tificial Intelligence Research, 61:523–562.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness,
J., Bellemare, M. G., Graves, A., Riedmiller, M., Fid-
jeland, A. K., Ostrovski, G., et al. (2015). Human-
level control through deep reinforcement learning.
Nature, 518(7540):529–533.
Moulin-Frier, C., Nguyen, S. M., and Oudeyer, P.-Y. (2014).
Self-organization of early vocal development in in-
fants and machines: The role of intrinsic motivation.
Frontiers in Psychology, 5:1065.
Narvekar, S., Peng, B., Leonetti, M., Sinapov, J., Tay-
lor, M. E., and Stone, P. (2020). Curriculum learn-
ing for reinforcement learning domains: A framework
and survey. Journal of Machine Learning Research,
21(181):1–50.
OpenAI, Andrychowicz, M., Baker, B., Chociej, M., Joze-
fowicz, R., McGrew, B., Pachocki, J., Petron, A.,
Plappert, M., Powell, G., Ray, A., et al. (2020). Learn-
ing dexterous in-hand manipulation. The Interna-
tional Journal of Robotics Research, 39(1):3–20.
Osband, I., Doron, Y., Hessel, M., Aslanides, J., Sezener, E., Saraiva, A., McKinney, K., Lattimore, T., Szepesvári, C., Singh, S., Van Roy, B., Sutton, R. S., Silver, D., and van Hasselt, H. (2020). Behaviour suite for reinforcement learning. In 8th International Con-