ACKNOWLEDGEMENTS
This work was funded by the Artificial and Natural Intelligence Toulouse Institute (ANITI) - Institut 3IA (ANR-19-PI3A-0004).