didactic and research RL applications. In the near term, we plan to extend AIM-RL with optional Deep Q-Learning (François-Lavet et al., 2018) and with alternatives to ε-greedy action selection, such as Randomised Probability Matching (Scott, 2010). As a later development, we plan to support the use of AIM-RL as a framework for e-learning by adding a graphical UI and a validator for implemented models.
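To make the planned ε-greedy alternative concrete, the following Python sketch illustrates Randomised Probability Matching (Thompson sampling over Beta posteriors, in the spirit of Scott (2010)) on a Bernoulli bandit, alongside an ε-greedy baseline for contrast. This is an illustrative sketch only, not part of AIM-RL's current API; all names in it are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(estimates, epsilon=0.1):
    # Baseline: exploit the current best arm with probability 1 - epsilon,
    # otherwise explore a uniformly random arm.
    if rng.random() < epsilon:
        return int(rng.integers(len(estimates)))
    return int(np.argmax(estimates))

def probability_matching(successes, failures):
    # Randomised Probability Matching: draw one sample from each arm's
    # Beta posterior (uniform Beta(1, 1) prior) and play the argmax, so an
    # arm is selected with the posterior probability that it is optimal.
    return int(np.argmax(rng.beta(successes + 1, failures + 1)))

# Toy run on a 3-armed Bernoulli bandit with hidden success rates.
true_p = np.array([0.2, 0.5, 0.7])
successes = np.zeros(3)
failures = np.zeros(3)
for _ in range(1000):
    arm = probability_matching(successes, failures)
    reward = float(rng.random() < true_p[arm])
    successes[arm] += reward
    failures[arm] += 1.0 - reward
print("posterior means:", (successes + 1) / (successes + failures + 2))

Unlike ε-greedy, which explores uniformly at a fixed rate, probability matching concentrates exploration on arms whose posteriors still overlap with the best arm's, and its exploration decays automatically as the posteriors sharpen.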
REFERENCES
Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym. arXiv preprint arXiv:1606.01540.
Chen, J., Yuan, B., and Tomizuka, M. (2019). Model-free deep reinforcement learning for urban autonomous driving. In IEEE Intelligent Transportation Systems Conference (ITSC), pages 2765–2771. IEEE.
Duan, Y., Chen, X., Houthooft, R., Schulman, J., and Abbeel, P. (2016). Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning, pages 1329–1338. PMLR.
François-Lavet, V., Henderson, P., Islam, R., Bellemare, M. G., Pineau, J., et al. (2018). An introduction to deep reinforcement learning. Foundations and Trends® in Machine Learning, 11(3-4):219–354.
He, X., Zhao, K., and Chu, X. (2021). AutoML: A survey of the state-of-the-art. Knowledge-Based Systems, 212:106622.
Heidrich-Meisner, V. and Igel, C. (2008). Variable metric reinforcement learning methods applied to the noisy mountain car problem. In Recent Advances in Reinforcement Learning: 8th European Workshop, EWRL 2008, Villeneuve d'Ascq, France, June 30–July 3, 2008, Revised and Selected Papers 8, pages 136–150. Springer.
Johnson, W. W., Story, W. E., et al. (1879). Notes on the "15" puzzle. American Journal of Mathematics, 2(4):397–404.
Kaiser, L., Babaeizadeh, M., Milos, P., Osinski, B., Campbell, R. H., Czechowski, K., Erhan, D., Finn, C., Kozakowski, P., Levine, S., et al. (2019). Model-based reinforcement learning for Atari. arXiv preprint arXiv:1903.00374.
Lai, K.-H., Zha, D., Li, Y., and Hu, X. (2020). Dual policy distillation. arXiv preprint arXiv:2006.04061.
Moerland, T. M., Broekens, J., Plaat, A., Jonker, C. M., et al. (2023). Model-based reinforcement learning: A survey. Foundations and Trends® in Machine Learning, 16(1):1–118.
Nelson, M. J. and Hoover, A. K. (2020). Notes on using Google Colaboratory in AI education. In Proceedings of the ACM Conference on Innovation and Technology in Computer Science Education, pages 533–534.
Paduraru, C., Paduraru, M., and Iordache, S. (2022). Using deep reinforcement learning to build intelligent tutoring systems. In Proceedings of the 17th International Conference on Software Technologies, pages 288–298. INSTICC, SciTePress.
Piltaver, R., Luštrek, M., and Gams, M. (2012). The pathology of heuristic search in the 8-puzzle. Journal of Experimental & Theoretical Artificial Intelligence, 24(1):65–94.
Ratner, D. and Warmuth, M. K. (1986). Finding a shortest solution for the n × n extension of the 15-puzzle is intractable. In AAAI, volume 86, pages 168–172.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Scott, S. L. (2010). A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 26(6):639–658.
Sutton, R. S. (1995). Generalization in reinforcement learning: Successful examples using sparse coarse coding. Advances in Neural Information Processing Systems, 8.
Watkins, C. J. C. H. (1989). Learning from delayed rewards. PhD thesis, King's College, Cambridge, United Kingdom.
Yarats, D., Zhang, A., Kostrikov, I., Amos, B., Pineau, J., and Fergus, R. (2021). Improving sample efficiency in model-free reinforcement learning from images. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35(12), pages 10674–10681.
Yu, T., Quillen, D., He, Z., Julian, R., Hausman, K., Finn, C., and Levine, S. (2020). Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on Robot Learning, pages 1094–1100. PMLR.