
transformers and advanced transformer-based reinforcement learning methods for autonomous driving control. This entails replacing the current Variational Autoencoder with architectures tailored for raw visual data, such as Vision Transformers (ViT, Swin Transformer) or modern transformer-inspired convolutional networks like ConvNeXt. Furthermore, newer techniques such as Decision Transformers or Trajectory Transformers could replace the Proximal Policy Optimization (PPO) algorithm to potentially enhance decision-making capabilities. Another promising area for future research is Multi-Objective Reinforcement Learning (MORL) (Van Moffaert and Nowé, 2014; Hayes et al., 2021; Liu et al., 2015), in which an agent optimizes multiple reward functions, each representing a different objective. Evaluating these advancements through simulated testing may lead to substantial performance improvements. Brief illustrative sketches of these three directions follow.
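
First, a pretrained Vision Transformer could stand in for the Variational Autoencoder as a feature extractor for raw camera frames. The following is a minimal sketch only, assuming PyTorch with torchvision (0.13 or later); the choice of vit_b_16, the frozen pretrained weights, and the input resolution are illustrative assumptions, not part of the system described in this paper.

    import torch
    from torchvision.models import vit_b_16, ViT_B_16_Weights

    # Load a pretrained ViT-B/16 and drop its classification head so the
    # model returns the 768-dimensional [CLS] feature vector instead.
    weights = ViT_B_16_Weights.DEFAULT
    vit = vit_b_16(weights=weights)
    vit.heads = torch.nn.Identity()
    vit.eval()

    # The weights ship with the resizing/normalization they were trained with.
    preprocess = weights.transforms()

    with torch.no_grad():
        frame = torch.rand(3, 300, 400)               # stand-in for one camera frame
        features = vit(preprocess(frame).unsqueeze(0))
    print(features.shape)                             # torch.Size([1, 768])

These features would then feed the policy network in place of the VAE latent vector.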
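Second, Decision Transformers recast control as sequence modeling: rather than PPO's policy-gradient updates, a model is trained, typically on offline trajectories, to predict the next action from an interleaved sequence of return-to-go, state, and action tokens. The sketch below shows only this input layout and the causal masking in PyTorch; all dimensions and module names are assumptions for illustration.

    import torch
    import torch.nn as nn

    class DecisionTransformerSketch(nn.Module):
        def __init__(self, state_dim, act_dim, d_model=128, n_heads=4,
                     n_layers=2, max_len=20):
            super().__init__()
            # Separate embeddings for return-to-go, state, and action tokens.
            self.embed_rtg = nn.Linear(1, d_model)
            self.embed_state = nn.Linear(state_dim, d_model)
            self.embed_action = nn.Linear(act_dim, d_model)
            self.embed_timestep = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.transformer = nn.TransformerEncoder(layer, n_layers)
            self.predict_action = nn.Linear(d_model, act_dim)

        def forward(self, rtg, states, actions, timesteps):
            # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
            t = self.embed_timestep(timesteps)            # (B, T, d_model)
            tokens = torch.stack(
                [self.embed_rtg(rtg) + t,
                 self.embed_state(states) + t,
                 self.embed_action(actions) + t], dim=2
            ).reshape(rtg.shape[0], -1, t.shape[-1])      # interleave (R, s, a)
            L = tokens.shape[1]
            # Causal mask: each token attends only to earlier tokens.
            mask = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)
            h = self.transformer(tokens, mask=mask)
            # Actions are predicted from the state tokens (index 1 of each triple).
            return self.predict_action(h[:, 1::3])

At test time, conditioning on a high target return-to-go steers such a model toward high-reward behavior.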
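Third, the simplest entry point to MORL is to keep one reward per objective and scalarize the resulting vector, as sketched below; Pareto-based methods such as that of Van Moffaert and Nowé (2014) instead retain the vector and learn a set of non-dominated policies. The objective names, weights, and reward values here are purely illustrative assumptions.

    import numpy as np

    class ScalarizedReward:
        """Collapses a vector of per-objective rewards into one scalar."""
        def __init__(self, weights):
            self.weights = np.asarray(weights, dtype=np.float64)

        def __call__(self, reward_vector):
            r = np.asarray(reward_vector, dtype=np.float64)
            assert r.shape == self.weights.shape, "one reward per objective"
            return float(self.weights @ r)

    # Hypothetical per-step objectives for a driving agent:
    # [progress along route, ride comfort, safety margin]
    scalarize = ScalarizedReward(weights=[0.6, 0.1, 0.3])
    step_rewards = [1.0, -0.2, 0.5]     # illustrative values only
    print(scalarize(step_rewards))      # 0.6*1.0 + 0.1*(-0.2) + 0.3*0.5 = 0.73
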
REFERENCES
Baldi, P. (2011). Autoencoders, unsupervised learning, and deep architectures. In ICML Unsupervised and Transfer Learning.
Bengio, Y., Louradour, J., Collobert, R., and Weston, J. (2009). Curriculum learning. In Proceedings of the 26th International Conference on Machine Learning, pages 41–48.
Dickmanns, E. D. and Zapp, B. (1987). An integrated dynamic scene analysis system for autonomous road vehicles. In Intelligent Vehicles ’87, pages 157–164. IEEE.
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., and Koltun, V. (2017). CARLA: An open urban driving simulator.
Grigorescu, S., Trasnea, B., Cocias, T., and Macesanu, G. (2019). A survey of deep learning techniques for autonomous driving. Journal of Field Robotics, 37(3):362–386.
Hayes, C. F., Radulescu, R., Bargiacchi, E., Källström, J., Macfarlane, M., Reymond, M., Verstraeten, T., Zintgraf, L. M., Dazeley, R., Heintz, F., Howley, E., Irissappane, A. A., Mannion, P., Nowé, A., de Oliveira Ramos, G., Restelli, M., Vamplew, P., and Roijers, D. M. (2021). A practical guide to multi-objective reinforcement learning and planning. CoRR, abs/2103.09568.
Kendall, A., Hawke, J., Janz, D., Mazur, P., Reda, D., Allen, J.-M., Lam, V.-D., Bewley, A., and Shah, A. (2018). Learning to drive in a day.
Kingma, D. P. and Welling, M. (2022). Auto-encoding variational Bayes.
Krizhevsky, A., Sutskever, I., and Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, volume 25.
Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79–86.
Li, Y. and Ibanez-Guzman, J. (2020). Lidar for autonomous driving: The principles, challenges, and trends for automotive lidar and perception systems. IEEE Signal Processing Magazine, 37(4):50–61.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2019). Continuous control with deep reinforcement learning.
Liu, C., Xu, X., and Hu, D. (2015). Multiobjective reinforcement learning: A comprehensive overview. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 45(3):385–398.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning.
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with deep reinforcement learning.
Moravec, H. (1990). Sensor fusion in autonomous vehicles. In Sensor Fusion, pages 125–153. Springer, Boston, MA.
Narvekar, S., Peng, B., Leonetti, M., Sinapov, J., Taylor, M. E., and Stone, P. (2020). Curriculum learning for reinforcement learning domains: A framework and survey.
Pomerleau, D. A. (1988). ALVINN: An autonomous land vehicle in a neural network. In Touretzky, D., editor, Advances in Neural Information Processing Systems, volume 1. Morgan-Kaufmann.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning internal representations by error propagation.
Schulman, J., Levine, S., Moritz, P., Jordan, M. I., and Abbeel, P. (2017a). Trust region policy optimization.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017b). Proximal policy optimization algorithms.
Sutton, R. S., McAllester, D., Singh, S., and Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In Solla, S., Leen, T., and Müller, K., editors, Advances in Neural Information Processing Systems, volume 12. MIT Press.
Thrun, S., Burgard, W., and Fox, D. (2005). Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). MIT Press.
Van Moffaert, K. and Nowé, A. (2014). Multi-objective reinforcement learning using sets of Pareto dominating policies. The Journal of Machine Learning Research, 15(1):3483–3512.
Vergara, M. L. (2019). Accelerating training of deep reinforcement learning-based autonomous driving agents through comparative study of agent and environment designs. Master's thesis, NTNU.
Wikipedia contributors (2024). DARPA Grand Challenge (2005) — Wikipedia, the free encyclopedia.