Scaling Multi-Frame Transformers for End-to-End Driving
Vasileios Kochliaridis, Filippos Moumtzidellis, Ioannis Vlahavas
2025
Abstract
Vision-based end-to-end controllers could transform the production of autonomous vehicles by simplifying the implementation of navigation systems and reducing their development costs. However, the large-scale implementation of such controllers faces challenges, such as accurately estimating object trajectories and making robust real-time decisions. Advanced deep learning architectures combined with imitation learning offer a promising solution: the controllers learn from expert demonstrations to map observations directly to vehicle controls. Despite this progress, existing controllers still struggle to generalize and are difficult to train efficiently. In this paper, we introduce CILv3D, a novel video-based end-to-end controller that processes multi-view video frames and learns complex spatio-temporal features using attention mechanisms and 3D convolutions. We evaluate our approach against the previous state of the art and demonstrate significant improvements in vehicle control accuracy. Our findings suggest that this approach could enhance the scalability and robustness of autonomous driving systems.
Paper Citation
in Harvard Style
Kochliaridis V., Moumtzidellis F. and Vlahavas I. (2025). Scaling Multi-Frame Transformers for End-to-End Driving. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-737-5, SciTePress, pages 496-503. DOI: 10.5220/0013156700003890
in Bibtex Style
@conference{icaart25,
author={Vasileios Kochliaridis and Filippos Moumtzidellis and Ioannis Vlahavas},
title={Scaling Multi-Frame Transformers for End-to-End Driving},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2025},
pages={496--503},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013156700003890},
isbn={978-989-758-737-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Scaling Multi-Frame Transformers for End-to-End Driving
SN - 978-989-758-737-5
AU - Kochliaridis V.
AU - Moumtzidellis F.
AU - Vlahavas I.
PY - 2025
SP - 496
EP - 503
DO - 10.5220/0013156700003890
PB - SciTePress