Scaling Multi-Frame Transformers for End-to-End Driving

Vasileios Kochliaridis, Filippos Moumtzidellis, Ioannis Vlahavas

2025

Abstract

Vision-based end-to-end controllers hold the potential to revolutionize the production of Autonomous Vehicles by simplifying the implementation of navigation systems and reducing their development costs. However, the large-scale implementation of such controllers faces challenges, such as accurately estimating object trajectories and making robust real-time decisions. Advanced Deep Learning architectures combined with Imitation Learning provide a promising solution, allowing these controllers to learn from expert demonstrations how to map observations directly to vehicle controls. Despite this progress, existing controllers still struggle with generalization and are difficult to train efficiently. In this paper, we introduce CILv3D, a novel video-based end-to-end controller that processes multi-view video frames and learns complex spatio-temporal features using attention mechanisms and 3D convolutions. We evaluate our approach by comparing its performance to the previous state-of-the-art and demonstrate significant improvements in vehicle control accuracy. Our findings suggest that our approach could enhance the scalability and robustness of autonomous driving systems.


Paper Citation


in Harvard Style

Kochliaridis V., Moumtzidellis F. and Vlahavas I. (2025). Scaling Multi-Frame Transformers for End-to-End Driving. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-737-5, SciTePress, pages 496-503. DOI: 10.5220/0013156700003890


in Bibtex Style

@conference{icaart25,
author={Vasileios Kochliaridis and Filippos Moumtzidellis and Ioannis Vlahavas},
title={Scaling Multi-Frame Transformers for End-to-End Driving},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2025},
pages={496--503},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013156700003890},
isbn={978-989-758-737-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - Scaling Multi-Frame Transformers for End-to-End Driving
SN - 978-989-758-737-5
AU - Kochliaridis V.
AU - Moumtzidellis F.
AU - Vlahavas I.
PY - 2025
SP - 496
EP - 503
DO - 10.5220/0013156700003890
PB - SciTePress