Towards Scenario Retrieval of Real Driving Data with Large Vision-Language Models

Tin Sohn; Maximilian Dillitzer; Lukas Ewecker; Tim Brühl; Robin Schwager; Lena Dalke; Philip Elspas; Frank Oechsle; Eric Sax

doi:10.5220/0012738500003702

2024

Abstract

With the adoption of autonomous driving systems and scenario-based testing, there is a growing need for efficient methods to understand and retrieve driving scenarios from vast amounts of real-world driving data. As manual scenario selection is labor-intensive and scales poorly, this study explores the use of three Large Vision-Language Models, CLIP, BLIP-2, and BakLLaVA, for scenario retrieval. The ability of the models to retrieve relevant scenarios based on natural language queries is evaluated using a diverse benchmark dataset of real-world driving scenarios and a precision metric. Factors such as scene complexity, weather conditions, and different traffic situations are incorporated into the method through the 6-Layer Model to measure the effectiveness of the models across different driving contexts. This study contributes to the understanding of the capabilities and limitations of Large Vision-Language Models in the context of driving scenario retrieval and points to directions for future research.
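As an illustration of the evaluation idea described in the abstract, the following minimal sketch ranks scenes against a query by cosine similarity and scores the result with precision at k. It is not the authors' implementation: the abstract does not specify the exact precision metric, and the function names, scene identifiers, and three-dimensional vectors below are hypothetical stand-ins for embeddings that a model such as CLIP would produce.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec, scene_vecs, k):
    """Return the ids of the k scenes most similar to the query."""
    ranked = sorted(
        scene_vecs,
        key=lambda scene_id: cosine(query_vec, scene_vecs[scene_id]),
        reverse=True,
    )
    return ranked[:k]

def precision_at_k(retrieved, relevant):
    """Fraction of retrieved scene ids that are labeled relevant."""
    if not retrieved:
        return 0.0
    hits = sum(1 for scene_id in retrieved if scene_id in relevant)
    return hits / len(retrieved)

# Hypothetical toy embeddings (assumption: real ones come from a vision-language model).
scene_vecs = {
    "rain_highway": [0.9, 0.1, 0.0],
    "sunny_urban": [0.1, 0.9, 0.1],
    "rain_urban": [0.7, 0.6, 0.0],
    "night_rural": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.2, 0.0]                # stand-in for an embedded text query about rain
relevant = {"rain_highway", "rain_urban"}  # hypothetical ground-truth labels

top = retrieve(query_vec, scene_vecs, k=2)
score = precision_at_k(top, relevant)
print(top, score)
```

In this toy setup both retrieved scenes are labeled relevant, so the score is 1.0. Grouping queries by attributes such as weather or traffic situation would allow the same score to be reported per driving context.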

Paper Citation

in Harvard Style

Sohn T., Dillitzer M., Ewecker L., Brühl T., Schwager R., Dalke L., Elspas P., Oechsle F. and Sax E. (2024). Towards Scenario Retrieval of Real Driving Data with Large Vision-Language Models. In Proceedings of the 10th International Conference on Vehicle Technology and Intelligent Transport Systems - Volume 1: VEHITS; ISBN 978-989-758-703-0, SciTePress, pages 496-505. DOI: 10.5220/0012738500003702

in Bibtex Style

@conference{vehits24,
author={Tin Sohn and Maximilian Dillitzer and Lukas Ewecker and Tim Brühl and Robin Schwager and Lena Dalke and Philip Elspas and Frank Oechsle and Eric Sax},
title={Towards Scenario Retrieval of Real Driving Data with Large Vision-Language Models},
booktitle={Proceedings of the 10th International Conference on Vehicle Technology and Intelligent Transport Systems - Volume 1: VEHITS},
year={2024},
pages={496-505},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012738500003702},
isbn={978-989-758-703-0},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Vehicle Technology and Intelligent Transport Systems - Volume 1: VEHITS
TI - Towards Scenario Retrieval of Real Driving Data with Large Vision-Language Models
SN - 978-989-758-703-0
AU - Sohn T.
AU - Dillitzer M.
AU - Ewecker L.
AU - Brühl T.
AU - Schwager R.
AU - Dalke L.
AU - Elspas P.
AU - Oechsle F.
AU - Sax E.
PY - 2024
SP - 496
EP - 505
DO - 10.5220/0012738500003702
PB - SciTePress