Authors: Masaki Nambata¹; Tsubasa Hirakawa¹; Takayoshi Yamashita¹; Hironobu Fujiyoshi¹; Takehito Teraguchi²; Shota Okubo² and Takuya Nanri²
Affiliations: ¹Chubu University, 1200 Matsumoto-cho, Kasugai, Aichi, Japan; ²Nissan Motor Co., Ltd., 2 Takara-cho, Kanagawa-ku, Yokohama-shi, Kanagawa, Japan
Keyword(s):
Driver’s Assistance System, Vision and Language Model, Evaluation Method.
Abstract:
In the field of Advanced Driver Assistance Systems (ADAS), car navigation systems have become an essential part of modern driving. However, the guidance provided by existing car navigation systems is often hard for drivers to follow through voice instructions alone. This challenge has led to growing interest in Human-like Guidance (HLG), a task focused on delivering intuitive navigation instructions that mimic the way a passenger would guide a driver. Previous studies, however, have relied on rule-based systems to generate HLG datasets, resulting in inflexible, low-quality data with limited textual variety, even though high-quality datasets are crucial for improving model performance. In this study, we propose a method that automatically generates high-quality navigation sentences from image data using a Large Language Model with a novel prompting approach. Additionally, we introduce a Mixture of Experts (MoE) framework for data cleaning to filter out unreliable data. The resulting dataset is both expressive and consistent. Furthermore, our proposed MoE evaluation framework makes it possible to evaluate even complex tasks such as HLG appropriately and from multiple perspectives.