MEFA: Multimodal Image Early Fusion with Attention Module for Pedestrian and Vehicle Detection

Yoann Dupas; Olivier Hotel; Grégoire Lefebvre; Christophe Cérin

doi:10.5220/0013236000003912

2025

Abstract

Pedestrian and vehicle detection is a significant challenge in autonomous driving, particularly in adverse weather conditions. Multimodal image fusion addresses this challenge. This paper proposes a new attention-based early-fusion approach for visible, infrared, and LiDAR images, designated MEFA (Multimodal image Early Fusion with Attention). We compare MEFA with a channel-wise concatenation early-fusion approach. When coupled with YOLOv8 or RT-DETRv1 for pedestrian and vehicle detection, MEFA shows promising results in adverse weather conditions (i.e., rainy days or foggy nights). Furthermore, MEFA achieved superior mAP on the DENSE dataset.
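
To make the two early-fusion ideas named in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' implementation. `concat_early_fusion` illustrates channel-wise concatenation of spatially aligned modalities. `channel_attention` is a generic, hypothetical channel-reweighting step (mean squeeze followed by softmax) included only to show what "attention over fused channels" can look like; the actual MEFA module is not described in this abstract. The channel counts (3 visible, 1 infrared, 1 LiDAR) and the toy 2x2 inputs are assumptions.

```python
import math


def concat_early_fusion(*modalities):
    """Channel-wise concatenation: stack the channels of every modality.

    Each modality is a list of channels; each channel is an H x W list of rows.
    All channels must share the same H and W (spatially aligned inputs).
    """
    shapes = {(len(ch), len(ch[0])) for m in modalities for ch in m}
    if len(shapes) != 1:
        raise ValueError("all channels must share the same height and width")
    return [ch for m in modalities for ch in m]


def channel_attention(fused):
    """Illustrative channel reweighting (an assumption, NOT the MEFA module).

    Squeeze each channel to its mean, convert the means to softmax weights,
    and scale every value of a channel by that channel's weight.
    """
    means = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fused]
    peak = max(means)  # subtract the max for numerical stability
    exps = [math.exp(m - peak) for m in means]
    total = sum(exps)
    weights = [e / total for e in exps]
    scaled = [[[w * v for v in row] for row in ch] for w, ch in zip(weights, fused)]
    return scaled, weights


# Toy 2x2 inputs (assumed shapes): 3-channel visible, 1-channel infrared,
# 1-channel LiDAR projection.
visible = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(3)]
infrared = [[[0.5, 0.5], [0.5, 0.5]]]
lidar = [[[0.0, 1.0], [0.0, 1.0]]]

fused = concat_early_fusion(visible, infrared, lidar)
scaled, weights = channel_attention(fused)
print(len(fused), [round(w, 3) for w in weights])
```

In this sketch, the fused tensor would be fed to the detector in place of a plain RGB image; concatenation treats every channel equally, whereas the reweighting step lets some channels contribute more than others.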

Paper Citation

in Harvard Style

Dupas Y., Hotel O., Lefebvre G. and Cérin C. (2025). MEFA: Multimodal Image Early Fusion with Attention Module for Pedestrian and Vehicle Detection. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP; ISBN 978-989-758-728-3, SciTePress, pages 610-617. DOI: 10.5220/0013236000003912

in Bibtex Style

@conference{visapp25,
author={Yoann Dupas and Olivier Hotel and Grégoire Lefebvre and Christophe Cérin},
title={MEFA: Multimodal Image Early Fusion with Attention Module for Pedestrian and Vehicle Detection},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP},
year={2025},
pages={610-617},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013236000003912},
isbn={978-989-758-728-3},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP
TI - MEFA: Multimodal Image Early Fusion with Attention Module for Pedestrian and Vehicle Detection
SN - 978-989-758-728-3
AU - Dupas Y.
AU - Hotel O.
AU - Lefebvre G.
AU - Cérin C.
PY - 2025
SP - 610
EP - 617
DO - 10.5220/0013236000003912
PB - SciTePress