BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation

Hiep Truong Cong, Ajay Kumar Sigatapu, Arindam Das, Yashwanth Sharma, Venkatesh Satagopan, Ganesh Sistu, Ciarán Eising

2025

Abstract

Accurate motion understanding of the dynamic objects within a scene in bird’s-eye-view (BEV) is critical to ensure a reliable obstacle avoidance system and smooth path planning for autonomous vehicles. However, this task has received relatively limited exploration compared to object detection and segmentation, with only a few recent vision-based approaches presenting preliminary findings, and their performance deteriorates significantly in low-light, nighttime, and adverse weather conditions such as rain. Conversely, LiDAR and radar sensors remain almost unaffected in these scenarios, and radar provides key velocity information about the objects. Therefore, we introduce BEVMOSNet, to our knowledge the first end-to-end multimodal fusion approach leveraging cameras, LiDAR, and radar to precisely predict the moving objects in BEV. In addition, we perform a deeper analysis to find the optimal strategy for deformable cross-attention-guided sensor fusion for cross-sensor knowledge sharing in BEV. Evaluating BEVMOSNet on the nuScenes dataset, we show an overall improvement in IoU score of 36.59% compared to the vision-based unimodal baseline BEV-MoSeg (Sigatapu et al., 2023), and 2.35% compared to the multimodal SimpleBEV (Harley et al., 2022), extended for the motion segmentation task, establishing this method as the state of the art in BEV motion segmentation.
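
For readers unfamiliar with the IoU metric reported in the abstract, the following is a minimal sketch of intersection over union between a predicted and a ground-truth binary BEV mask. It is an illustration only, not the authors' evaluation code: the function name `bev_iou`, the nested-list grid representation, and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
from typing import Sequence


def bev_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary BEV grids of equal shape.

    Hypothetical helper for illustration; cells are truthy where an
    object is marked as moving.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("pred and target must have the same shape")

    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_on, t_on = bool(p), bool(t)
            intersection += p_on and t_on  # cell is moving in both masks
            union += p_on or t_on          # cell is moving in at least one mask

    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 2x3 BEV grids: 2 cells overlap, 4 cells are marked in either mask.
pred = [[0, 1, 1], [0, 1, 0]]
target = [[0, 1, 0], [1, 1, 0]]
print(bev_iou(pred, target))  # 0.5
```

In practice such a score would typically be accumulated over every BEV grid in the evaluation split; how the paper aggregates it is not stated in the abstract.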

Paper Citation


in Harvard Style

Cong H., Sigatapu A., Das A., Sharma Y., Satagopan V., Sistu G. and Eising C. (2025). BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation. In Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP; ISBN 978-989-758-728-3, SciTePress, pages 863-872. DOI: 10.5220/0013383300003912


in Bibtex Style

@conference{visapp25,
author={Hiep Cong and Ajay Sigatapu and Arindam Das and Yashwanth Sharma and Venkatesh Satagopan and Ganesh Sistu and Ciarán Eising},
title={BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation},
booktitle={Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP},
year={2025},
pages={863-872},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013383300003912},
isbn={978-989-758-728-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 20th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 3: VISAPP
TI - BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation
SN - 978-989-758-728-3
AU - Cong H.
AU - Sigatapu A.
AU - Das A.
AU - Sharma Y.
AU - Satagopan V.
AU - Sistu G.
AU - Eising C.
PY - 2025
SP - 863
EP - 872
DO - 10.5220/0013383300003912
PB - SciTePress