Study of LiDAR Segmentation and Model's Uncertainty using Transformer for Different Pre-trainings
Mohammed Hassoubah, Ibrahim Sobh, Mohamed Elhelw
2022
Abstract
For the task of semantic segmentation of 2D or 3D inputs, the Transformer architecture suffers from limited localization ability because it lacks low-level details. The Transformer also has to be pre-trained to perform well, and how best to pre-train it remains an open research question. In this work, the Transformer is integrated into the U-Net architecture as in (Chen et al., 2021). The new architecture is trained to perform semantic segmentation of 2D spherical images generated by projecting the 3D LiDAR point cloud. This integration captures local dependencies through the CNN backbone's processing of the input, followed by Transformer processing to capture long-range dependencies. To determine the best pre-training settings, multiple ablations were run over the network architecture, the self-training loss function and the self-training procedure, and the results were observed. The experiments show that the integrated architecture with self-training improves mIoU by +1.75% over the U-Net architecture alone, even when the latter is also self-trained. Corrupting the input and self-training the network to reconstruct the original input improves mIoU by up to 2.9% over using a reconstruction plus contrastive training objective. Self-training the model improves mIoU by 0.48% over initialising with an ImageNet pre-trained model, even when that pre-trained model is also self-trained. Random initialisation of the Batch Normalisation layers improves mIoU by 2.66% over using self-trained parameters. Self-supervised training of the segmentation network reduces the model's epistemic uncertainty. The integrated architecture with self-training outperforms SalsaNext (Cortinhal et al., 2020), to our knowledge the best projection-based semantic segmentation network, by 5.53% mIoU on the SemanticKITTI (Behley et al., 2019) validation set with 2D input dimension 1024×64.
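The abstract's preprocessing step, projecting the 3D LiDAR point cloud onto a 2D spherical (range) image, is the standard projection used by networks such as SalsaNext. Below is a minimal sketch of that projection, assuming the HDL-64E vertical field of view used in SemanticKITTI (+3.0° up, -25.0° down) and the paper's stated 1024×64 resolution; the function name and exact details are illustrative, not the paper's own code.

```python
# Minimal sketch of spherical (range-image) projection of a LiDAR scan.
# Field-of-view values are assumptions matching the HDL-64E sensor used
# in SemanticKITTI; the paper's preprocessing may differ in details.
import numpy as np

def spherical_projection(points, H=64, W=1024,
                         fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)          # range of each point

    yaw = np.arctan2(y, x)                          # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(depth, 1e-8))  # elevation angle

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    # Normalise angles to [0, 1], then scale to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * W               # column from azimuth
    v = (1.0 - (pitch - fov_down) / fov) * H        # row from elevation

    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    # Write far-to-near so nearer points overwrite farther ones
    # (relies on NumPy's last-write-wins behaviour for duplicate indices).
    image = np.full((H, W), -1.0, dtype=np.float32)
    order = np.argsort(depth)[::-1]
    image[v[order], u[order]] = depth[order]
    return image
```

In practice the same (v, u) indices are reused to build additional channels (x, y, z, intensity) alongside the range channel, giving the multi-channel 2D input the segmentation network consumes.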
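The abstract's best-performing pretext task corrupts the projected input and self-trains the network to reconstruct the original. A hypothetical sketch of one such training step is shown below; the corruption scheme (random pixel dropout) and the L2 reconstruction loss are assumptions for illustration, not the paper's exact recipe.

```python
# Hypothetical sketch of the corrupt-and-reconstruct pretext task:
# mask random pixels of the range image, reconstruct the clean input.
import torch
import torch.nn.functional as F

def denoising_step(model, range_image, drop_prob=0.3):
    """One self-supervised step: mask pixels, reconstruct, L2 loss."""
    mask = (torch.rand_like(range_image) > drop_prob).float()
    corrupted = range_image * mask                  # zero out dropped pixels
    reconstruction = model(corrupted)               # same H x W output
    return F.mse_loss(reconstruction, range_image)  # reconstruction objective
```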
Paper Citation
in Harvard Style
Hassoubah M., Sobh I. and Elhelw M. (2022). Study of LiDAR Segmentation and Model's Uncertainty using Transformer for Different Pre-trainings. In Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 4: VISAPP; ISBN 978-989-758-555-5, SciTePress, pages 1010-1019. DOI: 10.5220/0010969700003124
in Bibtex Style
@conference{visapp22,
author={Mohammed Hassoubah and Ibrahim Sobh and Mohamed Elhelw},
title={Study of LiDAR Segmentation and Model's Uncertainty using Transformer for Different Pre-trainings},
booktitle={Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 4: VISAPP},
year={2022},
pages={1010-1019},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010969700003124},
isbn={978-989-758-555-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2022) - Volume 4: VISAPP
TI - Study of LiDAR Segmentation and Model's Uncertainty using Transformer for Different Pre-trainings
SN - 978-989-758-555-5
AU - Hassoubah M.
AU - Sobh I.
AU - Elhelw M.
PY - 2022
SP - 1010
EP - 1019
DO - 10.5220/0010969700003124
PB - SciTePress