block (1D-SAB), which is a 1D version of the SAB
(Zhao et al., 2020) in SalsaNext. With 1D-SalsaSAN,
self-attention is calculated over the point-cloud data of each
laser ID acquired from LiDAR in order to take into account
the relationships between the points. This adapts the
processing to the characteristics of the point-cloud data
acquired from LiDAR. By expressing the detailed relationships
between points as weights, we can improve the identification
accuracy for small objects such as motorcycles and signs,
which are not identified accurately by conventional
semantic-segmentation methods. In addition, a projection method called
Scan-Unfolding (Triess et al., 2020) is used to obtain
pseudo-images from the 3D point-cloud data. This suppresses
the loss of points when the cloud is converted to a
pseudo-image and enables feature extraction close to the
original point-cloud information. The results of evaluation experiments
using SemanticKITTI (Behley et al., 2019) indicate
that 1D-SalsaSAN improves the accuracy of semantic
segmentation by projecting with Scan-Unfolding and then
processing with the 1D-SAB. We confirmed that this
contributes to improving the identification accuracy of
small objects. We also showed that its processing speed
is faster than that of SalsaNext.
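As a rough illustration of the two ideas above, the following sketches show a Scan-Unfolding-style projection and 1D self-attention along a single laser ring. These are minimal sketches under our own naming, not the actual 1D-SalsaSAN or Scan-Unfolding implementations; in particular, scan_unfold assumes that per-point laser IDs are available and that each ring's points arrive in acquisition order with roughly even azimuth spacing.

import numpy as np

def scan_unfold(points, ring_ids, num_rings, width):
    # Hypothetical Scan-Unfolding-style projection: each laser ring fills
    # one image row in acquisition order, so points from different rings
    # never collide in the same pixel as they can in spherical projection.
    image = np.zeros((num_rings, width, points.shape[1]), dtype=np.float32)
    for r in range(num_rings):
        ring = points[ring_ids == r]          # points of one laser ID
        if len(ring) == 0:
            continue
        # Assumes roughly even azimuth spacing along the ring.
        cols = np.linspace(0, width - 1, len(ring)).astype(int)
        image[r, cols] = ring
    return image

The 1D self-attention below weights each point on a ring by its relationship to every other point on the same ring; the actual 1D-SAB (a 1D variant of the block of Zhao et al., 2020) is more elaborate.

import torch
import torch.nn as nn

class SelfAttention1D(nn.Module):
    # Minimal 1D self-attention over one row of the pseudo-image.
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv1d(channels, channels, 1)
        self.k = nn.Conv1d(channels, channels, 1)
        self.v = nn.Conv1d(channels, channels, 1)
        self.scale = channels ** -0.5

    def forward(self, x):                     # x: (batch, channels, width)
        q, k, v = self.q(x), self.k(x), self.v(x)
        # (batch, width, width) weights: the relation of every point on
        # the ring to every other point on the same ring.
        attn = torch.softmax(q.transpose(1, 2) @ k * self.scale, dim=-1)
        return x + v @ attn.transpose(1, 2)   # residual connection

out = SelfAttention1D(32)(torch.randn(1, 32, 2048))  # one ring, 2048 points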
2 RELATED WORK
Studies on omnidirectional LiDAR-based deep learning that is
robust to nighttime and bad-weather conditions, under which
objects are difficult to detect with image-based methods,
have been conducted, and many methods have been proposed.
As mentioned above, methods using 3D point-cloud data can be
categorized into those that convert the data into voxels,
those that use the data without any conversion, and those
that convert the data into 2D pseudo-images. They differ in
the way they represent the point cloud. In this section,
each type and its typical methods are described.
2.1 Voxel-Based Methods
Voxel-based methods first convert the 3D point-cloud data
into voxels. The voxelized point-cloud data are then input
to a network consisting of 3D convolutions to obtain results.
VoxelNet (Zhou and Tuzel, 2018) is an object-detection method
that operates on 3D point-cloud data divided into voxels.
VoxelNet contains a feature learning network (FLN). In the
FLN, the 3D space is divided into equally spaced voxels, and
the shape information in each voxel is obtained. The feature
values of each point in a voxel are also calculated and
combined with the feature values of that voxel. The combined
feature values are then used for feature extraction and to
output object regions.
Voxel-based methods make it easy to retain the original
information of the 3D point-cloud data, and smooth feature
extraction by 3D convolution is possible. They also mitigate
the sparseness of 3D point-cloud data by grouping points
into voxels, making the data easier to handle for each task.
However, the cubical representation of voxel data is
computationally expensive and decreases processing speed.
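As a sketch of the voxel grouping described above, the following groups points into equally spaced voxels; the function name and grid parameters are ours, and the per-voxel feature aggregation of VoxelNet's FLN is reduced to a comment.

import numpy as np

def voxelize(points, voxel_size, grid_min):
    # Integer voxel index of every point along x, y, z.
    idx = np.floor((points[:, :3] - grid_min) / voxel_size).astype(np.int64)
    voxels = {}
    for i, key in enumerate(map(tuple, idx)):
        voxels.setdefault(key, []).append(points[i])
    # Each voxel now holds the points falling inside it; a VoxelNet-style
    # FLN would aggregate per-point features (e.g. by max pooling) into
    # one feature vector per voxel before the 3D convolutions.
    return {k: np.stack(v) for k, v in voxels.items()}

cloud = np.random.rand(1000, 4)   # (x, y, z, reflectance)
vox = voxelize(cloud, voxel_size=0.1, grid_min=np.zeros(3))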
2.2 Point-Wise Methods
With point-wise methods, the acquired 3D point-cloud data
are directly input to a network for processing without any
conversion (Qi et al., 2017a; Qi et al., 2017b). The (x, y, z)
coordinates and the reflection-intensity values of the points
are input to the network.
PointNet (Qi et al., 2017a) can be applied to several tasks
such as class classification and segmentation. It is composed
of a spatial transformer network (STN), a classification
network, and a segmentation network. First, the STN reduces
the noise in the input point cloud. Next, the features of
each point are extracted through convolution in the
classification network. Max pooling is then used to extract
the overall features and classify them. For segmentation,
the overall features extracted by the classification network
and the local features of each point are combined and input
to the segmentation network. The convolution process is
executed several more times, and segmentation is executed
for each point. PointNet may lack detailed spatial
information and thus may fail to capture local structure.
Therefore, PointNet++ (Qi et al., 2017b) has been proposed
to solve this problem by applying the PointNet process
hierarchically. It can also extract pseudo-local features by
taking clustered neighboring points as input. This solves
the problems with PointNet and improves the accuracy of
class classification and segmentation.
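The segmentation path described above can be summarized in a few lines: a shared per-point MLP extracts local features, max pooling yields an order-invariant global feature, and their concatenation is classified point-wise. This is a minimal sketch under our own naming; the STN and the repeated convolutions of the actual PointNet are omitted.

import torch
import torch.nn as nn

class PointNetSegSketch(nn.Module):
    def __init__(self, in_dim=3, feat_dim=64, num_classes=20):
        super().__init__()
        # Shared MLP applied to every point independently (1x1 convolution).
        self.local = nn.Sequential(nn.Conv1d(in_dim, feat_dim, 1), nn.ReLU())
        self.head = nn.Conv1d(feat_dim * 2, num_classes, 1)

    def forward(self, x):                     # x: (batch, in_dim, num_points)
        local = self.local(x)                 # per-point local features
        # Max pooling over points gives an order-invariant global feature.
        global_feat = local.max(dim=2, keepdim=True).values
        global_feat = global_feat.expand(-1, -1, x.shape[2])
        # Concatenate global and local features, then classify each point.
        return self.head(torch.cat([local, global_feat], dim=1))

logits = PointNetSegSketch()(torch.randn(2, 3, 1024))  # (2, 20, 1024)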
Thus, the original information of the 3D point-cloud data is
retained, and accurate feature extraction is possible. These
methods also eliminate the computational cost of converting
the data to voxels or other representations. However,
processing 3D point-cloud data as they are requires a huge
amount of storage space. The associated computational cost
of processing the point-cloud data is also high, which may
reduce processing speed.