3 EXPERIMENTAL RESULTS
3.1 Dataset and Results
Experiments were performed using the KITTI
Dataset (Geiger, 2012), which is widely used in 3D
object detection research.
The KITTI Dataset consists of 7481 images with
9 classes. We used the 3 major classes relevant to
autonomous driving (car, pedestrian, and cyclist).
We used 70% of the KITTI dataset for training and
the remaining 30% for validation. Although Complex
YOLO was originally built on YOLOv2, our
prediction model was designed using YOLOv4
(Bochkovskiy, 2020).
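For illustration, the 70/30 split can be generated as in the following minimal sketch. This assumes a random frame-level partition of the 7481 KITTI frames; the helper name split_kitti and the random seed are illustrative only, not the exact split used in our experiments.

```python
# Minimal sketch (hypothetical helper): a reproducible 70/30 split of
# KITTI frame indices into training and validation sets.
import random

def split_kitti(num_frames=7481, train_ratio=0.7, seed=0):
    """Return (train_ids, val_ids) as zero-padded KITTI frame index strings."""
    ids = [f"{i:06d}" for i in range(num_frames)]
    random.Random(seed).shuffle(ids)       # shuffle frames before splitting
    n_train = int(train_ratio * num_frames)
    return ids[:n_train], ids[n_train:]

train_ids, val_ids = split_kitti()
print(len(train_ids), len(val_ids))        # 5236 2245
```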
Tables 1-2 compare the proposed algorithm with
the Complex YOLO algorithm in terms of AOS
(Average Orientation Similarity) (Geiger, 2012) and
AP (Average Precision) (Everingham, 2010; Geiger,
2012). Note that both the AP and AOS metrics count
a prediction as correct if the predicted box overlaps a
ground truth bounding box by at least 50%. Thus, the
metrics in Tables 1-2 may not fully reflect the more
accurate bounding box estimates produced by the
proposed method.
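As a reference for how these numbers are obtained, the sketch below illustrates the 50%-overlap matching criterion and the per-detection orientation-similarity term that AOS averages over matched detections. It is illustrative only (the official KITTI devkit is used for the reported numbers), and the function names are ours.

```python
# Illustrative sketch of the evaluation criteria, not the official KITTI devkit.
import math

def iou_2d(box_a, box_b):
    """Axis-aligned IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as correct when it overlaps a ground truth by >= 50%."""
    return iou_2d(pred_box, gt_box) >= threshold

def orientation_similarity(yaw_pred, yaw_gt):
    """Per-detection term (1 + cos(delta_yaw)) / 2 in [0, 1] used by AOS."""
    return (1.0 + math.cos(yaw_pred - yaw_gt)) / 2.0
```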
Fig. 10 shows prediction outputs obtained with
the proposed method and the Complex YOLO
algorithm. Green cuboids represent people, and red
cuboids represent vehicles. It can be seen that the
proposed algorithm noticeably reduces false positive
errors.
Table 1: Performance comparison (AOS).
Model Car Pedestrian Cyclist
Complex YOLO 0.729 0.406 0.573
Proposed 0.730 0.418 0.579
Table 2: Performance comparison (AP).
Model Car Pedestrian Cyclist
Complex YOLO 0.780 0.413 0.582
Proposed 0.782 0.425 0.588
3.2 Instance Segmentation
Since the proposed method is based on 3D masks and
produces accurate 3D boundaries, it can generate
accurate instance segmentation, whereas
conventional methods only produce 3D bounding
boxes (cuboids) that give approximate 3D locations
of target objects. Fig. 11 shows instance
segmentation results of the proposed method.
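To make the step from 3D masks to instance labels concrete, the sketch below assigns each LiDAR point the ID of the predicted mask covering it. This is a simplified sketch under stated assumptions (per-detection masks rasterized on the BEV grid, fixed grid extents and resolution); the grid parameters and the helper name label_points_by_bev_masks are illustrative, not the exact procedure used here.

```python
# Simplified sketch: label LiDAR points by membership in per-detection BEV masks.
import numpy as np

def label_points_by_bev_masks(points, masks, x_range=(0.0, 50.0),
                              y_range=(-25.0, 25.0), cell=0.1):
    """Assign each LiDAR point the ID of the first BEV mask covering it.

    points : (N, 3) array of x, y, z coordinates.
    masks  : list of (H, W) boolean BEV masks, one per detected instance.
    Returns an (N,) array of instance IDs, -1 for background.
    """
    labels = np.full(len(points), -1, dtype=np.int32)
    cols = ((points[:, 0] - x_range[0]) / cell).astype(int)   # BEV column index
    rows = ((points[:, 1] - y_range[0]) / cell).astype(int)   # BEV row index
    for inst_id, mask in enumerate(masks):
        h, w = mask.shape
        valid = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
        hit = np.zeros(len(points), dtype=bool)
        hit[valid] = mask[rows[valid], cols[valid]]            # point falls in mask
        labels[hit & (labels == -1)] = inst_id                 # keep first assignment
    return labels
```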
4 CONCLUSIONS
The LiDAR sensor provides important information
for 3D object detection in autonomous driving.
Using the LiDAR sensor, one can overcome the
reliability issues of vision-based object detection
methods. However, 3D object detection methods
based on BEV images of LiDAR data suffer from
other problems such as inaccurate ground ROI
estimation and false positive errors. We propose a
3D shape loss function based on 3D masks for three
major target classes. Although the experimental
results are promising, the performance could be
further improved by using more diverse 3D masks.
ACKNOWLEDGEMENTS
This research was supported in part by Basic Science
Research Program through the National Research
Foundation of Korea (NRF) funded by the Ministry
of Education, Science and Technology (NRF-
2020R1A2C1012221).
REFERENCES
Bochkovskiy, A., Wang, C. Y., & Liao, H. Y. M. (2020).
Yolov4: Optimal speed and accuracy of object
detection. arXiv preprint arXiv:2004.10934.
Chen, X., Ma, H., Wan, J., Li, B., & Xia, T. (2017). Multi-
view 3d object detection network for autonomous
driving. In Proceedings of the IEEE conference on
Computer Vision and Pattern Recognition (pp. 1907-
1915).
Everingham, M., Van Gool, L., Williams, C. K., Winn, J.,
& Zisserman, A. (2010). The pascal visual object
classes (voc) challenge. International journal of
computer vision, 88(2), 303-338.
Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready
for autonomous driving? the KITTI vision benchmark
suite. In 2012 IEEE conference on computer vision and
pattern recognition (pp. 3354-3361).
Jeong, Y. (2021). Predictive lane change decision making
using bidirectional long short-term memory for
autonomous driving on highways. IEEE Access, 9,
144985-144998.
Lang, A. H., Vora, S., Caesar, H., Zhou, L., Yang, J., &
Beijbom, O. (2019). Pointpillars: Fast encoders for
object detection from point clouds. In Proceedings of