Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017). Multi-view 3d object detection network for autonomous driving. In CVPR, pages 1907–1915.
Chen, Y., Liu, S., Shen, X., and Jia, J. (2019). Fast point
r-cnn. In ICCV, pages 9775–9784.
Deng, J., Shi, S., Li, P., Zhou, W., Zhang, Y., and Li, H. (2020). Voxel r-cnn: Towards high performance voxel-based 3d object detection. arXiv preprint arXiv:2012.15712.
Geiger, A., Lenz, P., and Urtasun, R. (2012). Are we ready
for autonomous driving? the kitti vision benchmark
suite. In CVPR, pages 3354–3361. IEEE.
Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. In AISTATS, pages 249–256. JMLR Workshop and Conference Proceedings.
Graham, B. (2014). Spatially-sparse convolutional neural
networks. arXiv preprint arXiv:1409.6070.
Graham, B. (2015). Sparse 3d convolutional neural networks. arXiv preprint arXiv:1505.02890.
Graham, B., Engelcke, M., and Van Der Maaten, L. (2018).
3d semantic segmentation with submanifold sparse
convolutional networks. In CVPR, pages 9224–9232.
Graham, B. and van der Maaten, L. (2017). Submanifold sparse convolutional networks. arXiv preprint arXiv:1706.01307.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. (2017). Mask r-cnn. In ICCV, pages 2961–2969.
Huang, T., Liu, Z., Chen, X., and Bai, X. (2020). Epnet:
Enhancing point features with image semantics for 3d
object detection. In ECCV, pages 35–52. Springer.
Ku, J., Mozifian, M., Lee, J., Harakeh, A., and Waslander,
S. L. (2018). Joint 3d proposal generation and object
detection from view aggregation. In IROS, pages 1–8.
IEEE.
Kuhn, H. W. (1955). The hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83–97.
Lang, A. H., Vora, S., Caesar, H., Zhou, L., Yang, J., and
Beijbom, O. (2018). Pointpillars: Fast encoders for
object detection from point clouds. arXiv preprint
arXiv:1812.05784.
Lang, A. H., Vora, S., Caesar, H., Zhou, L., Yang, J., and
Beijbom, O. (2019). Pointpillars: Fast encoders for
object detection from point clouds. In CVPR, pages
12697–12705.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017a). Feature pyramid networks for object detection. In CVPR, pages 2117–2125.
Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Dollár, P. (2017b). Focal loss for dense object detection. In ICCV, pages 2980–2988.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.,
Fu, C.-Y., and Berg, A. C. (2016). Ssd: Single shot
multibox detector. In ECCV, pages 21–37. Springer.
Liu, Z., Zhao, X., Huang, T., Hu, R., Zhou, Y., and Bai, X.
(2020). Tanet: Robust 3d object detection from point
clouds with triple attention. In AAAI, pages 11677–
11684.
OpenPCDet (2020). Openpcdet: An open-source toolbox for 3d object detection from point clouds. https://github.com/open-mmlab/OpenPCDet.
Qi, C. R., Liu, W., Wu, C., Su, H., and Guibas, L. J. (2018).
Frustum pointnets for 3d object detection from rgb-d
data. In CVPR, pages 918–927.
Qi, C. R., Su, H., Mo, K., and Guibas, L. J. (2017a). Pointnet: Deep learning on point sets for 3d classification and segmentation. In CVPR, pages 652–660.
Qi, C. R., Yi, L., Su, H., and Guibas, L. J. (2017b). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In NeurIPS, pages 5099–5108.
Redmon, J. and Farhadi, A. (2018). Yolov3: An incremental
improvement. arXiv preprint arXiv:1804.02767.
Shi, S., Wang, X., and Li, H. (2019). Pointrcnn: 3d object proposal generation and detection from point cloud. In CVPR, pages 770–779.
Shi, S., Wang, Z., Shi, J., Wang, X., and Li, H. (2020).
From points to parts: 3d object detection from point
cloud with part-aware and part-aggregation network.
TPAMI.
Stewart, R., Andriluka, M., and Ng, A. Y. (2016). End-to-end people detection in crowded scenes. In CVPR, pages 2325–2333.
Sun, P., Zhang, R., Jiang, Y., Kong, T., Xu, C., Zhan, W., Tomizuka, M., Li, L., Yuan, Z., Wang, C., and Luo, P. (2020). Sparse R-CNN: End-to-end object detection with learnable proposals. arXiv preprint arXiv:2011.12450.
Tychsen-Smith, L. and Petersson, L. (2018). Improving
object localization with fitness nms and bounded iou
loss. In CVPR.
Wang, Z. and Jia, K. (2019). Frustum convnet: Sliding frustums to aggregate local point-wise features for amodal 3d object detection. arXiv preprint arXiv:1903.01864.
Yan, Y., Mao, Y., and Li, B. (2018). Second:
Sparsely embedded convolutional detection. Sensors,
18(10):3337.
Yang, B., Wang, J., Clark, R., Hu, Q., Wang, S., Markham,
A., and Trigoni, N. (2019a). Learning object bounding
boxes for 3d instance segmentation on point clouds.
arXiv preprint arXiv:1906.01140.
Yang, Z., Sun, Y., Liu, S., Shen, X., and Jia, J. (2019b). Std:
Sparse-to-dense 3d object detector for point cloud. In
ICCV, pages 1951–1960.
Yin, T., Zhou, X., and Krähenbühl, P. (2021). Center-based 3d object detection and tracking. In CVPR.
Zhao, X., Liu, Z., Hu, R., and Huang, K. (2019). 3d object
detection using scale invariant and feature reweighting
networks. In AAAI, volume 33, pages 9267–9274.
Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D.
(2020). Distance-iou loss: Faster and better learning
for bounding box regression. In AAAI, volume 34,
pages 12993–13000.
Zhou, D., Fang, J., Song, X., Guan, C., and Yang, R. (2019).
Iou loss for 2d/3d object detection. In 3DV.
Zhou, Y. and Tuzel, O. (2018). Voxelnet: End-to-end learning for point cloud based 3d object detection. In CVPR, pages 4490–4499.
SparseDet: Towards End-to-End 3D Object Detection