computer vision tasks such as classification and segmentation.
Data-Efficient Transformer-Based 3D Object Detection