functionalities reduce the cognitive workload of the user and make the annotation process more intuitive.

For future work, several additions can be considered to further enhance the efficiency of our proposed pipeline. For example, distinguishing objects of interest from the static background in 3D point clouds is a challenging task. Adding ground-plane and background removal algorithms to the preprocessing pipeline for depth maps would remove distracting elements from the scene, allowing the user to identify and annotate objects with greater ease; a sketch of such a step is given below.
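As an illustration, the dominant ground plane can be separated from the rest of the scene with a simple RANSAC plane fit. The following sketch is a hypothetical preprocessing step, not part of our tool; it assumes the depth map has already been back-projected into an (N, 3) point array and uses only NumPy:

import numpy as np

def remove_ground_plane(points, n_iters=200, dist_thresh=0.02, seed=None):
    """RANSAC plane fit: return the points that do NOT lie on the
    dominant plane (assumed to be the ground or table surface).

    points:      (N, 3) array of 3D points from the depth map.
    dist_thresh: max point-to-plane distance (metres) to count as inlier.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Sample three distinct points and fit a candidate plane.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # degenerate (collinear) sample, try again
            continue
        normal /= norm
        # Distance of every point to the candidate plane.
        dist = np.abs((points - p0) @ normal)
        inliers = dist < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Discard the dominant plane, keep everything else.
    return points[~best_inliers]

A distance threshold on the order of a few centimetres is a plausible starting point for tabletop scenes; larger values risk removing low-lying objects together with the ground.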
Furthermore, depth points inside a 3D bounding box can be projected onto an already segmented RGB image to automatically infer depth segmentation masks, leaving the user only to remove or add individual points where necessary, as sketched below.
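A minimal sketch of this mask transfer, under the assumption that the depth points have already been cropped to a 3D bounding box and transformed into the RGB camera frame, and that the pinhole intrinsic matrix K of the RGB camera is known (all names here are illustrative):

import numpy as np

def labels_from_rgb_mask(points_cam, K, seg_mask):
    """Transfer per-pixel RGB segmentation labels to 3D depth points.

    points_cam: (N, 3) points expressed in the RGB camera frame.
    K:          (3, 3) pinhole intrinsic matrix of the RGB camera.
    seg_mask:   (H, W) integer label image from the RGB segmentation.
    Returns an (N,) label array; points projecting outside the image
    or lying behind the camera receive label -1.
    """
    labels = np.full(len(points_cam), -1, dtype=np.int32)
    valid = points_cam[:, 2] > 0          # in front of the camera
    uvw = (K @ points_cam[valid].T).T     # perspective projection
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w = seg_mask.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(valid)[inside]   # map back to original indices
    labels[idx] = seg_mask[v[inside], u[inside]]
    return labels

The resulting per-point labels would form an initial depth segmentation mask that the user then refines by adding or removing individual points.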
ACKNOWLEDGMENTS
We would like to thank every participant in the user study for their time and effort. This work was partially funded by the German Federal Ministry of Education and Research in the context of the project ENNOS (13N14975).