noting that ORB-SLAM2 achieved the highest accuracy on the plants dynamic sequence. Although ORB-SLAM2 cannot filter out dynamic objects, its high-quality feature points make the system more robust in dynamic environments.
To increase the transparency of our work and make it reproducible for other researchers, we will release the source code of our implementation at https://github.com/mjtq-slamlearning/SI-VSLAM once the paper is published.
In future work, we plan to deploy and evaluate the proposed system on a real watermelon field in cooperation with a local farmer. We aim to have a watermelon detector ready and integrated into our system by the next picking season.
REFERENCES
Belhedi, A., Bartoli, A., Bourgeois, S., Hamrouni, K., Sayd,
P., and Gay-Bellile, V. (2012). Noise modelling and
uncertainty propagation for TOF sensors. In Fusiello,
A., Murino, V., and Cucchiara, R., editors, Computer
Vision – ECCV 2012. Workshops and Demonstrations,
pages 476–485, Berlin, Heidelberg. Springer Berlin
Heidelberg.
Bharati, S., Khan, T. Z., Podder, P., and Hung, N. Q. (2021). A Comparative Analysis of Image Denoising Problem: Noise Models, Denoising Filters and Applications, pages 49–66. Springer International Publishing, Cham.
Campos, C., Elvira, R., Rodríguez, J. J. G., Montiel, J. M. M., and Tardós, J. D. (2021). ORB-SLAM3: An accurate open-source library for visual, visual–inertial, and multimap SLAM. IEEE Transactions on Robotics, pages 1–17.
Cartucho, J., Tukra, S., Li, Y., Elson, D. S., and Giannarou, S. (2020). VisionBlender: A tool to efficiently generate computer vision datasets for robotic surgery. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, pages 1–8.
Everingham, M., Eslami, S. A., Van Gool, L., Williams, C. K., Winn, J., and Zisserman, A. (2015). The PASCAL visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1):98–136.
Free3D (2021). Free3D: Bird v1. https://free3d.com/3d-model/bird-v1--94904.html. Accessed: 2021-11-27.
Ganchenko, V. and Doudkin, A. (2019). Image semantic segmentation based on convolutional neural networks for monitoring agricultural vegetation. In Ablameyko, S. V., Krasnoproshin, V. V., and Lukashevich, M. M., editors, Pattern Recognition and Information Processing, pages 52–63, Cham. Springer International Publishing.
Heckenkamp, C. (2008). Das magische Auge – Grundlagen der Bildverarbeitung: Das PMD Prinzip. Inspect, pages 25–28.
Kalisz, A., Particke, F., Penk, D., Hiller, M., and Thielecke, J. (2019). B-SLAM-SIM: A novel approach to evaluate the fusion of visual SLAM and GPS by example of Direct Sparse Odometry and Blender. In VISIGRAPP.
Li, Y. and Vasconcelos, N. (2019). REPAIR: Removing representation bias by dataset resampling. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9564–9573.
Mur-Artal, R. and Tardós, J. D. (2017). ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Transactions on Robotics, 33(5):1255–1262.
Nguyen, C. V., Izadi, S., and Lovell, D. (2012). Modeling Kinect sensor noise for improved 3D reconstruction and tracking. In 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, pages 524–530.
Prokhorov, D., Zhukov, D., Barinova, O., Konushin, A., and Vorontsova, A. (2019). Measuring robustness of visual SLAM. In 2019 16th International Conference on Machine Vision Applications (MVA), pages 1–6.
Sturm, J., Engelhard, N., Endres, F., Burgard, W., and Cremers, D. (2012). A benchmark for the evaluation of RGB-D SLAM systems. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 573–580.
Xiao, L., Heide, F., O’Toole, M., Kolb, A., Hullin, M. B., Kutulakos, K., and Heidrich, W. (2015). Defocus deblurring and superresolution for time-of-flight depth cameras. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2376–2384.
Xuan, Z. and David, F. (2018). Real-time voxel based 3D semantic mapping with a hand-held RGB-D camera. https://github.com/floatlazer/semantic_slam.
Yu, C., Liu, Z., Liu, X., Xie, F., Yang, Y., Wei, Q., and Qiao, F. (2018). DS-SLAM: A semantic visual SLAM towards dynamic environments. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1168–1174.
Zhang, Z. and Scaramuzza, D. (2018). A tutorial on quantitative trajectory evaluation for visual(-inertial) odometry. In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 7244–7251.
Zhuang, J., Wang, Z., and Wang, B. (2021). Video semantic
segmentation with distortion-aware feature correction.
IEEE Transactions on Circuits and Systems for Video
Technology, 31(8):3128–3139.