3D sensing technologies for that purpose and compared
3D and 2D variants of state-of-the-art neural networks
trained on data collected from a strawberry-growing
farm. These results show encouraging performance but
also allow us to highlight the limitations of current
technologies and algorithms. Time-of-Flight technology,
despite its superior quality of point clouds and shape
information, struggles with reflective surfaces,
resulting in a large number of false detections, while
stereo technology, lacking detail in acquired depth,
fails to detect numerous fruits. Traditional 2D
image-based convolutional neural networks still
outperform the 3D networks for the task of fruit
segmentation and are therefore better suited for this
task. This work can be treated as a baseline for future
work on 3D information for outdoor applications such as
robotic fruit picking, and should encourage researchers
to pursue more experimentation in such difficult
scenarios, to counteract the limitations found in the
paper, and to bridge the gap with state-of-the-art
techniques in perception for 2D information.
Evaluation of 3D Vision Systems for Detection of Small Objects in Agricultural Environments