Benchmarking RGB-D Segmentation: Toy Dataset of Complex Crowded Scenes

Aleksi Ikkala, Joni Pajarinen, Ville Kyrki

2016

Abstract

In this paper we present a new RGB-D dataset captured with the Kinect sensor. The dataset is composed of typical children’s toys and contains a total of 449 RGB-D images along with their annotated ground truth images. Compared to existing RGB-D object segmentation datasets, the objects in our proposed dataset have more complex shapes and less texture. The images are also crowded and thus highly occluded. Three state-of-the-art segmentation methods are benchmarked using the dataset. These methods attack the problem of object segmentation from different starting points, providing a comprehensive view of the properties of the proposed dataset as well as the state-of-the-art performance. The results are mostly satisfactory, but there remains plenty of room for improvement. This novel dataset thus poses the next challenge in the area of RGB-D object segmentation.


Paper Citation


in Harvard Style

Ikkala A., Pajarinen J. and Kyrki V. (2016). Benchmarking RGB-D Segmentation: Toy Dataset of Complex Crowded Scenes. In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016) ISBN 978-989-758-175-5, pages 107-116. DOI: 10.5220/0005675501070116


in Bibtex Style

@conference{visapp16,
author={Aleksi Ikkala and Joni Pajarinen and Ville Kyrki},
title={Benchmarking RGB-D Segmentation: Toy Dataset of Complex Crowded Scenes},
booktitle={Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)},
year={2016},
pages={107-116},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005675501070116},
isbn={978-989-758-175-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP, (VISIGRAPP 2016)
TI - Benchmarking RGB-D Segmentation: Toy Dataset of Complex Crowded Scenes
SN - 978-989-758-175-5
AU - Ikkala A.
AU - Pajarinen J.
AU - Kyrki V.
PY - 2016
SP - 107
EP - 116
DO - 10.5220/0005675501070116