3D Region Proposals For Selective Object Search

Sheetal Reddy, Vineet Gandhi, Madhava Krishna

Abstract

The advent of indoor personal mobile robots has clearly demonstrated their utility in assisting humans in places such as workshops, offices, and homes. One of the most important cases in such autonomous scenarios is when the robot has to search for certain objects in large rooms. Exploring the whole room would be extremely expensive in terms of both computing power and time. To address this issue, we demonstrate a fast algorithm that reduces the search space by identifying possible object locations as two classes: Support Structures and Clutter. Support Structures are plausible object containers in a scene, such as tables, chairs, and sofas. Clutter refers to places where there seem to be several objects that cannot be clearly distinguished; such unorganized regions can be of interest for tasks such as robot grasping, fetching, and placing objects. The primary contribution of this paper is to quickly identify potential object locations using a Support Vector Machine (SVM) learnt over features extracted from the depth map and the RGB image of the scene, which then leads into a densely connected Conditional Random Field (CRF) formulated over the image of the scene. Inference over the CRF assigns one of the labels (support structure, clutter, or others) to each pixel. The outcomes are reliable even in challenging scenarios, such as when the support structures are far from the robot. The experiments demonstrate the efficacy and speed of the algorithm irrespective of changes in camera angle, appearance, lighting, and distance from the locations.
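The labeling stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that per-pixel class probabilities (standing in for SVM outputs) act as unary potentials, and it runs brute-force mean-field inference over a fully connected CRF with a Potts label model on a tiny toy image. The function name, the Gaussian kernels on pixel position and a single depth-like feature, and all weights and bandwidths are hypothetical choices made for illustration only.

```python
import math

LABELS = ("support_structure", "clutter", "other")


def mean_field_dense_crf(unary_prob, positions, features, n_iters=5,
                         w_smooth=1.0, w_appear=2.0,
                         sigma_smooth=1.0, sigma_pos=3.0, sigma_feat=0.2):
    """Brute-force mean-field inference for a fully connected CRF.

    unary_prob: per-pixel class probabilities (assumed classifier outputs).
    positions:  per-pixel (row, col) tuples.
    features:   per-pixel feature tuples (here a single depth-like value).
    All weights and bandwidths are illustrative assumptions.
    """
    n, k = len(unary_prob), len(unary_prob[0])
    # Unary energy is the negative log-probability of each label.
    unary = [[-math.log(max(p, 1e-9)) for p in row] for row in unary_prob]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Pairwise kernel: a smoothness term on position plus an appearance
    # term on position and feature, computed for every pixel pair.
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d_pos = sqdist(positions[i], positions[j])
                d_feat = sqdist(features[i], features[j])
                kernel[i][j] = (
                    w_smooth * math.exp(-d_pos / (2 * sigma_smooth ** 2))
                    + w_appear * math.exp(-d_pos / (2 * sigma_pos ** 2)
                                          - d_feat / (2 * sigma_feat ** 2)))

    # Initialise the approximate marginals Q from the unary probabilities.
    q = [[p / sum(row) for p in row] for row in unary_prob]
    for _ in range(n_iters):
        new_q = []
        for i in range(n):
            # Message passing: kernel-weighted sum of the other marginals.
            msg = [sum(kernel[i][j] * q[j][l] for j in range(n))
                   for l in range(k)]
            total = sum(msg)
            # Potts model: label l is penalised by the mass on other labels.
            energy = [unary[i][l] + (total - msg[l]) for l in range(k)]
            low = min(energy)
            weights = [math.exp(low - e) for e in energy]
            z = sum(weights)
            new_q.append([w / z for w in weights])
        q = new_q
    labels = [max(range(k), key=lambda l: qi[l]) for qi in q]
    return labels, q


# Toy 4x4 scene: left half is a near surface, right half is background.
H, W = 4, 4
positions = [(r, c) for r in range(H) for c in range(W)]
features = [(0.0,) if c < 2 else (1.0,) for r, c in positions]
unary_prob = [[0.7, 0.2, 0.1] if c < 2 else [0.1, 0.2, 0.7]
              for r, c in positions]
noisy = 1 * W + 1                      # pixel (1, 1) gets an ambiguous score
unary_prob[noisy] = [0.3, 0.3, 0.4]

labels_before = [max(range(3), key=lambda l: p[l]) for p in unary_prob]
labels_after, _ = mean_field_dense_crf(unary_prob, positions, features)
print(LABELS[labels_before[noisy]], "->", LABELS[labels_after[noisy]])
```

In this toy scene the ambiguous pixel is relabelled to agree with the surrounding pixels that share its depth. The brute-force kernel costs O(N^2) in the number of pixels, so it is only practical for very small images; real fully connected CRF inference relies on efficient filtering approximations instead.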



Paper Citation


in Harvard Style

Reddy S., Gandhi V. and Krishna M. (2017). 3D Region Proposals For Selective Object Search. In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017) ISBN 978-989-758-226-4, pages 353-361. DOI: 10.5220/0006172903530361


in Bibtex Style

@conference{visapp17,
author={Sheetal Reddy and Vineet Gandhi and Madhava Krishna},
title={3D Region Proposals For Selective Object Search},
booktitle={Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017)},
year={2017},
pages={353-361},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006172903530361},
isbn={978-989-758-226-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP, (VISIGRAPP 2017)
TI - 3D Region Proposals For Selective Object Search
SN - 978-989-758-226-4
AU - Reddy S.
AU - Gandhi V.
AU - Krishna M.
PY - 2017
SP - 353
EP - 361
DO - 10.5220/0006172903530361