Fast Self-supervised On-line Training for Object Recognition Specifically for Robotic Applications

Markus Schoeler; Simon Christoph Stein; Jeremie Papon; Alexey Abramov; Florentin Woergoetter

doi:10.5220/0004688000940103

2014

Abstract

Today most recognition pipelines are trained either at an off-line stage, which provides systems with pre-segmented images and predefined objects, or at an on-line stage, which requires a human supervisor to tediously control the learning. Self-supervised on-line training of recognition pipelines without human intervention is a highly desirable goal, as it allows systems to learn unknown, environment-specific objects on the fly. We propose a fast and automatic system that can extract and learn unknown objects with minimal human intervention by employing a two-level pipeline that combines the advantages of RGB-D sensors for object extraction and high-resolution cameras for object recognition. Furthermore, we significantly improve recognition results with local features by implementing a novel keypoint orientation scheme, which leads to highly invariant but discriminative object signatures. Using only one image per object for training, our system achieves a recognition rate of 79% for 18 objects, benchmarked on 42 scenes with random poses, scales and occlusion, while taking only 7 seconds for training. Additionally, we evaluate our orientation scheme on the state-of-the-art 56-object SDU dataset, boosting accuracy for one training view per object by +37% to 78% and peaking at 98% for 11 training views.

Paper Citation

in Harvard Style

Schoeler M., Stein S., Papon J., Abramov A. and Woergoetter F. (2014). Fast Self-supervised On-line Training for Object Recognition Specifically for Robotic Applications. In Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014) ISBN 978-989-758-004-8, pages 94-103. DOI: 10.5220/0004688000940103

in Bibtex Style

@conference{visapp14,
author={Markus Schoeler and Simon Christoph Stein and Jeremie Papon and Alexey Abramov and Florentin Woergoetter},
title={Fast Self-supervised On-line Training for Object Recognition Specifically for Robotic Applications},
booktitle={Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)},
year={2014},
pages={94-103},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004688000940103},
isbn={978-989-758-004-8},
}

in EndNote Style

TY - CONF
JO - Proceedings of the 9th International Conference on Computer Vision Theory and Applications - Volume 2: VISAPP, (VISIGRAPP 2014)
TI - Fast Self-supervised On-line Training for Object Recognition Specifically for Robotic Applications
SN - 978-989-758-004-8
AU - Schoeler M.
AU - Stein S.
AU - Papon J.
AU - Abramov A.
AU - Woergoetter F.
PY - 2014
SP - 94
EP - 103
DO - 10.5220/0004688000940103