An Online Vision System for Understanding Complex Assembly Tasks

Thiusius Rajeeth Savarimuthu, Jeremie Papon, Anders Glent Buch, Eren Erdal Aksoy, Wail Mustafa, Florentin Wörgötter, Norbert Krüger

Abstract

We present an integrated system for the recognition, pose estimation and simultaneous tracking of multiple objects in 3D scenes. Our target application is a complete semantic representation of dynamic scenes, which requires three essential steps: recognition of objects, tracking of their movements, and identification of interactions between them. We address this challenge with a complete system that uses object recognition and pose estimation to initialize object models and trajectories, a dynamic sequential octree structure to allow full 6-DOF tracking through occlusions, and a graph-based semantic representation to distil interactions. We evaluate the proposed method on real scenarios by comparing tracked outputs with ground-truth part trajectories and benchmarking against Iterative Closest Point and Particle Filter based trackers.
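The graph-based semantic layer mentioned above follows the spirit of Aksoy et al. (2011, ref. 1): each frame is summarized as a graph of touching relations between tracked objects, and an interaction is registered whenever that graph's topology changes. The following is a minimal illustrative sketch, not the authors' implementation; the object names, the centroid-distance touching test, and the threshold are hypothetical stand-ins (a real system would test surface contact between segmented point clouds):

```python
# Sketch: detect interaction events as topology changes in a
# per-frame "touching" graph over tracked objects.
# All names and the centroid-distance test are illustrative only.

def touching_graph(frame, threshold=0.01):
    """Build the set of touching pairs for one frame.

    `frame` maps object id -> 3D centroid; two objects count as
    'touching' when their centroids are closer than `threshold`.
    """
    ids = sorted(frame)
    edges = set()
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            d = sum((frame[a][k] - frame[b][k]) ** 2 for k in range(3)) ** 0.5
            if d < threshold:
                edges.add((a, b))
    return edges

def interaction_events(frames, threshold=0.01):
    """Yield (frame_index, 'contact' | 'release', pair) whenever the
    touching graph changes between consecutive frames."""
    prev = touching_graph(frames[0], threshold)
    for t, frame in enumerate(frames[1:], start=1):
        cur = touching_graph(frame, threshold)
        for pair in sorted(cur - prev):
            yield (t, "contact", pair)
        for pair in sorted(prev - cur):
            yield (t, "release", pair)
        prev = cur

# Toy example: a peg approaches a hole, touches it, then withdraws.
frames = [
    {"peg": (0.0, 0.0, 0.10), "hole": (0.0, 0.0, 0.0)},
    {"peg": (0.0, 0.0, 0.005), "hole": (0.0, 0.0, 0.0)},  # contact
    {"peg": (0.0, 0.0, 0.20), "hole": (0.0, 0.0, 0.0)},   # release
]
events = list(interaction_events(frames))
```

Reducing dense 3D tracking output to such discrete contact/release events is what makes assembly actions comparable across executions, independent of exact trajectories.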

References

  1. Aksoy, E. E., Abramov, A., Dörr, J., Ning, K., Dellen, B., and Wörgötter, F. (2011). Learning the semantics of object-action relations by observation. The International Journal of Robotics Research, 30(10):1229-1249.
  2. Aldoma, A., Tombari, F., Di Stefano, L., and Vincze, M. (2012). A global hypotheses verification method for 3D object recognition. In Computer Vision - ECCV 2012, pages 511-524. Springer.
  3. Besl, P. and McKay, N. D. (1992). A method for registration of 3-D shapes. PAMI, 14(2):239-256.
  4. Buch, A., Kraft, D., Kamarainen, J.-K., Petersen, H., and Krüger, N. (2013). Pose estimation using local structure-specific shape and appearance context. In ICRA, pages 2080-2087.
  5. Choi, C. and Christensen, H. (2013). RGB-D object tracking: A particle filter approach on GPU. In Intelligent Robots and Systems (IROS), 2013 IEEE/RSJ International Conference on.
  6. Drost, B., Ulrich, M., Navab, N., and Ilic, S. (2010). Model globally, match locally: Efficient and robust 3D object recognition. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 998-1005. IEEE.
  7. Fox, D. (2003). Adapting the sample size in particle filters through KLD-sampling. The International Journal of Robotics Research, 22(12):985-1003.
  8. Kootstra, G., Popovic, M., Jørgensen, J., Kuklinski, K., Miatliuk, K., Kragic, D., and Krüger, N. (2012). Enabling grasping of unknown objects through a synergistic use of edge and surface information. The International Journal of Robotics Research, 31(10):1190-1213.
  9. Mustafa, W., Pugeault, N., and Krüger, N. (2013). Multiview object recognition using view-point invariant shape relations and appearance information. In ICRA, pages 4230-4237.
  10. Newcombe, R. A., Davison, A. J., Izadi, S., Kohli, P., Hilliges, O., Shotton, J., Molyneaux, D., Hodges, S., Kim, D., and Fitzgibbon, A. (2011). KinectFusion: Real-time dense surface mapping and tracking. In ISMAR, pages 127-136. IEEE.
  11. Papon, J., Kulvicius, T., Aksoy, E., and Wörgötter, F. (2013). Point cloud video object segmentation using a persistent supervoxel world-model. In IROS, pages 3712-3718.
  12. Ramirez-Amaro, K., Beetz, M., and Cheng, G. (2014). Automatic segmentation and recognition of human activities from observation based on semantic reasoning. In IEEE/RSJ International Conference on Intelligent Robots and Systems.
  13. Ren, C., Prisacariu, V., Murray, D., and Reid, I. (2013). STAR3D: Simultaneous tracking and reconstruction of 3D objects using RGB-D data. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 1561-1568.
  14. Rusu, R. B. and Cousins, S. (2011). 3D is here: Point Cloud Library (PCL). In ICRA, Shanghai, China.
  15. Savarimuthu, T., Liljekrans, D., Ellekilde, L.-P., Ude, A., Nemec, B., and Krüger, N. (2013). Analysis of human peg-in-hole executions in a robotic embodiment using uncertain grasps. In Robot Motion and Control (RoMoCo), 2013 9th Workshop on, pages 233-239.
  16. Tombari, F., Salti, S., and Di Stefano, L. (2010). Unique signatures of histograms for local surface description. In ECCV, pages 356-369.
  17. Yang, Y., Fermüller, C., and Aloimonos, Y. (2013). Detection of manipulation action consequences (MAC). In Computer Vision and Pattern Recognition, pages 2563-2570.
  18. Zhang, B., Wang, J., Rossano, G., Martinez, C., and Kock, S. (2011). Vision-guided robot alignment for scalable, flexible assembly automation. In ROBIO, pages 944-951.


Paper Citation


in Harvard Style

Savarimuthu T., Papon J., Buch A., Aksoy E., Mustafa W., Wörgötter F. and Krüger N. (2015). An Online Vision System for Understanding Complex Assembly Tasks. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2015) ISBN 978-989-758-091-8, pages 454-461. DOI: 10.5220/0005260804540461


in Bibtex Style

@conference{visapp15,
author={Thiusius Rajeeth Savarimuthu and Jeremie Papon and Anders Glent Buch and Eren Erdal Aksoy and Wail Mustafa and Florentin Wörgötter and Norbert Krüger},
title={An Online Vision System for Understanding Complex Assembly Tasks},
booktitle={Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2015)},
year={2015},
pages={454-461},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005260804540461},
isbn={978-989-758-091-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Computer Vision Theory and Applications - Volume 3: VISAPP, (VISIGRAPP 2015)
TI - An Online Vision System for Understanding Complex Assembly Tasks
SN - 978-989-758-091-8
AU - Savarimuthu T.
AU - Papon J.
AU - Buch A.
AU - Aksoy E.
AU - Mustafa W.
AU - Wörgötter F.
AU - Krüger N.
PY - 2015
SP - 454
EP - 461
DO - 10.5220/0005260804540461