Segmentation of Kinect Captured Images using Grid based 3D Connected Component Labeling

Aniruddha Sinha, T. Chattopadhyay, Apurbaa Mallik

Abstract

This paper presents a grid-based 3-dimensional (3D) connected component labeling method for segmenting video frames captured with the Kinect RGB-D sensor. The Kinect captures the RGB values of an object as well as its depth using two different cameras/sensors. Calibration between these two sensors enables us to generate a point cloud in which each pixel of the depth image yields a 6-tuple containing its RGB values and its position along the x, y and z directions with respect to the camera. In the proposed method, we first construct the point cloud for all pixels in the depth image. The space occupied by the point cloud is then divided into 3D grids, and components that are connected in 3D space are labeled with the same index. The proposed method can segment images even where two spatially separate objects overlap in the projected plane. We tested the segmentation method on the HARL dataset with different grid sizes and obtained an overall segmentation accuracy of 83.8% for the optimum grid size.
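The pipeline summarized in the abstract (back-project depth pixels to 3D points, bin them into a grid, label connected occupied cells) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The pinhole intrinsics `fx`, `fy`, `cx`, `cy`, the 26-neighbour connectivity rule, the `cell_size` parameter and both function names are assumptions made for illustration, and the RGB part of the 6-tuple is left out for brevity.

```python
from collections import deque
from itertools import product


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (rows of depths in metres) to 3D points.

    Assumes a pinhole camera model; a depth of 0 is treated as "no reading".
    Returns a list of ((row, col), (x, y, z)) pairs.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                points.append(((v, u), ((u - cx) * z / fx, (v - cy) * z / fy, z)))
    return points


def label_grid_components(points, cell_size):
    """Label pixels by connected components of occupied 3D grid cells.

    Two occupied cells are treated as connected when they are 26-neighbours
    (an assumed rule). Returns a dict mapping (row, col) -> label (1, 2, ...).
    """
    # Assign every point to the grid cell that contains it.
    cells = {}
    for pixel, (x, y, z) in points:
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        cells.setdefault(key, []).append(pixel)

    # Breadth-first flood fill over occupied cells.
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    cell_label = {}
    next_label = 0
    for start in cells:
        if start in cell_label:
            continue
        next_label += 1
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in offsets:
                neighbour = (i + di, j + dj, k + dk)
                if neighbour in cells and neighbour not in cell_label:
                    cell_label[neighbour] = next_label
                    queue.append(neighbour)

    # Every pixel inherits the label of the cell its 3D point fell into.
    return {p: cell_label[key] for key, pixels in cells.items() for p in pixels}


# Toy frame: a near surface (1 m) next to a far surface (3 m) in the image.
depth = [[1.0, 1.0, 3.0, 3.0],
         [1.0, 1.0, 3.0, 3.0]]
points = depth_to_points(depth, fx=4.0, fy=4.0, cx=1.5, cy=0.5)
labels = label_grid_components(points, cell_size=1.0)
print(labels)
```

In this toy frame the two surfaces touch in the image plane but lie two grid cells apart in depth, so they get different labels. The cell size sets how large a gap still counts as connected, which fits the abstract's report of accuracy at an optimum grid size.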



Paper Citation


in Harvard Style

Sinha A., Chattopadhyay T. and Mallik A. (2013). Segmentation of Kinect Captured Images using Grid based 3D Connected Component Labeling. In Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013) ISBN 978-989-8565-47-1, pages 327-332. DOI: 10.5220/0004289303270332


in Bibtex Style

@conference{visapp13,
author={Aniruddha Sinha and T. Chattopadhyay and Apurbaa Mallik},
title={Segmentation of Kinect Captured Images using Grid based 3D Connected Component Labeling},
booktitle={Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)},
year={2013},
pages={327-332},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004289303270332},
isbn={978-989-8565-47-1},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Computer Vision Theory and Applications - Volume 1: VISAPP, (VISIGRAPP 2013)
TI - Segmentation of Kinect Captured Images using Grid based 3D Connected Component Labeling
SN - 978-989-8565-47-1
AU - Sinha A.
AU - Chattopadhyay T.
AU - Mallik A.
PY - 2013
SP - 327
EP - 332
DO - 10.5220/0004289303270332