segmented. This is because there is a discontinuity
between the right wrist of the person on the right
and the body itself. The discontinuity is caused by
occlusion, as can be seen in Figure 8(a). Hence no
grid vertices are present in the occluded region,
which leads to over-segmentation.
5 CONCLUSIONS
In this paper we have proposed a method for segmenting
an RGB-D image frame using grid-based connected
component analysis. Experiments performed on the
HARL dataset indicate improved human segmentation
accuracy compared to a standard 2D segmentation
approach. The formation of the grid reduces the
processing complexity and also handles the noisy
depth information obtained from the Kinect. The
proposed method has two limitations: (i) it
under-segments the image/video frame if the person is
leaning against a wall, and (ii) it over-segments when
a human being is spread over two adjacent grids and
one of the grids has a voxel count less than our
defined threshold value. We leave adaptive grid size
selection and wall/floor estimation, which might
address these limitations, as future work.
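The over-segmentation limitation can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the function names, the cubic cell size, the use of 26-connectivity and the threshold values are all assumptions chosen for illustration. Points are binned into grid cells, cells with fewer points than the threshold are discarded, and the remaining cells are labelled by connected component analysis.

```python
from collections import Counter, deque
from itertools import product


def occupied_cells(points, cell_size, min_count):
    """Bin 3D points into cubic cells; keep cells holding >= min_count points.

    cell_size and min_count are illustrative parameters, not values from the paper.
    """
    counts = Counter(tuple(int(c // cell_size) for c in p) for p in points)
    return {cell for cell, n in counts.items() if n >= min_count}


def label_components(cells):
    """Label 26-connected components of the kept cells (breadth-first search).

    Returns (labels, n_components), where labels maps each cell to its label.
    """
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    labels = {}
    n_components = 0
    for start in sorted(cells):
        if start in labels:
            continue
        labels[start] = n_components
        queue = deque([start])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in offsets:
                nb = (x + dx, y + dy, z + dz)
                if nb in cells and nb not in labels:
                    labels[nb] = n_components
                    queue.append(nb)
        n_components += 1
    return labels, n_components


# Synthetic example: two dense body parts joined by a sparsely sampled region,
# standing in for a partly occluded wrist.
torso = [(0.1 + 0.01 * i, 0.5, 0.5) for i in range(50)]   # falls in cell (0, 0, 0)
wrist = [(1.2, 0.5, 0.5), (1.6, 0.5, 0.5)]                # falls in cell (1, 0, 0)
hand = [(2.1 + 0.01 * i, 0.5, 0.5) for i in range(50)]    # falls in cell (2, 0, 0)
person = torso + wrist + hand

# With a threshold of 5 the sparse middle cell is dropped and the person splits.
print(label_components(occupied_cells(person, 1.0, 5))[1])  # prints 2
# With a threshold of 1 the middle cell is kept and the person stays connected.
print(label_components(occupied_cells(person, 1.0, 1))[1])  # prints 1
```

In this toy case, lowering the threshold restores connectivity, but it would also admit more noisy cells, which is one reason an adaptive grid size is of interest.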
ACKNOWLEDGEMENTS
The authors gratefully acknowledge the help and
support of Prof. Dipti Prasad Mukherjee of the ECSU
unit of the Indian Statistical Institute, and of Ms.
Sangheeta Roy, Mr. Brojeshwar Bhowmick and Mr.
Kingshuk Chakravarty of Innovation Labs, TCS.
VISAPP 2013 - International Conference on Computer Vision Theory and Applications