Real-time Curve-skeleton Extraction of Human-scanned Point Clouds

Application in Upright Human Pose Estimation

Frederic Garcia and Bj

orn Ottersten

Interdisciplinary Centre for Security Reliability and Trust (SnT), University of Luxembourg, Luxembourg City, Luxembourg

Keywords:

Curve-skeleton, Skeletonization, Human Pose Estimation, Object Representation, Point Cloud, Real-time.

Abstract:

This paper presents a practical and robust approach for upright human curve-skeleton extraction. Curve-

skeletons are object descriptors that represent a simpliﬁed version of the geometry and topology of a 3-D

object. The curve-skeleton of a human-scanned point set enables the approximation of the underlying skeletal

structure and thus, to estimate the body conﬁguration (human pose). In contrast to most curve-skeleton extrac-

tion methodologies from the literature, we herein propose a real-time curve-skeleton extraction approach that

applies to scanned point clouds, independently of the object’s complexity and/or the amount of noise within

the depth measurements. The experimental results show the ability of the algorithm to extract a centered

curve-skeleton within the 3-D object, with the same topology, and with unit thickness. The proposed approach

is intended for real world applications and hence, it handles large portions of data missing due to occlusions,

acquisition hindrances or registration inaccuracies.

1 INTRODUCTION

Human pose estimation not only is one of the funda-

mental research topics in computer vision, but a nec-

essary step in active research topics such as scene un-

derstanding, human-computer interaction and action

or gesture recognition; a side-effect of the recent ad-

vances in 3-D sensing technologies.

In this paper, we address the problem of human

pose estimation in the context of 3-D scenes scanned

by multiple consumer-accessible depth cameras such

as the Kinect or the Xtion Pro Live. To that end, we

propose to use curve-skeletons, a compressed repre-

sentation of the 3-D object. Curve-skeletons are ex-

tremely useful in computer graphics and increasingly

being used in computer vision for their valuable aid

to address many visualization tasks including shape

analysis, animation, morphing and/or shape retrieval.

Indeed, curve-skeletons represent a simpliﬁed version

of the 3-D object with a major advantage of preserv-

ing both its geometry and topology information. This,

in turn, allows to estimate the object conﬁguration or

object pose after approximating its skeletal structure.

The remainder of the paper is organized as fol-

lows: Section 2 covers the literature review on the

most recent curve-skeleton extraction approaches for

point cloud datasets. Section 3 introduces our real-

time curve-skeleton extraction approach. In Section 4

we evaluate and analyze the resulting curve-skeletons

from multiple scanned 3-D models. To do so, both

real and synthetic data have been considered. Finally,

concluding remarks are given in Section 5.

2 RELATED WORK

Curve-skeleton makes decisive contribution for hu-

man pose estimation as it enables to estimate the

body conﬁguration by ﬁtting the underlying skele-

tal structure. Consequently, an extensive research

can be found in the literature, with many approaches

strongly dependent on the requirements of their ap-

plications. In the following we only review the most

representative methods for curve-skeleton extraction

from point cloud datasets. For a complete review,

we refer the reader to the comprehensive survey of

Cornea et al. (Cornea et al., 2007).

Skeletonization algorithms can be divided in four

different classes: thinning and boundary propagation,

distance ﬁeld-based, geometric, and general/ﬁeld

functions (Cornea et al., 2007). However, recent ap-

proaches are providing excellent results by combining

different techniques from different classes (Cao et al.,

2010) (Sam et al., 2012) (Tagliasacchi et al., 2009).

Au et al. (Au et al., 2008) proposed a curve-skeleton

extraction approach by mesh contraction using Lapla-

Garcia F. and Ottersten B..

Real-time Curve-skeleton Extraction of Human-scanned Point Clouds - Application in Upright Human Pose Estimation.

DOI: 10.5220/0005298300540060

In Proceedings of the 10th International Conference on Computer Vision Theory and Applications (VISAPP-2015), pages 54-60

ISBN: 978-989-758-090-1

 2015 SCITEPRESS (Science and Technology Publications, Lda.)

cian smoothing. They used connectivity surgery to

preserve the original topology and an iterative pro-

cess as an energy minimization problem with con-

traction and attraction terms. Though special atten-

tion must be paid to the contraction parameters, the

method fails in the case of very coarse models. Wa-

tertight meshes are required due to the mesh connec-

tivity constraint from the Laplacian smoothing. Cao

et al. (Cao et al., 2010) extended their work to point

cloud datasets. To do so, a local one-ring connec-

tivity of point neighborhood must be done for the

Laplacian operation, which signiﬁcantly increases the

processing time. An alternative approach proposed

by Tagliasacchi et al. (Tagliasacchi et al., 2009) uses

recursive planar cuts and local rotational symmetric

axis (ROSA) to extract the curve-skeleton from in-

complete point clouds. However, their approach re-

quires special attention within object joints and it is

not generalizable to all shapes. Similarly, Sam et

al. (Sam et al., 2012) use the cutting plane idea but

only two anchors points must be computed. They

guarantee centeredness by relocating skeletal nodes

using ellipse ﬁtting. However, although the result-

ing curve-skeletons from this work preserve most of

the required properties cited by Cornea et al. (Cornea

et al., 2007), they are impractical when real-time is

required. We herein overcome this limitation by com-

puting the skeletal candidates in the 2-D space.

3 PROPOSED APPROACH

In the following we introduce our approach to extract

the curve-skeleton from upright human-scanned point

clouds. Our major contribution, in which we address

the real-time constraint, is based on the extraction of

the skeletal candidates in the 2-D space. We have

been inspired by image processing techniques used

in silhouette-based human pose estimation (Li et al.,

2009). The ﬁnal 3-D curve-skeleton results from back

projecting these 2-D skeletal candidates to their right

location in the 3-D space.

Let us consider a scanned point cloud P repre-

sented in a three-dimensional Euclidean space E

≡

{p(x,y, z)|1 ≤ x ≤ X,1 ≤ y ≤ Y,1 ≤ z ≤ Z}, and de-

scribing a set of 3-D points p

representing the under-

lying external surface of an upright human body. The

ﬁrst step concerns the voxelization of the given point

cloud to account for the point redundancy resulting

from registering multi-view depth data. To that end,

we have considered the surface voxelization approach

presented in (Garcia and Ottersten, 2014). The result-

ing point cloud P

presents both a uniform point den-

sity and a signiﬁcantly reduction of the points to be

further processed. Our goal is to build a 2-D image

representing the front view of the 3-D body in order

to extract its curve-skeleton using 2-D image process-

ing techniques. To do so, we deﬁne a cutting-plane π

being parallel to x and y-axis of E

and intersecting

p, the mean point or centroid of the point set P

i.e.,

p =

∑

i=1

. From our experimental results,

π crossing

p provides good body orientations in the

case of upright body postures, e.g., walking, running

or working. Alternative body conﬁgurations are dis-

cussed in Section 4.1. We determine the body orienta-

tion by ﬁtting a 2-D ellipse onto the set of 3-D points

lying on π, i.e., ∀p

(z) =

p(z). We note that in prac-

tice and due to point density variations, 3-D points

are not necessarily lying on π. Therefore, we con-

sider those 3-D points in P

with p

(z) ∈ [

p(z) ± λ],

and λ being related to the point density of P

, as

depicted in Fig. 1a. We note that λ has to be cho-

sen large enough to ensure a good description of the

body contour. In turn, the higher the point density

the smaller the λ value should be. Fig. 1b illus-

trates the resulting 2-D ellipse using the Fitzgibbon

et al. (Fitzgibbon and Fisher, 1995) approach, imple-

mented in OpenCV (Bradski and Kaehler, 2008). r

and s are unitary vectors along the major and minor el-

lipse axes, respectively. c is the ellipse centroid. Once

the body orientation is known, we build the 2-D image

I representing the front view of the human body. To

do so, we can project P

to the 2-D plane π

deﬁned

by the ellipse major axis r and with normal vector

s. Image dimensions are given by the projections of

the minimum p

min

= (x

min

) and maximum

max

= (x

max

) 3-D point coordinates from

. An alternative solution is to translate P

from

the coordinate frame C

= (r, s, r × s), and origin

at p

max

, depicted in Fig. 2a, to the coordinate frame

= (−r,−(r × s), −s), and origin at (0,0,0), de-

(a) (b)

Figure 1: (a) Selected 3-D points (in green color) for el-

lipse ﬁtting. (b) Fitted ellipse using OpenCV (Bradski and

Kaehler, 2008).

Real-timeCurve-skeletonExtractionofHuman-scannedPointClouds-ApplicationinUprightHumanPoseEstimation

(a) P

(b) P

Figure 2: (a) P

at C

. (b) P

at C

picted in Fig. 2b, i.e., P

= T

· P

. By doing so, the

2-D plane π

coincides with the two-dimensional Eu-

clidean space with x and y axes coincident to E

, i.e.,

≡ {I(m,n)|1 ≤ m ≤ M,1 ≤ n ≤ N}. Therefore,

I(m,n) =



255 if p

∈ P

∀

0 otherwise,

(1)

with m = δ · round



(x)



and n = δ · round



(y)



and δ the conversion factor between metric units and

pixels (herein we set δ = 100, i.e., 1 mm = 1 pixel).

Fig. 3a shows the resulting image I after projecting

onto E

In 2-D image processing, distance ﬁeld or dis-

tance transform (DT) is commonly used to build im-

age maps D from binary images. Pixel values indicate

the minimum distance between the selected pixel and

its nearest boundary pixel, i.e., the closest zero pixel.

That is, D(u) = min{d(u,v)|I(v) = 0}, with d(u,v)

being the Euclidean, Manhattan or Chessboard dis-

tance metric. We note that the resulting image map

is highly sensitive to zeros pixels, and thus holes in

the image will signiﬁcantly alter the resulting image

map. A valid approach to ﬁll the body silhouette in

the case of a constant density of points and without

omission of data is the use of morphological operators

such as dilation and erosion operators. However, ﬁll-

ing the body silhouette becomes more intricate when

treating incomplete point clouds with large portions

of data missing, e.g., around the pelvis in Fig. 3a. We

therefore propose to extract the contours of I and ﬁll

the area bounded by the most external contour, i.e.,

the body silhouette, which results in a dense body sil-

houette, as shown in Fig. 3c. To do so, we use the

contour following algorithm proposed by Suzuki et

al. (Suzuki and Abe, 1985) (see Fig. 3b). Among the

aforementioned distance metrics, we herein have con-

sidered the Euclidean DT described in (Felzenszwalb

and Huttenlocher, 2004). From the resulting image

map D in Fig. 3d, we realize that higher-valued map

pixels correspond to centered pixels within the sil-

houette boundary and thus, skeletal candidates. In-

deed, skeletal candidates coincide with low-valued

edges on the result of the Laplace operator on D,

i.e., ∆D = ∂

D/∂

m + ∂

D/∂

n, as can be observed

in Fig. 3e. The ﬁnal 2-D coordinates of the skeletal

candidates J are given by an adaptive thresholding on

the low-valued edges in ∆D, i.e.,

J(u) =



255 if ∆D(u) < τ

0 otherwise,

(2)

being τ the adaptive threshold value. A valid τ

value can be automatically obtained using Otsu’s

method (Sezgin and Sankur, 2004). The pixel posi-

(a) I. (b) (c)

(d) D. (e) ∆D. (f) J.

Figure 3: (a) I from (1). (b) Contours of I. (c) Body silhou-

ette. (d) Euclidean DT D. (e) Laplace operator ∆D. (f) J

from (2).

tions of the skeletal candidates from (2) correspond to

the x and y coordinates of the 3-D points q

that will

constitute the ﬁnal curve-skeleton J , as depicted in

Fig 4, i.e., q

(x) = m and q

(y) = n ∀J(m, n) = 255.

We determine the missing q

(z) coordinate for each

skeletal candidate J(u

) by deﬁning the line l

with

unit vector the z-axis of E

and intersecting at the cor-

responding skeletal candidate J(u

). The missing co-

ordinate corresponds to the middle point between the

VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications

Figure 4: (a) Skeletal candidates J (in blue color) and the

location of a skeletal candidate J(u

) in the 3-D space, i.e.,

(z).

lower p

and upper p

intersected points on the exter-

nal contour of the selected slice, also computed using

the contour following algorithm in (Suzuki and Abe,

1985), i.e., q

(z) = p

(z) +



(z) − p

(z)



/2. The ﬁ-

nal 3-D curve-skeleton J results from projecting back

the 3-D points q

from C

to C

, i.e., J = T

· J =

)

−1

· J .

4 EXPERIMENTAL RESULTS

In the following we evaluate the proposed curve-

skeleton extraction approach on both real and syn-

thetic data. In addition, we present a visual

comparison against the curve-skeleton extraction

via Laplacian-based contraction approach presented

in (Cao et al., 2010), and the runtime analysis. All

reported results have been obtained using a Mobile

Intel



QM67 Express Chipset with an integrated

graphic card Intel



HD Graphics 3000. The pro-

posed approach has been implemented in C++ lan-

guage using the OpenCV (Bradski and Kaehler, 2008)

and PCL (PCL, 2014) libraries.

Four different 3-D models have been consid-

ered to evaluate the proposed curve-skeleton extrac-

tion approach, i.e., the standing Bill from the V-

Rep repository (VRE, 2014), and the Nilin Combat,

Iron Man and Minions, from the publicly available

TF3DM

repository (TF3, 2014). The vast majority

of available 3-D models are built as textured polyg-

onal models, being ﬂexible and easily to be rendered

by computers. However, the number of polygons is

far below the number of 3-D points of a real scanned

object; mainly within regions with uniform geome-

try. Consequently, we have transformed the represen-

tation of the considered 3-D models from polygonal

meshes to real 3-D point sets. To do so, we have

simulated a multi-view sensing system composed of

4 RGB-D cameras using V-Rep (VRE, 2014), a very

versatile robot simulator tool in which the user can

replicate real scenarios, as shown in Fig. 5. Synthetic

Figure 5: Experimental environment using V-Rep.

data results from the registration of the depth data ac-

quired by each virtual RGB-D camera.

Fig. 6 shows the scanned 3-D models (in purple

color), after data registration and surface voxeliza-

tion, and their respective extracted curve-skeletons.

We note that the resulting curve-skeletons present the

same topology and geometry as the 3-D object; being

unit wide and centrally placed within the underlying

scanned surface.

We show, in Fig. 7, a visual comparison against

Cao et al. approach (Cao et al., 2010). One of the

limitations of Cao et al. approach is the need of

1-ring neighborhood to perform the Laplacian-based

contraction. As a result, their approach fails in the

presence of large portions of data missing, as occurs

around the pelvis in Fig. 7c and Fig. 7f. Furthermore,

contraction and attraction weights must be manu-

ally tunned in order to obtain a unit thickness curve-

skeleton. Herein, we have considered the default set-

tings provided by the authors. We also note that the

consumption time to extract each curve-skeleton is

above 1 minute.

Table 1: Running time to extract the curve-skeleton of the

3-D models presented in Fig. 6 (units are in ms).

3-D Model

Running time

Orientation Silhouette D ∆D p

(z) Total

Standing Bill

2.5 0.2 0.3 0.0 193.0 252.2

(23650 points)

Nilin Combat

1.6 0.0 0.1 0.1 159.6 206.0

(18730 points)

Iron Man

2.7 0.1 0.4 0.2 390.6 473.0

(33358 points)

Minions

1.5 0.3 0.1 0.2 105.4 150.8

(18426 points)

Man 1

1.3 0.7 0.3 0.1 232.7 279.5

(18053 points)

Man 2

0.9 0.4 0.4 0.1 287.5 341.6

(21263 points)

Woman 1

0.4 0.1 0.3 0.1 109.3 142.1

(12986 points)

Real-timeCurve-skeletonExtractionofHuman-scannedPointClouds-ApplicationinUprightHumanPoseEstimation

(a) (b) (c) (d)

(e) (f) (g) (h)

(i) (j) (k) (l)

(m) (n) (o) (p)

Figure 6: Curve-skeleton extraction from synthetic object-scanned point clouds. 1

row, Standing Bill (23650 points). 2

row, Nilin Combat (18730 points). 3

row, Iron Man (33358 points). 4

row, Minions (18426 points).

Real data has been generated from a multi-view

sensing system composed of 2 consumer-accessible

RGB-D cameras, i.e., the Asus Xtion Pro Live cam-

era, with opposed ﬁeld-of-views, i.e., with no data

overlapping. The relationship between the two cam-

eras was determined using the stereo calibration

implementation available in OpenCV (Bradski and

Kaehler, 2008) and a transparent checkerboard (black

squares are visible from both sides). Better regis-

tration approaches based on ICP, bundle adjustment

or the combination of both can also be considered.

However, it is shown in Fig. 8 that the current ap-

proach perfectly extracts the curve-skeleton on such a

coarse registered point clouds, handling large portions

of missing data as well as registration inaccuracies.

Fig. 8 presents 3 coarse registered point clouds of two

mans and one woman. Note that despite the large por-

tions of data missing due to self occlusions and inac-

curate registration of both views, our approach is able

to extract an accurate curve-skeleton that preserves

both topology and geometry of the scanned model.

Table 1 reports the running time to extract the

curve-skeleton of the 3-D models presented in Fig. 6

and Fig. 8. We note that we have reported CPU-

based values without using data-parallel algorithms or

graphics hardware (GPU). Most of consumption time

is invested on determining the missing 3-D coordi-

nate of each skeletal candidate whereas 2-D process-

ing time is almost negligible, as it was expected.

4.1 Limitations and Future Work

The resulting curve-skeleton depends strongly on the

DT computed from the front view of the human body.

Therefore, occluded body parts due to crossing arms

or legs will not be considered and can provide wrong

skeletal candidates. Furthermore, having one or both

arms too close to the torso will make DT to consider

them as part of the torso. Similarly, if both legs are too

close, they can be considered as a single one. We plan

to solve these body conﬁgurations by using both tem-

poral and RGB information. Indeed, we herein pro-

VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications

(a) (b) (c) (d) (e) (f)

(g) (h) (i) (j) (k) (l)

(m) (n) (o) (p) (q) (r)

Figure 7: Visual comparison against curve-skeleton extraction via Laplacian-based contraction (Cao et al., 2010). 1

row,

Standing Bill. 2

row, Nilin Combat. 3

row, Iron Man. 1

and 4

col., Input dataset. 2

and 5

col. Resulting

curve-skeleton using our approach. 3

and 6

col. Resulting curve-skeleton using (Cao et al., 2010).

pose a very fast approach to estimate a coarse curve-

skeleton that will be the basis to ﬁt a human model

skeleton. We plan to use a progressive ﬁtting of the

skeleton model starting from the torso limb. In gen-

eral, the torso limb corresponds to the lowest-valued

edges from the Laplace operation output.

5 CONCLUDING REMARKS

A real-time approach to extract the curve-skeleton of

an upright human-scanned point cloud has been de-

scribed. Our main contribution is in estimating the

2-D skeleton candidates using image processing tech-

niques. By doing so, we address the real-time con-

straint. The ﬁnal curve-skeleton results from locating

the previous computed 2-D skeleton candidates in the

3-D space. The resulting curve-skeleton preserves the

centeredness property as well as the same topology

and geometry as the 3-D model. Future work includes

the treatment of alternative body conﬁgurations than

upright. The use of temporal and RGB information

will be also investigated in order to increase the ro-

bustness of the approach.

ACKNOWLEDGEMENTS

This work was supported by the National Re-

search Fund, Luxembourg, under the CORE project

C11/BM/1204105/FAVE/Ottersten.

Real-timeCurve-skeletonExtractionofHuman-scannedPointClouds-ApplicationinUprightHumanPoseEstimation

(a) (b) (c) (d)

(e) (f) (g) (h)

(i) (j) (k) (l)

Figure 8: Curve-skeleton extraction from upright human-scanned point clouds. 1

row, Man 1 (18053 points). 2

row, Man 2

(21263 points). 3

row, Woman 1 (12986 points).

REFERENCES

(2014). Point Cloud Library (PCL). http://pointclouds.org/.

(2014). TF3DM

. http://tf3dm.com/.

(2014). Virtual robot experimentation platform (v-rep).

http://www.coppeliarobotics.com/.

Au, O. K.-C., Tai, C.-L., Chu, H.-K., Cohen-Or, D., and

Lee, T.-Y. (2008). Skeleton extraction by mesh con-

traction. In ACM SIGGRAPH 2008 Papers, pages

44:1–44:10. ACM.

Bradski, G. and Kaehler, A. (2008). Learning OpenCV:

Computer Vision with the OpenCV Library. O’Reilly

Media, 1st edition.

Cao, J., Tagliasacchi, A., Olson, M., Zhang, H., and Su, Z.

(2010). Point Cloud Skeletons via Laplacian Based

Contraction. In Shape Modeling International Con-

ference (SMI), pages 187–197.

Cornea, N., Silver, D., and Min, P. (2007). Curve-skeleton

properties, applications, and algorithms. IEEE Trans-

actions on Visualization and Computer Graphics,

13(3):530–548.

Felzenszwalb, P. F. and Huttenlocher, D. P. (2004). Dis-

tance transforms of sampled functions. Technical re-

port, Cornell Computing and Information Science.

Fitzgibbon, A. and Fisher, R. B. (1995). A buyer’s guide to

conic ﬁtting. In In British Machine Vision Conference,

pages 513–522.

Garcia, F. and Ottersten, B. (2014). CPU-Based Real-Time

Surface and Solid Voxelization for Incomplete Point

Cloud. In IEEE International Conference on Pattern

Recognition (ICPR), pages 2757–2762.

Li, M., Yang, T., Xi, R., and Lin, Z. (2009). Silhouette-

based 2d human pose estimation. In International

Conference on Image and Graphics (ICIG), pages

143–148.

Sam, V., Kawata, H., and Kanai, T. (2012). A robust

and centered curve skeleton extraction from 3d point

cloud. Computer-Aided Design and Applications,

9(6):969–879.

Sezgin, M. and Sankur, B. (2004). Survey over image

thresholding techniques and quantitative performance

evaluation. Journal of Electronic Imaging, 13(1):146–

168.

Suzuki, S. and Abe, K. (1985). Topological structural anal-

ysis of digitized binary images by border following.

Computer Vision, Graphics, and Image Processing,

30(1):32–46.

Tagliasacchi, A., Zhang, H., and Cohen-Or, D. (2009).

Curve skeleton extraction from incomplete point

cloud. In ACM SIGGRAPH 2009 Papers, SIGGRAPH

’09, pages 71:1–71:9. ACM.

VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications