Figure 1: A low-accuracy segmentation result using a
skin-color-based approach (Jones and Rehg, 2002). The
white pixels in the binary image on the right are the
pixels detected as hand points.
ously marked on an input image. The pixels inside
the marked region are used to build a color model
that is then used to segment the rest of the hand by
means of color similarity. Following this idea, Yuan
et al. (Yuan et al., 2008) proposed an algorithm that
builds color clusters from a training region and then
labels each cluster as hand or background depending
on its size. Finally, each image point is classified
according to the cluster it belongs to. The drawback
of this algorithm is that it requires the user to mark
the training region manually, which is undesirable in
automatic settings.
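The cluster-based scheme of Yuan et al. can be sketched as follows. This is a minimal illustration under our own assumptions, not their implementation: it uses plain k-means with a fixed number of clusters `k` and deterministic farthest-point initialization, a hypothetical size threshold `min_fraction` (a fraction of the training pixels) to decide which clusters count as hand, and nearest-center assignment for the remaining image pixels. Inputs are assumed to be per-pixel chroma pairs, e.g. (a*, b*) values.

```python
import numpy as np

def _sq_dists(points, centers):
    # Squared Euclidean distance from every point to every center: shape (N, k).
    return ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def kmeans(points, k, iters=20):
    """Plain k-means on an (N, d) array with farthest-point initialization."""
    points = np.asarray(points, dtype=float)
    centers = [points[0]]
    for _ in range(1, k):
        # Next seed: the point farthest from all seeds chosen so far.
        d = _sq_dists(points, np.array(centers)).min(axis=1)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = _sq_dists(points, centers).argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    labels = _sq_dists(points, centers).argmin(axis=1)
    return centers, labels

def build_cluster_model(train_ab, k=4, min_fraction=0.2):
    """Cluster the training pixels; large clusters are labelled 'hand'.

    `min_fraction` is a hypothetical threshold, not a value from the paper.
    """
    centers, labels = kmeans(train_ab, k)
    sizes = np.bincount(labels, minlength=k)
    is_hand = sizes >= min_fraction * len(train_ab)
    return centers, is_hand

def classify(image_ab, centers, is_hand):
    """Label each pixel of an (H, W, d) image by its nearest cluster center."""
    h, w = image_ab.shape[:2]
    pts = np.asarray(image_ab, dtype=float).reshape(-1, image_ab.shape[-1])
    nearest = _sq_dists(pts, centers).argmin(axis=1)
    return is_hand[nearest].reshape(h, w)
```

Tying the hand/background decision to cluster size assumes the training region is dominated by hand pixels, which is what a well-placed region should provide.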
To obtain a training region automatically, we can
first locate the hand in the input image. Using this
location, we can then mark an appropriate hand
region, commonly taking the location coordinates as
the center of that region.
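A minimal sketch of the region-placement step, assuming a square window of hypothetical half-size `half_size` pixels centered on the detected location `(row, col)`. The window is clipped to the image, so locations near the border still yield a valid (smaller) region. The function name `training_region` is ours, not from the original work.

```python
def training_region(image_shape, center, half_size):
    """Return half-open bounds (r0, r1, c0, c1) of a square window centered
    on `center` = (row, col), clipped so that it never leaves the image."""
    h, w = image_shape[:2]
    r, c = center
    r0 = max(0, r - half_size)
    r1 = min(h, r + half_size + 1)
    c0 = max(0, c - half_size)
    c1 = min(w, c + half_size + 1)
    return r0, r1, c0, c1
```

With a NumPy image, the training pixels are then `image[r0:r1, c0:c1]`.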
For the localization problem, the Viola and Jones
detector (Viola and Jones, 2002) could be used.
However, this approach requires a large collection
of training images and a time-consuming training
stage. Because a hand is a simple object with a
well-defined shape, such an expensive training stage
is unnecessary. Instead, localization can be carried
out with a local structure-based approach that
exploits not only locality information but also the
structure of the hand shape.
Our contribution in this work is a highly accurate
hand segmentation technique composed of two main
steps: (1) estimating the hand location in an image,
and (2) separating the hand region from the
background. For the localization stage, we use a
local structure-based approach exploiting both
structural and locality information of a hand.
Structural information relates to the components
forming a hand, and locality information relates to
the spatial relationships between these components.
To this end, we use the STELA (StrucTurE-based Local
Approach) method proposed by Saavedra et al.
(Saavedra et al., 2011). For the segmentation stage,
we extend the idea of Yuan et al. (Yuan et al., 2008)
by proposing strategies to compute the underlying
parameters. In this case, we build color clusters
from a training region obtained directly from the
localization stage, so manual localization is no
longer required. For color representation, we use
only the chromatic channels (a* and b*) of the
L*a*b* color space, as suggested by Yuan et al.
(Yuan et al., 2008). The segmentation stage ends
with a post-processing phase that reduces
imperfections caused by noise. An example of our
results is shown in Figure 2, where the segmented
region is outlined by a blue contour.

Figure 2: An example of hand segmentation using our
approach. The blue contour defines the segmented region.
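The chromatic-only color representation can be sketched as below, assuming 8-bit sRGB input and the D65 white point. The conversion follows the standard sRGB to CIE XYZ to CIE L*a*b* formulas; L* (lightness) is simply never computed, so only a* and b* are returned. `ab_channels` is a hypothetical helper name, not from the original work.

```python
import numpy as np

# sRGB (D65) -> CIE XYZ matrix, applied to linearized RGB in [0, 1].
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])

def ab_channels(rgb):
    """Map an (..., 3) uint8 sRGB array to its (..., 2) CIE a*, b* values."""
    c = np.asarray(rgb, dtype=float) / 255.0
    # Undo the sRGB transfer curve (gamma) to get linear RGB.
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalized by the reference white.
    xyz = lin @ _RGB_TO_XYZ.T / _WHITE_D65
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([a, b], axis=-1)
```

Reshaping the (H, W, 2) result to (H*W, 2) gives one chroma point per pixel for clustering. Dropping L* presumably makes the color model less sensitive to brightness changes across the hand.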
The remainder of this document is organized as
follows. Section 2 describes the local structure-based
approach (STELA) on which our proposed method relies.
Section 3 describes the hand segmentation process in
detail. Section 4 presents the experimental
evaluation, and finally, Section 5 presents our
conclusions.
2 STELA
STELA is a structure-based approach proposed by
Saavedra et al. (Saavedra et al., 2011) for retrieving
3D models when the query is a line-based sketch. A
STELA descriptor is invariant to translation, scaling,
and rotation. The main property of this approach is
that it relies not only on the structural information
but also on the locality information of an image.
To obtain structural information, an image is
decomposed into simple shapes. This method uses
straight lines as primitive shapes, which are called
keyshapes. To obtain locality information, it uses
local descriptors that take into account the spatial
relationship between keyshapes. An interesting
property of keyshapes is that they allow us to
represent an object at a higher semantic level.
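To make the idea of locality information between keyshapes concrete, the sketch below describes one line segment relative to another using quantities that do not change under translation, rotation, or uniform scaling. It is a hypothetical illustration of such a relational descriptor, not the STELA descriptor of Saavedra et al., whose exact form is not given here; the names `Segment` and `pair_descriptor` are ours.

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    """A straight-line keyshape given by its two endpoints."""
    x1: float
    y1: float
    x2: float
    y2: float

    def midpoint(self):
        return ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)

    def length(self):
        return math.hypot(self.x2 - self.x1, self.y2 - self.y1)

    def angle(self):
        # Direction from endpoint 1 to endpoint 2, in radians.
        return math.atan2(self.y2 - self.y1, self.x2 - self.x1)

def pair_descriptor(ref, other):
    """Hypothetical similarity-invariant description of `other` w.r.t. `ref`."""
    ref_len = ref.length()
    ref_ang = ref.angle()
    # Relative orientation of the two lines; lines are undirected, so mod pi.
    rel_angle = (other.angle() - ref_ang) % math.pi
    # Relative length.
    length_ratio = other.length() / ref_len
    # Offset between midpoints, scaled by the reference length and measured
    # in the reference segment's own frame.
    (mx, my), (ox, oy) = ref.midpoint(), other.midpoint()
    dx, dy = ox - mx, oy - my
    distance = math.hypot(dx, dy) / ref_len
    bearing = (math.atan2(dy, dx) - ref_ang) % (2.0 * math.pi)
    return rel_angle, length_ratio, distance, bearing
```

Normalizing every distance by the reference segment's length and every angle by its orientation is what yields the invariance; the cost is that the description depends on which segment is chosen as the reference.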
STELA consists of the following steps: (1) obtain an
abstract image, (2) detect keyshapes, (3) compute
local descriptors, and (4) match local descriptors.
1. Abstract Image. The abstract image reduces the
effect of noise, keeping only relevant edges. To
this end, STELA applies the Canny op-
VISAPP 2013 - International Conference on Computer Vision Theory and Applications