2 RELATED WORK
There are mainly two categories of methods for
locating hands in a video sequence. The first is
based on skin region detection. The approach in
(Hasanuzzaman, 2004) detects skin regions in the YUV
color space and uses the x coordinate to distinguish
the left and right hands; the extracted hand
locations are classified into 8 command gesture
patterns in order to operate a robot. An approach
based on region SVM learning is proposed in
(Han, 2006), which automatically segments skin
regions out of a video frame and assumes that the
three largest regions are the head and hands. Both of
these approaches locate the two hands effectively only
when the skin regions are isolated from each other.
Several attempts have been made to distinguish the
two hands when they get close. In (Lee and Cohen,
2006), a blob merging technique is introduced to
locate the forearm (and thereby the hand) in static
images; the skin regions of the forearm and hand are
modeled as ellipses under the assumption that the
user wears short sleeves. Askar et al. (Askar, 2004)
tried to handle situations where the hands are in
contact. Their approach depends on row and column
histogram analysis of the binary mask image, and may
therefore fail due to noise introduced by background
suppression.
The other category is based on object tracking
techniques. Mean shift (Comaniciu and Ramesh,
2000) and CamShift (Bradski, 1998) are widely used
for tracking a single object. However, these two
methods cannot handle multiple objects (two hands
and possibly the head): they lose the object, or are
misled into tracking an inappropriate one, when the
objects get close to each other. In (Vacavant and
Chateau, 2005), a particle filter is used to find
proper positions of the head and hands, but it runs
at 6 frames per second, which is not suitable for
real-time applications. Hand occlusion is the main
challenge in hand tracking, because not only does
overlapping occur often, but the hand size and
orientation also change during occlusion. Many
systems, such as (Schreer, 2005), (Kirubarajan and
Bar-Shalom, 2001), (Coogan, 2006) and (Imagawa,
1998), warn the user to separate the hands when
occlusion is detected, and restart tracking
afterwards.
Real-time hand locating, especially when occlusion
happens, remains a challenging research area. In our
system, we first generate a PSM from three aspects
of hand information, and then apply the tracking
procedure to the PSM.
3 POSSIBILITY SUPPORT MAP
GENERATION
For each frame, a multi-channel PSM is generated to
support the subsequent tracking procedure. The
number of PSM channels is set to the number of
objects being tracked; in our case, we track three
objects: the two hands and the head. Each channel of
the map is generated from the original image, and
every pixel in it represents the possibility of the
corresponding object. The possibility that a pixel
supports the object is calculated by combining color
information, position information and motion
information.
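The PSM structure described above can be sketched as follows. The frame size, the per-cue maps, and the use of per-pixel multiplication to combine the three cues are all illustrative assumptions; the paper only states that the cues are combined.

```python
import numpy as np

H, W = 240, 320          # frame size (illustrative assumption)
NUM_OBJECTS = 3          # two hands and the head

# Hypothetical per-cue possibility maps, each value in [0, 1).
# In the paper these come from color, position and motion information.
rng = np.random.default_rng(0)
color_map    = rng.random((NUM_OBJECTS, H, W))
position_map = rng.random((NUM_OBJECTS, H, W))
motion_map   = rng.random((NUM_OBJECTS, H, W))

# One PSM channel per tracked object; combining the cues by
# per-pixel multiplication is an assumption for this sketch.
psm = color_map * position_map * motion_map
```

Each channel of `psm` can then be fed to the tracking procedure independently, one channel per object.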
3.1 Color Information
Although colors captured by the camera are in RGB
mode, they are converted into HSV mode, since the
skin color footprint is more distinguishable and less
sensitive to illumination changes in the
hue-saturation space. A color C can therefore be
represented by (C_h, C_s). For a pixel of color C,
the possibility of being in the skin region is
denoted as p(skin|C) and is calculated by Equation 1,
which is derived from Bayes' theorem.
p(skin|C) = \frac{p(C|skin)\, p(skin)}{p(C)}   (1)
In our situation, we make the assumption that every
pixel has the same probabilities p(C), p(skin) and
p(C|skin), so that the following equations can be
used.
p(skin) = \frac{N_t(skin)}{N_t} = \frac{\sum_{i=1}^{t} n_i(skin)}{\sum_{i=1}^{t} n_i}   (2)

p(C) = \frac{N_t(C)}{N_t} = \frac{\sum_{i=1}^{t} n_i(C)}{\sum_{i=1}^{t} n_i}   (3)

p(C|skin) = \frac{N_t(C|skin)}{N_t(skin)} = \frac{\sum_{i=1}^{t} n_i(C|skin)}{\sum_{i=1}^{t} n_i(skin)}   (4)
where n_i(C|skin) is the number of skin-region
pixels of color C, while n_i(C), n_i(skin) and n_i
are respectively the number of pixels of color C,
the number of skin-region pixels, and the total
number of pixels in the i-th frame.
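The accumulation in Equations 2-4 can be sketched as a running (C_h, C_s) histogram. The bin counts, function names, and frame representation below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

HUE_BINS, SAT_BINS = 32, 32   # quantization of (C_h, C_s); an assumption

# Running counts accumulated over frames i = 1..t:
n_C      = np.zeros((HUE_BINS, SAT_BINS))  # sum_i n_i(C): pixels of color C
n_C_skin = np.zeros((HUE_BINS, SAT_BINS))  # sum_i n_i(C|skin): skin pixels of color C
n_total  = 0                                # sum_i n_i: all pixels
n_skin   = 0                                # sum_i n_i(skin): all skin pixels

def update(hs_frame, skin_mask):
    """Accumulate counts from one frame.
    hs_frame: (H, W, 2) integer array of quantized (hue, saturation) bins.
    skin_mask: (H, W) boolean array marking skin-region pixels."""
    global n_total, n_skin
    h, s = hs_frame[..., 0], hs_frame[..., 1]
    np.add.at(n_C, (h, s), 1)
    np.add.at(n_C_skin, (h[skin_mask], s[skin_mask]), 1)
    n_total += h.size
    n_skin += int(skin_mask.sum())

def p_skin_given_C():
    """p(skin|C) per color bin, via Equations 1-4."""
    p_skin   = n_skin / n_total              # Eq. 2
    p_C      = n_C / n_total                 # Eq. 3
    p_C_skin = n_C_skin / max(n_skin, 1)     # Eq. 4
    with np.errstate(divide="ignore", invalid="ignore"):
        p = p_C_skin * p_skin / p_C          # Eq. 1
    return np.nan_to_num(p)                  # 0 for colors never observed
```

Looking up each frame pixel's (C_h, C_s) bin in the returned table then yields the color term of the corresponding PSM channel.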