2 RELATED WORK
Background subtraction techniques for human detection rely on static background information or on motion information to first detect the regions that have changed or moved. Beleznai et al. (Beleznai et al., 2004) perform a mean shift clustering of the
subtracted image to identify regions of significant
change. The clustered regions are then checked for
the presence of humans by fitting a simple human
model in terms of three rectangles. Eng et al. (Eng et al., 2004) propose a similar technique where the local foreground objects are detected using clustering,
and an elliptical model is used to represent the
humans. A Bayesian framework is then employed to
estimate the probability of the presence of a human
after fitting the ellipses to a foreground object region.
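The clustering step used in such approaches can be illustrated with a minimal flat-kernel mean shift over the coordinates of changed pixels (a simplified Python/NumPy sketch, not the implementation of any of the cited papers; the bandwidth is an illustrative assumption):

```python
import numpy as np

def mean_shift_modes(points, bandwidth, n_iter=30):
    """Shift each point toward the mean of its neighbours within the
    bandwidth (flat kernel); points converging to the same mode
    belong to the same cluster of changed pixels."""
    shifted = points.astype(float).copy()
    for _ in range(n_iter):
        for i in range(len(shifted)):
            d = np.linalg.norm(points - shifted[i], axis=1)
            shifted[i] = points[d < bandwidth].mean(axis=0)
    return shifted

# two synthetic blobs of "changed" pixel coordinates, standing in for
# the regions of significant change in a subtracted image
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(10, 1, (20, 2)), rng.normal(40, 1, (20, 2))])
modes = np.unique(np.round(mean_shift_modes(pts, bandwidth=5.0)), axis=0)
# one mode per changed region; each cluster could then be checked
# against a human model (rectangles, ellipses, ...)
```

Each resulting cluster would then be tested for the presence of a human, e.g. by fitting the rectangle or ellipse models described above.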
Elzein et al. (Elzein et al., 2003) propose a vision-based technique for detecting humans in videos. A
localized optic flow computation is performed to
compute the locations that have undergone significant
amount of motion. Haar wavelet features at different
scales are extracted from these localized regions and matched against those of templates using a linear classifier. Lee et al. (Lee et al., 2004) use differential
motion analysis to subtract the current input image
from a reference image and thus extract the contour
of the moving object. A curve evolution technique
is then performed to remove the redundant points
and noise on the contour. The curve thus extracted
is matched against existing templates by calculating
the Euclidean distance of the turn angles at the points
describing the curve.
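The turn-angle matching used by Lee et al. can be sketched as follows (a simplified Python/NumPy illustration under the assumption that contours are resampled to the same number of points; the curve-evolution denoising step is omitted):

```python
import numpy as np

def turn_angles(contour):
    """Exterior (turn) angle at each vertex of a closed polygonal
    contour given as an (N, 2) array of points."""
    edges = np.diff(np.vstack([contour, contour[:1]]), axis=0)
    ang = np.arctan2(edges[:, 1], edges[:, 0])      # edge directions
    turn = np.diff(np.concatenate([ang, ang[:1]]))  # direction changes
    return (turn + np.pi) % (2 * np.pi) - np.pi     # wrap to (-pi, pi]

def turn_angle_distance(c1, c2):
    """Euclidean distance between the turn-angle sequences of two
    contours sampled with the same number of points."""
    return np.linalg.norm(turn_angles(c1) - turn_angles(c2))

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
bigger = 3 * square  # scaled copy: turn angles are unchanged
assert turn_angle_distance(square, bigger) < 1e-9
```

Because turn angles depend only on direction changes, the distance is invariant to translation and scale, which is what makes it suitable for template matching of extracted contours.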
Dalal et al. (Dalal et al., 2006) propose a technique for detecting humans based on oriented histograms of flow and appearance.
Optic flow is computed between successive frames
and the direction of flow is quantized. The histograms constructed from these flow directions, together with histograms of oriented gradients, are used to train a linear SVM to detect the presence of humans.
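The appearance half of such a descriptor can be illustrated by a single-cell orientation histogram (a minimal Python/NumPy sketch; a full HOG descriptor adds cells, blocks, and block normalization, and the flow histograms are built analogously from quantized flow directions between successive frames):

```python
import numpy as np

def orientation_histogram(img, n_bins=9):
    """Histogram of unsigned gradient orientations, weighted by
    gradient magnitude -- the core of a HOG-style descriptor
    (a single cell, no block structure)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(),
                       minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-9)  # L2 normalization

# a horizontal intensity ramp: all gradients point along the x axis,
# so the first orientation bin dominates
img = np.tile(np.arange(8.0), (8, 1))
h = orientation_histogram(img)
```

Concatenating such histograms over a dense grid of cells, together with the flow histograms, yields the feature vector fed to the linear SVM.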
Bertozzi et al. (Bertozzi et al., 2005) describe a
system for pedestrian detection using stereo infrared
images. Warm areas from the images are detected
and segmented. An edge detection operation is
performed on the resulting regions, followed by a
morphological expansion operation. Different head
models are then used to validate the presence of
humans in the resulting images. Researchers have
proposed similar algorithms (Zhou and Hoang, 2005;
Han and Bhanu, 2005) to detect humans using either
motion information or static background information.
However, all of these algorithms stop at detecting whether the moving object is a human or not. We go
a step further and also provide information about the
high level pose of the person by indicating whether
it is a frontal, back or profile view of the person. As
mentioned before, estimating the pose of the person
will help an individual to decide if the person at the
door is waiting for them to respond or not.
We employ a background subtraction technique,
enhanced by graph cut algorithm to segment the fore-
ground regions. Silhouettes thus obtained are used
to extract features such as shape context and Fourier descriptors. These are then used to train a classifier to
distinguish between the profile and non-profile views.
Though many researchers use synthetically generated
human silhouettes for testing the pose estimation al-
gorithms, we have used real segmented silhouettes for
evaluating our pose estimation algorithm.
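For concreteness, Fourier descriptors of a closed silhouette contour can be computed as below (a minimal Python/NumPy sketch, not our exact feature pipeline; the shape context computation is omitted):

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs=8):
    """Translation-, scale- and rotation-invariant Fourier descriptors
    of a closed contour given as an (N, 2) array of points."""
    z = contour[:, 0] + 1j * contour[:, 1]  # points as complex numbers
    F = np.fft.fft(z)
    F[0] = 0                    # drop DC term -> translation invariance
    mags = np.abs(F)            # discard phase -> rotation invariance
    mags /= mags[1] + 1e-12     # divide by first harmonic -> scale invariance
    return mags[1:n_coeffs + 1]

theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
circle = np.c_[np.cos(theta), np.sin(theta)]
shifted = circle + [5.0, -3.0]  # translated copy of the same silhouette
```

The invariances make the descriptors comparable across silhouettes of different position and size, which is why they are a common shape feature for classifiers of this kind.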
3 SEGMENTATION
This section deals with the three-stage process of extracting the silhouette of the foreground objects.
To model the background, we use a running average model based on the technique proposed by Wren et al. (Wren et al., 1997), where the background is modeled independently at each pixel location. A Gaussian probability density function (pdf) that fits each pixel's last n values is computed, and a running average is used to update the pdf.
Often, even with these models, the shadow regions
get misclassified as foreground. Assuming that the illumination component at pixel locations in the shadow region undergoes a uniform change, we use the derivatives at these locations to cancel out this uniform change. The derivatives of the pixel locations in the shadow region should be very similar for both the background model and the current frame. Thus the difference in the derivatives can help eliminate, to a certain extent, the shadow regions.
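The running-average update and the derivative-based shadow test can be sketched as follows (a Python/NumPy illustration; the learning rate and thresholds are illustrative assumptions, not our tuned values, and pixels on shadow boundaries may still survive the test):

```python
import numpy as np

ALPHA = 0.05  # running-average learning rate (assumed value)
TAU = 30.0    # foreground threshold on intensity difference (assumed)

def update_background(bg, frame, alpha=ALPHA):
    """Per-pixel running average: bg <- (1 - alpha) * bg + alpha * frame."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, tau=TAU):
    """Keep pixels whose intensity differs strongly from the model but
    whose spatial derivatives do NOT match it; pixels where the
    derivatives agree changed only uniformly and are treated as shadow."""
    diff = np.abs(frame - bg) > tau
    g_bg = np.hypot(*np.gradient(bg.astype(float)))
    g_fr = np.hypot(*np.gradient(frame.astype(float)))
    shadow = np.abs(g_fr - g_bg) < 1.0  # derivatives agree -> uniform change
    return diff & ~shadow

bg = np.full((6, 6), 100.0)
frame = bg.copy()
frame[:, :3] *= 0.5          # uniform darkening: a cast shadow
frame[2:4, 4:6] = 250.0      # a genuinely new object region
mask = foreground_mask(bg, frame)
# interior shadow pixels are suppressed; the object region survives
```

The surviving mask is what the graph cut refinement described next operates on.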
Robust background subtraction can still result
in broken contours and blobs of the foreground
object. It is difficult to detect whether the foreground
object is a human using these blobs. Ideally we
would like to have a continuous contour that can
be further processed. To obtain this continuous
contour, we have used the extended version of the graph cut algorithm (Boykov and Jolly, 2001) proposed by Rother et al. (Rother et al., 2004) for color image
segmentation. Though this approach guarantees an
optimal segmentation solution, given the constraints,
the drawback is that the seed or the trimap (the initial
VISAPP 2007 - International Conference on Computer Vision Theory and Applications