Triangular and pyramidal templates were presented
in (Choi et al. 1998) and (Yilmaz and Shah 2002);
these methods estimate pose with a classification
system based on template deformation or on
orthogonal projections of the detected points. Our
method focuses on the efficiency of detecting pose:
it uses a small number of feature points (three) and
determines pose directly from two of the angles of
the triangular template.
The following assumptions were made to simplify
the training of the feature detectors: (a) facial
expression is assumed to be calm with the eyes
open; (b) there are no facial occlusions, such as hats,
glasses, facial hair, etc. Violation of these
assumptions would make the detection of eyes and
mouth more difficult, but once these facial features
are detected, the geometric pose estimation would
still work. The final assumption is that both eyes are
visible, which limits the pose angle to ±30 degrees.
The next sections cover the detection of the facial
features used, present the head model and discuss
the results.
3 FACIAL FEATURE DETECTION
The first step in facial feature detection is to detect
the face with a face detection method and to narrow
the search space of the facial feature detectors to the
skin region. Once a face is detected, the skin
detection method used to narrow the search space is
a lookup table based on pixel probability
distributions in the YCbCr color space. The most
notable challenge with skin detection is lighting
conditions (Storring et al. 1999), (Gong et al. 2000).
The luminance portion of the image is ignored here,
which makes the process more resilient to lighting
effects.
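A minimal sketch of this lookup-table approach is given below. The BT.601 conversion is standard, but the lookup table here is only a stand-in: a coarse box over typical skin chrominance replaces the table that would normally be built from labeled skin and non-skin pixels. Note that luminance (Y) is never computed or used.

```python
import numpy as np

def rgb_to_cbcr(rgb):
    """Convert an HxWx3 uint8 RGB image to (Cb, Cr) channels (BT.601)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr

def build_lut():
    """Stand-in 256x256 lookup table: 1 inside a typical skin CbCr box.

    A real table would hold per-bin skin probabilities estimated from
    training pixels; the box below is only an illustrative heuristic.
    """
    lut = np.zeros((256, 256), dtype=np.uint8)
    lut[77:128, 133:174] = 1  # Cb in [77,127], Cr in [133,173]
    return lut

def skin_mask(rgb, lut):
    """Binary skin mask: luminance is ignored, only Cb and Cr index the table."""
    cb, cr = rgb_to_cbcr(rgb)
    cb_i = np.clip(cb, 0, 255).astype(np.intp)
    cr_i = np.clip(cr, 0, 255).astype(np.intp)
    return lut[cb_i, cr_i]
```

Because only the two chrominance channels index the table, a brightness change that scales Y while leaving Cb and Cr nearly unchanged leaves the mask largely intact, which is the resilience the text refers to.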
In pose estimation, facial features undergo
deformations as the pose angle changes. This,
together with the variability in facial proportions
from person to person, makes the feature detection
problem challenging. Previous approaches segment
the facial image and then apply shape detectors to
locate the features (Storring et al. 1999). Areas of
high change are often related to facial features
(Fitzpatrick 2003), which motivates using edges in a
facial image or examining changes in pixel
intensities.
Artificial Neural Networks (ANNs) are used here
for the detection of facial features from grayscale
images. An eye detection network was trained using
2760 eye images and 13800 non-eye images of size
21x11. The training images for the eye network
consisted of both left and right eye images at varying
poses up to 30 degrees. A mouth detection network
was trained using 1430 mouth images and 7150 non-
mouth images of size 33x13. The eye network
achieves an accuracy of 90%, while the mouth
network achieves an accuracy of 93%.
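The text does not specify the network architecture, so the following is only a hypothetical sketch of such a detector: a one-hidden-layer network with a sigmoid output in [0, 1], trained by gradient descent on flattened 21x11 patches. Synthetic bright-vs-dark patches stand in for the real eye/non-eye training sets.

```python
import numpy as np

rng = np.random.default_rng(0)
PATCH = 21 * 11  # eye-window size from the text, flattened

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in data: "features" are bright patches, "non-features" dark.
pos = rng.normal(0.7, 0.1, size=(200, PATCH))
neg = rng.normal(0.3, 0.1, size=(200, PATCH))
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(200), np.zeros(200)])

# One hidden layer of 8 tanh units, trained by full-batch gradient descent
# on the cross-entropy loss.
W1 = rng.normal(0, 0.1, size=(PATCH, 8))
b1 = np.zeros(8)
W2 = rng.normal(0, 0.1, size=8)
b2 = 0.0
lr = 0.5
for _ in range(300):
    h = np.tanh(X @ W1 + b1)      # hidden activations
    p = sigmoid(h @ W2 + b2)      # per-patch output in [0, 1]
    g = (p - y) / len(y)          # output-layer error signal
    W2 -= lr * (h.T @ g)
    b2 -= lr * g.sum()
    gh = np.outer(g, W2) * (1 - h ** 2)  # backprop through tanh
    W1 -= lr * (X.T @ gh)
    b1 -= lr * gh.sum(axis=0)

def detect(patch):
    """Score a flattened 21x11 patch; higher means more feature-like."""
    h = np.tanh(patch @ W1 + b1)
    return sigmoid(h @ W2 + b2)
```

The output in [0, 1] is exactly the per-pixel weight used by the sliding-window stage described next.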
During feature detection, the image is scaled
based on the results from the skin detection process.
The grayscale image is then subjected to contrast
stretching for normalization. Each detector network
is applied across the skin area of the image with a
sliding window and produces an output between 0
and 1 for each pixel. The higher the output value,
the more likely it is that the subimage centered at
that pixel contains a feature. Two
weight maps are generated for the location of eyes
and the mouth respectively. An averaging filter is
passed over the maps to reduce the effects of
outliers. Local maxima indicate likely locations of
the eyes and mouth.
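The weight-map stage described above can be sketched as follows. Here `score_patch` is a hypothetical placeholder for a trained detector network, and the filter size is an illustrative choice; the structure (slide, smooth, take local maxima) follows the text.

```python
import numpy as np

def score_patch(patch):
    """Placeholder detector: mean intensity as a score in [0, 1]."""
    return float(patch.mean())

def weight_map(img, wh, ww, score=score_patch):
    """Detector output for every position where a wh x ww window fits."""
    H, W = img.shape
    out = np.zeros((H - wh + 1, W - ww + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = score(img[i:i + wh, j:j + ww])
    return out

def average_filter(m, k=3):
    """k x k mean filter (zero-padded) to reduce the effect of outliers."""
    pad = k // 2
    p = np.pad(m, pad)
    out = np.zeros_like(m)
    for i in range(m.shape[0]):
        for j in range(m.shape[1]):
            out[i, j] = p[i:i + k, j:j + k].mean()
    return out

def local_maxima(m):
    """Positions whose smoothed weight strictly beats all 8 neighbours."""
    peaks = []
    for i in range(1, m.shape[0] - 1):
        for j in range(1, m.shape[1] - 1):
            w = m[i - 1:i + 2, j - 1:j + 2]
            if m[i, j] == w.max() and (w == m[i, j]).sum() == 1:
                peaks.append((i, j))
    return peaks
```

In practice one such map is built with the eye network and one with the mouth network, and the surviving peaks become the candidate feature points checked against the template.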
Each combination of candidate feature points is
checked to ensure it satisfies the template
constraints. Finally, the set with the greatest total
weight is selected as the position of the facial
template.
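A sketch of this selection step is given below. The geometric limits in `plausible` (eye ordering, roughly level eyes, mouth below the eyes) are illustrative stand-ins for the paper's actual template constraints.

```python
import itertools

def plausible(le, re, mo):
    """Illustrative template limits; le, re, mo are (x, y), y growing downward."""
    if le[0] >= re[0]:                               # left eye left of right eye
        return False
    if abs(le[1] - re[1]) > 0.5 * (re[0] - le[0]):   # eyes roughly level
        return False
    if mo[1] <= max(le[1], re[1]):                   # mouth below both eyes
        return False
    return True

def best_template(eyes, mouths):
    """eyes, mouths: lists of ((x, y), weight) candidate peaks.

    Returns the admissible (left eye, right eye, mouth) triple with the
    greatest summed detector weight, or None if no triple is admissible.
    """
    best, best_w = None, -1.0
    for (le, wl), (re, wr) in itertools.permutations(eyes, 2):
        for mo, wm in mouths:
            if plausible(le, re, mo) and wl + wr + wm > best_w:
                best, best_w = (le, re, mo), wl + wr + wm
    return best
```

With a handful of peaks per map the exhaustive search is cheap, which fits the method's stated emphasis on efficiency.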
4 POSE ESTIMATION
A simplified head model considered for fast pose
estimation is shown in Figure 1. Note that the eyes
and mouth are used as the primary feature points.
The head itself is treated as a spherical object of
radius r that rotates about the y-axis. The eyes and
mouth are treated as points lying on a vertical plane
of this sphere. The distance between the eyes is
labeled d.
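The effect of yaw on the projected template can be illustrated numerically. This is a hedged sketch with arbitrary example dimensions, not a reconstruction of the paper's equations: feature points on a vertical plane at depth z0 are rotated about the y-axis and projected orthographically, and the two base angles of the eye-eye-mouth triangle are measured.

```python
import math

def project(p, yaw):
    """Rotate (x, y, z) about the y-axis by `yaw` radians and drop depth."""
    x, y, z = p
    return (x * math.cos(yaw) + z * math.sin(yaw), y)

def template_angles(yaw, d=2.0, mouth_drop=1.5, z0=1.0):
    """Base angles (at the eyes) of the projected eye-eye-mouth triangle.

    d, mouth_drop, z0 are arbitrary illustrative dimensions, not values
    from the paper's head model.
    """
    le = project((-d / 2, 0.0, z0), yaw)
    re = project((d / 2, 0.0, z0), yaw)
    mo = project((0.0, mouth_drop, z0), yaw)

    def angle(a, b, c):
        """Interior angle at vertex a of triangle abc."""
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - a[0], c[1] - a[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        n = math.hypot(*v1) * math.hypot(*v2)
        return math.acos(dot / n)

    return angle(le, re, mo), angle(re, le, mo)
```

At zero yaw the template is symmetric; as yaw grows the projected eye baseline shrinks by a factor of cos(yaw), so the base angles change with pose, which is the dependence the template-angle equations capture.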
By projecting the head model onto a two-
dimensional plane, Equations (1), (2) and (3) were
developed to describe the change in the triangular
template angles,