3 PROPOSED FACE DETECTION ALGORITHM
As shown in Figure 1, the input patient image undergoes a sequence of processing steps. The face detection task is treated as a combined result from a color-based skin detector, a pattern-matching-based face detector, and a template-matching-based eye detector. The system takes an input image and suppresses the background using skin detection; the processed image is then fed into the face detection algorithm, which therefore needs to search for the face only within the detected skin region. To further improve face detection accuracy, eye detection is performed after the skin detection step, yielding eye coordinates. This output is fed into the face detection algorithm, and pattern matching is then done with the eye coordinates already known. The system is hence a combination of interlocked algorithms in which feedback from eye detection is used to improve the performance of the face detection algorithm. Each component of this ensemble system is explained below.
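The following sketch illustrates how the stages compose; the three detector callables are hypothetical placeholders for the components described in the remainder of this section, not our actual implementations.

```python
def detect_patient_face(image, detect_skin, detect_eyes, detect_face):
    """Compose the interlocked pipeline: skin detection restricts the
    search region, eye detection provides feedback coordinates, and
    face detection runs last using both. All three detectors are
    hypothetical callables standing in for the cited algorithms."""
    skin_mask = detect_skin(image)              # background suppression
    eye_coords = detect_eyes(image, skin_mask)  # template-based eye search
    return detect_face(image, skin_mask, eye_coords)
```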
Referring back to Figure 1, skin detection is performed first. The skin detection algorithm of Peer et al. (Peer et al., 2003) is used, modified slightly so that max(R, G, B) − min(R, G, B) > 10 rather than max(R, G, B) − min(R, G, B) > 15, to allow for more variation in ethnicity, obstruction by dirt and blood, etc.
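As an illustration, a per-pixel sketch of the modified rule follows. The conditions other than the relaxed max-min threshold are the commonly cited Peer et al. daylight criteria and are included here as an assumption; only the relaxed threshold is specified in the text.

```python
import numpy as np

def skin_mask(img_rgb):
    """Boolean mask of skin pixels for an RGB uint8 image. Thresholds
    other than (max - min > 10) are the standard Peer et al. daylight
    conditions, assumed here rather than taken from the text."""
    r = img_rgb[..., 0].astype(np.int32)
    g = img_rgb[..., 1].astype(np.int32)
    b = img_rgb[..., 2].astype(np.int32)
    mx = np.maximum(np.maximum(r, g), b)
    mn = np.minimum(np.minimum(r, g), b)
    return ((r > 95) & (g > 40) & (b > 20) &
            (mx - mn > 10) &              # relaxed from 15 in the original rule
            (np.abs(r - g) > 15) & (r > g) & (r > b))
```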
As mentioned, skin detection allows us to eliminate the background and focus on the regions that are most likely to contain a face. Face detection itself is done with the algorithm of (Pai et al., 2006), which uses the height-to-width ratio of the skin region and the presence and location of eyes and a mouth. If any of these features are present in the image, the algorithm computes a bounding rectangle around the face. We relaxed the maximum allowed height-to-width ratio from 1.75 to 1.8 to allow for more variation. The eye localization algorithm uses color histograms in the Y′CbCr color space, as does mouth localization.
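A minimal sketch of the relaxed ratio test follows; the bounding-box format and the absence of a lower bound are assumptions, and the full acceptance criteria, including the eye and mouth checks, are defined in (Pai et al., 2006).

```python
def plausible_face_region(skin_bbox, max_ratio=1.8):
    """Reject skin regions whose height-to-width ratio exceeds the
    relaxed bound of 1.8 mentioned above. skin_bbox is assumed to be
    an (x, y, width, height) tuple; this check is only one of several
    criteria used by the cited face detector."""
    x, y, w, h = skin_bbox
    return w > 0 and (h / w) <= max_ratio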
For eye detection, we implemented a template-
based eye detection algorithm given that many of
the patients have their eyes closed, thus confound-
ing color-based approaches. Fifty templates were ex-
tracted from images not used in our evaluation. Some
of them are shown in Figure 2. For matching we
use two different methods and combine the results.
The first is the normalized sum of squared differ-
ences (SSD) between the template and image patches;
the second is the normalized cross correlation (NCC).
NCC is useful when lighting conditions vary, as they
do in our images. Equation 2 and Equation 3 give
expressions for SSD and NCC, respectively.
\[
\mathrm{ssd}(u, v) = \sum_{x,y} \bigl( f(x, y) - t(x - u,\, y - v) \bigr)^2 \tag{2}
\]
\[
\mathrm{ncc}(u, v) = \frac{1}{n - 1} \sum_{x,y} \frac{\bigl( f(x, y) - \bar{f} \bigr)\bigl( t(x, y) - \bar{t} \bigr)}{\sigma_f \, \sigma_t} \tag{3}
\]
In the above equations, n is the number of pixels under the template, f is the image, t is the template, and the summation is over the positions x, y covered by the template placed at position u, v. The means f̄ and t̄ and the standard deviations σ_f and σ_t are computed over the image patch and the template, respectively.
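For illustration, the sketch below scores a single template with OpenCV, whose TM_SQDIFF_NORMED and TM_CCOEFF_NORMED measures correspond to a normalized SSD in the spirit of Equation 2 and the mean-subtracted NCC of Equation 3. The averaging used to fuse the two score maps is an assumption; the fusion rule is not specified in the text.

```python
import cv2

def match_eye_template(gray_image, template):
    """Score every position of one eye template against a grayscale
    image and return the best location under a fused score. The fusion
    (inverting SSD, then averaging with NCC) is an assumed combination
    rule, not necessarily the one used in the paper."""
    ssd = cv2.matchTemplate(gray_image, template, cv2.TM_SQDIFF_NORMED)
    ncc = cv2.matchTemplate(gray_image, template, cv2.TM_CCOEFF_NORMED)
    score = (1.0 - ssd + ncc) / 2.0           # higher means a better match
    _, best_val, _, best_loc = cv2.minMaxLoc(score)
    return best_loc, best_val                 # top-left corner, fused score
```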
Figure 2: Example eye templates.
Because patients can be, and are, in many different orientations in our images, we rotate each eye template and use the rotated versions as independent templates for comparison. The rotation is done around the center of the eye template in 30-degree increments, producing twelve different templates from each original. Other transformations, such as scaling, were tested but had little impact on the results. Empirically, the template-based method works better than the built-in eye localization of (Pai et al., 2006), so to improve face localization we first locate the eyes using templates and then run the face detection algorithm.
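A sketch of the rotation step follows; keeping the original canvas size and zero-padding the uncovered corners are assumptions about details the text does not specify.

```python
import cv2

def rotated_templates(template):
    """Generate the twelve rotated copies of an eye template, rotated
    about its center in 30-degree increments as described above.
    Corners that rotate outside the canvas are zero-padded (an
    assumed border-handling choice)."""
    h, w = template.shape[:2]
    center = (w / 2.0, h / 2.0)
    copies = []
    for angle in range(0, 360, 30):
        m = cv2.getRotationMatrix2D(center, angle, 1.0)
        copies.append(cv2.warpAffine(template, m, (w, h)))
    return copies
```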
Because this system is an ensemble of algorithms, it treats the face as a combination of different features on a given skin area, whereas the Viola-Jones approach looks for a complete set of Haar-like features that match the face. Hence, when a face is partially visible or in a different orientation, our system performs considerably better than standard face detection algorithms that are trained specifically on full, upright, frontal face images.
4 EXPERIMENTAL RESULTS
Recall that our problem domain is images of pa-
tients taken during disaster events. Much of our data
came from drills conducted by the Bethesda Hospi-
tals’ Emergency Preparedness Partnership (BHEPP),
which is a collaboration between the National Naval
Medical Center, the National Institutes of Health
Clinical Center, and the Suburban Hospital in
Bethesda, Maryland. One of the goals of BHEPP is
to develop a transportable, proven model for defining