region and excludes the right and left parts of this region. The information in this area comes from the eye itself and not from the eyebrow or the eyeglasses.
Since, at the actual eye center position, there is significant luminance variation along the horizontal and vertical axes, the images D_x(x, y) and D_y(x, y) of the absolute discrete intensity derivatives along the horizontal and vertical directions are evaluated:

D_x(x, y) = |I(x, y) − I(x − 1, y)|    (4)

D_y(x, y) = |I(x, y) − I(x, y − 1)|    (5)
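Equations (4) and (5) can be sketched as follows; this is a minimal NumPy version, assuming the grayscale image is a 2-D array indexed as I[y, x] (the indexing convention is not stated in the text):

```python
import numpy as np

def derivative_images(I):
    """Absolute discrete intensity derivatives (eqs. 4 and 5).

    D_x(x, y) = |I(x, y) - I(x - 1, y)|  (horizontal direction)
    D_y(x, y) = |I(x, y) - I(x, y - 1)|  (vertical direction)

    The first column of D_x and the first row of D_y are left at
    zero, since the differences are undefined there.
    """
    I = I.astype(np.int32)          # avoid unsigned-wraparound in the subtraction
    Dx = np.zeros_like(I)
    Dy = np.zeros_like(I)
    Dx[:, 1:] = np.abs(I[:, 1:] - I[:, :-1])
    Dy[1:, :] = np.abs(I[1:, :] - I[:-1, :])
    return Dx, Dy
```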
The contents of the horizontal derivative image are subsequently projected on the vertical axis and the contents of the vertical derivative image are projected on the horizontal axis. The 4 vertical and 4 horizontal lines corresponding to the 4 largest vertical and horizontal projections (i.e., the lines crossing the strongest edges) are selected. The point whose x and y coordinates are the medians of the coordinates of the vertical and horizontal lines, respectively, defines an initial estimate of the eye center (Figure 5(a)).
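A sketch of this projection-and-median step, under the same I[y, x] indexing assumption (so a projection on the vertical axis is a sum over x, one value per row):

```python
import numpy as np

def initial_eye_center(Dx, Dy, k=4):
    """Initial eye-center estimate from derivative projections.

    The horizontal-derivative image Dx is projected on the vertical
    axis (one value per row, i.e. per horizontal line) and the
    vertical-derivative image Dy on the horizontal axis (one value
    per column, i.e. per vertical line).  The k strongest rows and
    columns are taken, and the medians of their coordinates give the
    estimate (x0, y0).
    """
    row_proj = Dx.sum(axis=1)                  # projection on the vertical axis
    col_proj = Dy.sum(axis=0)                  # projection on the horizontal axis
    top_rows = np.sort(np.argsort(row_proj)[-k:])  # k largest horizontal lines
    top_cols = np.sort(np.argsort(col_proj)[-k:])  # k largest vertical lines
    y0 = int(np.median(top_rows))
    x0 = int(np.median(top_cols))
    return x0, y0
```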
Using the fact that the eye center is in the middle of the largest dark area in the region, the previous result can be further refined: the darkest column (defined as the column with the lowest sum of pixel intensities) of a 0.4N_eye pixels high and 0.15M_eye pixels wide area around the initial estimate is found, and its position is used to define the horizontal coordinate of the refined eye center. In a similar way, the darkest row in a 0.15N_eye × 0.4M_eye area around the initial estimate is used to locate the vertical position of the eye center (Figure 5(b)).
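The darkest-column/darkest-row refinement can be sketched as below; clipping the search windows at the image border is an implementation detail the text does not specify and is assumed here:

```python
import numpy as np

def refine_by_darkest_line(I, x0, y0, N_eye, M_eye):
    """Refine (x0, y0) using the darkest column and the darkest row.

    The darkest column (lowest sum of pixel intensities) of a
    0.4*N_eye-high, 0.15*M_eye-wide window around (x0, y0) gives the
    refined x; the darkest row of a 0.15*N_eye-high, 0.4*M_eye-wide
    window gives the refined y.
    """
    def window(cx, cy, h, w):
        # clip an (h x w) window centred at (cx, cy) to the image
        x1 = max(cx - w // 2, 0); x2 = min(cx + w // 2 + 1, I.shape[1])
        y1 = max(cy - h // 2, 0); y2 = min(cy + h // 2 + 1, I.shape[0])
        return x1, y1, I[y1:y2, x1:x2]

    h, w = int(0.4 * N_eye), int(0.15 * M_eye)
    x1, _, win = window(x0, y0, h, w)
    x_ref = x1 + int(np.argmin(win.sum(axis=0)))   # darkest column

    h, w = int(0.15 * N_eye), int(0.4 * M_eye)
    _, y1, win = window(x0, y0, h, w)
    y_ref = y1 + int(np.argmin(win.sum(axis=1)))   # darkest row
    return x_ref, y_ref
```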
For even more refined results, in a 0.4N_eye × M_eye area around the point found at the previous step, the darkest 0.25N_eye × 0.25M_eye region is searched for, and the eye center is considered to be located in the middle of this region. This point gives the final estimate of the eye center, as can be seen in Figure 5(c).
Figure 5: (a) Initial estimate of eye center, (b) estimate after first refinement, (c) final eye center localization.
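The final darkest-region search can be sketched as a brute-force scan; the exhaustive search strategy and the border clipping are assumptions, since the text only specifies the window sizes:

```python
import numpy as np

def final_eye_center(I, x0, y0, N_eye, M_eye):
    """Final estimate: centre of the darkest 0.25*N_eye x 0.25*M_eye
    region inside a 0.4*N_eye x M_eye search area around (x0, y0).

    Every candidate placement of the small region inside the search
    area is scored by its intensity sum; the darkest one wins.
    """
    H, W = int(0.4 * N_eye), int(M_eye)          # search area size
    h, w = int(0.25 * N_eye), int(0.25 * M_eye)  # dark region size
    y1 = max(y0 - H // 2, 0); y2 = min(y0 + H // 2 + 1, I.shape[0])
    x1 = max(x0 - W // 2, 0); x2 = min(x0 + W // 2 + 1, I.shape[1])
    area = I[y1:y2, x1:x2].astype(np.int64)
    best, best_pos = None, (0, 0)
    for yy in range(area.shape[0] - h + 1):
        for xx in range(area.shape[1] - w + 1):
            s = area[yy:yy + h, xx:xx + w].sum()
            if best is None or s < best:
                best, best_pos = s, (xx, yy)
    bx, by = best_pos
    return x1 + bx + w // 2, y1 + by + h // 2    # middle of the region
```

A summed-area table would make the inner sum O(1), but the plain double loop keeps the sketch close to the description.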
4.2 Mouth Corner Localization
For mouth corner localization, the hue component of
mouth regions can be exploited, since the hue values
of the lips are distinct from those of the surrounding
area. More specifically, the lip color is reddish and, thus, its hue values are concentrated around 0°. In order to detect the mouth corners, the pixels of the hue component are classified into two classes through binarization (Otsu, 1979). The class whose mean value is closer to 0° is declared as the lip class. Small components assigned to the lip class (which are not actually lip parts) are discarded using a procedure similar to the light reflection removal procedure.
Afterwards, the actual mouth corner localization is performed by scanning the binary image and looking for the rightmost and leftmost pixels belonging to the lip class.
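Given the binary lip mask (here assumed to be the result of the Otsu binarization and small-component removal described above), the corner scan reduces to a few lines:

```python
import numpy as np

def mouth_corners(lip_mask):
    """Locate the mouth corners in a binary lip mask.

    lip_mask is a 2-D boolean array, True for lip-class pixels.  The
    leftmost and rightmost lip pixels are taken as the two corners;
    each corner is returned as an (x, y) pair.
    """
    ys, xs = np.nonzero(lip_mask)        # coordinates of all lip pixels
    left = int(xs.argmin())              # index of the leftmost lip pixel
    right = int(xs.argmax())             # index of the rightmost lip pixel
    return (int(xs[left]), int(ys[left])), (int(xs[right]), int(ys[right]))
```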
5 EXPERIMENTAL EVALUATION
PROCEDURE
The proposed method has been tested on the
XM2VTS database (Messer et al., 1999), which has
been used in many facial feature detection papers.
This database contains 1180 face and shoulders images. All images were taken under controlled lighting conditions and the background is uniform. The database contains ground truth data for eye centers and mouth corners.
Out of a total of 1180 images, only 3 faces failed to be detected. When more than one candidate face region was found in an image, the candidate with the smallest sum of the distance metric (eq. 2) for the left and right eye and the distance metric (eq. 3) for the detected mouth was retained, so that false alarms were rejected.
For eye region detection, success or failure was declared depending on whether the ground truth positions of both eye centers were inside the detected eye regions. Mouth region detection was considered successful if both ground truth mouth corners were inside the detected region. For eye center and mouth corner localization, the correct detection rates were calculated through the following criterion, introduced in (Jesorsky et al., 2001):
m_2 = max(d_1, d_2) / s < T    (6)
In the previous formula, d_1 and d_2 are the distances between the ground truth eye centers or mouth corners and the eye centers or mouth corners found by the algorithm, and s is the distance between the two ground truth eye centers or between the two ground truth mouth corners. A successful detection is declared whenever m_2 is lower than the threshold T.
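The criterion of eq. (6) amounts to a one-line check; the default threshold value used below is only illustrative, since this section does not fix a specific T:

```python
def detection_successful(d1, d2, s, T=0.25):
    """Jesorsky et al. (2001) criterion (eq. 6): m_2 = max(d1, d2) / s < T.

    d1, d2: distances from the two ground-truth points (eye centers
    or mouth corners) to the detected ones; s: distance between the
    two ground-truth points.  T = 0.25 is an illustrative default,
    not a value stated in this section.
    """
    m2 = max(d1, d2) / s
    return m2 < T
```

For example, with s = 30 pixels and T = 0.25, detections up to 7.5 pixels off still count as successful.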