DETECTION OF FACIAL CHARACTERISTICS BASED ON EDGE
INFORMATION
Stylianos Asteriadis, Nikolaos Nikolaidis, Ioannis Pitas
Department of Informatics, Aristotle University of Thessaloniki, Box 451 Thessaloniki, GR-54124, Greece
Montse Pardas
Dept. Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya, Barcelona, Spain
Keywords:
Facial features detection, distance map, ellipse fitting.
Abstract:
In this paper, a novel method for eye and mouth detection and for eye center and mouth corner localization, based on geometrical information, is presented. First, a face detector is applied to detect the facial region, and the edge map of this region is extracted. A vector pointing to the closest edge pixel is then assigned to every pixel. The x and y components of these vectors are used to detect the eyes and mouth. For eye center localization, intensity information is used, after removing unwanted effects such as light reflections. For the detection of the mouth corners, the hue channel of the lip area is used. The proposed method can work efficiently on low-resolution images and has been tested on the XM2VTS database with very good results.
1 INTRODUCTION
In the recent literature, numerous papers have been published in the area of facial feature localization, since this task is essential for a number of important applications such as face recognition, human-computer interaction, facial expression recognition, surveillance, etc.
In (Cristinacce et al., 2004) a multi-stage approach
is used to locate features on a face. First, the face is
detected using the boosted cascaded classifier algo-
rithm by Viola and Jones (Viola and Jones, 2001).
The same classifier is trained using facial feature
patches to detect facial features. A novel shape con-
straint, the Pairwise Reinforcement of Feature Re-
sponses (PRFR) is used to improve the localization
accuracy of the detected features. In (Jesorsky et al.,
2001) a three stage technique is used for eye center lo-
calization. The Hausdorff distance between edges of
the image and an edge model of the face is used to de-
tect the face area. At the second stage, the Hausdorff
distance between the image edges and a more refined
model of the area around the eyes is used for more
accurate localization of the upper area of the head.
Finally, a Multi-Layer Perceptron (MLP) is used for
finding the exact pupil locations. In (Zhou and Geng,
2004) the authors use Generalized Projection Func-
tions (GPF) to locate the eye centers in an eye area
found using the algorithm proposed in (Wu and Zhou,
2003). The functions used are linear combinations of functions that consider the mean of intensities and functions that consider the intensity variance along rows and columns.
A technique for eye and mouth detection and for eye center and mouth corner localization is proposed in this paper. After accurate face detection us-
ing an ellipse fitting algorithm (Salerno et al., 2004),
the detected face is normalized to certain dimensions
and a vector field is created by assigning to each
pixel a vector pointing to the closest edge. The eyes
and mouth regions are detected by finding regions in-
side the face, whose vector fields resemble the vector
fields of eye and mouth templates extracted from sam-
ple eye and lip images. Intensity and color informa-
tion is then used within the detected eye and mouth
regions in order to accurately localize the eye cen-
ters and the mouth corners. Our technique has been
tested on the XM2VTS database with very promising
results. Comparisons with other state-of-the-art methods verify that our method achieves superior performance.
The structure of the paper is as follows. Section
2 describes the steps followed to detect a face in an
image. In section 3 the method used to locate eye
and mouth areas on a face image is described. Sec-
tion 4 details the steps used to localize the eye centers
and the mouth corners. Section 5 describes the ex-
perimental evaluation procedure (database, distance
metrics, etc). In section 6 results on eye and mouth
detection are presented, and the proposed method is
compared to other approaches in the literature. Con-
clusions follow.
2 FACE DETECTION
Prior to eye and mouth region detection, face detec-
tion is applied on the face images. The face is de-
tected using the Boosted Cascade method, described
in (Viola and Jones, 2001). The output of this method
is usually the face region with some background. Fur-
thermore, the position of the face is often not centered
in the detected sub-image, as shown in the first row of Figure 1. Since the detection of the eyes and mouth will be done on detected face regions of a predefined size, it is very important to have accurate face detection. Consequently, a technique to
postprocess the results of the face detector is used.
More specifically, a technique that compares the
shape of a face with that of an ellipse is used. This
technique is based on the work reported in (Salerno
et al., 2004). According to this technique, the distance
map of the face area found at the first step is extracted.
Here, the distance map is calculated from the binary edge map of the area. An ellipse scans the distance map, and a score, defined as the average of all distance map values on the ellipse contour e, is evaluated:
$$\mathrm{score} = \frac{1}{|e|} \sum_{(x,y) \in e} D(x,y) \quad (1)$$
where $D$ is the distance map of the region found by the Boosted Cascade algorithm and $|e|$ denotes the number of pixels on the ellipse contour.
This score is calculated for various scale and
shape transformations of the ellipse. The transforma-
tion which gives the best score is considered as the
one that corresponds to the ellipse that best describes
the exact face contour.
The right and left boundaries of the ellipse are
considered as the new lateral boundaries of the face
region. Examples of the ellipse fitting procedure are
shown in Figure 1.
Figure 1: Face detection refinement procedure.
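The scoring procedure above can be summarized in code. The following is a minimal sketch, assuming OpenCV and NumPy; the Canny thresholds, the grid of candidate ellipse centers and axes, and the scan step are illustrative choices, since the paper does not specify the exact transformation ranges.

import cv2
import numpy as np

def ellipse_score(dist_map, center, axes):
    """Average distance-map value along the contour of one candidate ellipse."""
    mask = np.zeros(dist_map.shape, dtype=np.uint8)
    cv2.ellipse(mask, center, axes, 0, 0, 360, 255, 1)
    contour = mask > 0
    return dist_map[contour].mean() if contour.any() else np.inf

def refine_face_region(face_gray):
    """Return (cx, cy, a, b) of the ellipse minimizing the mean distance score."""
    edges = cv2.Canny(face_gray, 100, 200)
    # Distance from every pixel to its nearest edge pixel (edges become zeros).
    dist_map = cv2.distanceTransform(255 - edges, cv2.DIST_L2, 3)
    h, w = face_gray.shape
    best_score, best_params = np.inf, None
    for cx in range(w // 3, 2 * w // 3, 4):             # candidate centers
        for cy in range(h // 3, 2 * h // 3, 4):
            for a in range(w // 4, w // 2, 4):          # semi-axis along x
                for b in range(h // 3, 2 * h // 3, 4):  # semi-axis along y
                    s = ellipse_score(dist_map, (cx, cy), (a, b))
                    if s < best_score:
                        best_score, best_params = s, (cx, cy, a, b)
    return best_params

In practice the scan could be restricted to small perturbations of an ellipse centered on the Boosted Cascade output, which keeps the brute-force search cheap.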
3 EYE AND MOUTH REGION
DETECTION
3.1 Eye Region Detection
The proposed eye region detection method can be out-
lined as follows: the detected face area is scaled to 150×105 pixels and the Canny edge detector is applied. Then, for each pixel, the vector that points to
the closest edge pixel is calculated. The x and y com-
ponents of each vector are assigned to the correspond-
ing pixel. Thus, instead of the intensity values of each
pixel, we generate and use in the proposed algorithm
a vector field whose dimensions are equal to those of
the image, and where each pixel is characterized by
the vector described above. This vector encodes, for each pixel, information regarding its geometric relation with neighboring edges, and is thus relatively insensitive to intensity variations or poor lighting conditions. The vector field can be represented as two maps
(images) representing the horizontal and vertical vec-
tor components for each pixel. Figure 2(a) depicts the detected face in an image; Figures 2(b) and 2(c) show the horizontal and vertical component maps of the vector field of the detected face area, respectively.
Figure 2: (a) Detected face, (b) horizontal components of the vector field, (c) vertical components of the vector field.
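The vector field itself can be computed in closed form from the Euclidean distance transform, which can also return, for each pixel, the coordinates of its nearest edge pixel. A minimal sketch, assuming SciPy and OpenCV (the Canny thresholds are illustrative):

import cv2
import numpy as np
from scipy.ndimage import distance_transform_edt

def edge_vector_field(face_gray):
    """Per-pixel (dx, dy) components of the vector to the closest edge pixel."""
    face = cv2.resize(face_gray, (105, 150))   # 150x105 face, width x height
    edges = cv2.Canny(face, 100, 200) > 0
    # The EDT of the edge complement also yields, per pixel, the indices of
    # the nearest edge pixel when return_indices=True.
    _, nearest = distance_transform_edt(~edges, return_indices=True)
    rows, cols = np.indices(face.shape)
    dy = nearest[0] - rows    # vertical component map
    dx = nearest[1] - cols    # horizontal component map
    return dx, dy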
Figure 3: (a) Mean vertical component map of the right eye, (b) mean horizontal component map of the right eye.
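The mean component maps of Figure 3 can be obtained by averaging, pixel-wise, the vector fields of sample eye crops scaled to a common N×M size. A minimal sketch, assuming a vector_field() helper built as in the previous snippet but without the 150×105 face rescaling:

import numpy as np

def mean_template(sample_crops, vector_field):
    """Pixel-wise mean (dx, dy) over equally sized sample eye (or mouth) crops."""
    fields = [vector_field(crop) for crop in sample_crops]
    mean_dx = np.mean([dx for dx, _ in fields], axis=0)
    mean_dy = np.mean([dy for _, dy in fields], axis=0)
    return mean_dx, mean_dy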
In order to detect the eye areas, regions $R_k$ of size $N \times M$ within the detected face are examined, and the corresponding vector fields are compared with the mean vector fields extracted from a set of right or left eye images (Figure 3).
The similarity between an image region and the templates is evaluated using the following distance measure:
$$E_{L_2} = \sum_{i \in R_k} \left\| \mathbf{v}_i - \mathbf{m}_i \right\| \quad (2)$$
where $\| \cdot \|$ denotes the $L_2$ norm. Essentially, for an $N \times M$ region $R_k$, the previous formula is the sum of the Euclidean distances between the vectors $\mathbf{v}_i$ of the candidate region and the corresponding vectors $\mathbf{m}_i$ of the mean vector field of the eye being searched for (right or left). The candidate region on the face that minimizes $E_{L_2}$ is marked as the region of the left or right eye.
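A direct, if brute-force, implementation of this search slides an N×M window over the face and evaluates eq. (2) at every position. A minimal sketch, assuming the (dx, dy) maps and mean templates computed as above:

import numpy as np

def find_eye_region(dx, dy, template_dx, template_dy):
    """Top-left corner of the N x M window minimizing E_L2 of eq. (2)."""
    n, m = template_dx.shape
    H, W = dx.shape
    best_score, best_pos = np.inf, None
    for r in range(H - n + 1):
        for c in range(W - m + 1):
            # Sum of Euclidean distances between corresponding vectors.
            ddx = dx[r:r+n, c:c+m] - template_dx
            ddy = dy[r:r+n, c:c+m] - template_dy
            score = np.sqrt(ddx**2 + ddy**2).sum()
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

The double loop could of course be vectorized, or restricted to a plausible search zone within the face, without changing the result.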
3.2 Mouth Region Detection
Figure 4: (a) Sample mouth region image, (b) mean vertical component map of the mouth region, (c) mean horizontal component map of the mouth region.
The mouth region was detected using a procedure
similar to the one used for eye detection. The vec-
tor field of various candidate regions was compared
to a mean mouth vector field. For the extraction of
this mean vector field, mouth images scaled to the same dimensions $N_m \times M_m$ were used. An example of a mouth image, used for the calculation of the mean vector field and the mean horizontal and vertical component maps, can be seen in Figure 4(a). However, since lip and skin colors are, in many cases, similar, and since a beard (when present) might occlude or distort the lip shape, lip localization is more difficult.
For this reason, an additional factor is included in eq.
2. This factor is the inverse of the number of edge
pixels of the horizontal edge map evaluated within the
candidate mouth area. This term was added because,
due to the elongated shape of the lips, the correspond-
ing area is characterized by a large concentration of
horizontal edges. Thus, this factor helps discriminate between mouth and non-mouth regions. The additional factor is weighted so that its mean value in the search zone equals the mean value of the $E_{L_2}$ distance of the candidate mouth areas from the mean vector component maps. Based on the above, the distance measure used in mouth region detection is the following:
$$E^{mouth}_{L_2} = \sum_{i \in R_k} \left\| \mathbf{v}_i - \mathbf{m}_i \right\| + \frac{w}{\sum_{i \in R_k} I^{horizontalEdges}_i} \quad (3)$$
where $I^{horizontalEdges}_i$ is the horizontal binary edge value for pixel $i$ of candidate region $R_k$; more specifically, $I^{horizontalEdges}_i$ is one if pixel $i$ is an edge pixel and zero otherwise.
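As a sketch, the per-window score of eq. (3) could be computed as below; the horizontal edge map (e.g., a thresholded vertical intensity derivative) and the weight w, balanced as described in the text, are assumed to be supplied by the caller:

import numpy as np

def mouth_window_score(dx, dy, template_dx, template_dy, horiz_edges, r, c, w):
    """E_L2 term of eq. (2) plus the weighted inverse horizontal-edge count."""
    n, m = template_dx.shape
    ddx = dx[r:r+n, c:c+m] - template_dx
    ddy = dy[r:r+n, c:c+m] - template_dy
    e_l2 = np.sqrt(ddx**2 + ddy**2).sum()
    edge_count = int(horiz_edges[r:r+n, c:c+m].sum())
    # Elongated lips yield many horizontal edges, so a true mouth window
    # makes this penalty term small.
    return e_l2 + w / max(edge_count, 1)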
4 LOCALIZATION OF
CHARACTERISTIC POINTS
After eye and mouth area detection, the eye centers and mouth corners are localized within the detected areas using the procedures described in the following sections.
4.1 Eye Center Localization
The eye area found using the procedure described in Section 3 is scaled back to the dimensions $N_{eye} \times M_{eye}$ it had in the initial image. Moreover, before eye center localization, a pre-processing step is applied. Since
reflections (highlights), which negatively affect the results, frequently appear on the eye, a reflection removal step is implemented. This proceeds as follows: the eye area is first converted into a binary image through thresholding, using the threshold selection method proposed in (Otsu, 1979). Subsequently, all
the small white connected components of the result-
ing binary eye image are considered as highlight areas
and the intensities of the pixels in the grayscale image
that correspond to these areas are substituted by the
average luminance of their surrounding pixels. The
result is an eye area with most highlights removed.
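A minimal sketch of this highlight removal step, assuming OpenCV; the area threshold that qualifies a white component as "small" is an illustrative choice, as the paper does not state one:

import cv2
import numpy as np

def remove_reflections(eye_gray, max_area=20):
    """Replace small bright blobs with the mean of their surrounding pixels."""
    _, binary = cv2.threshold(eye_gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n_labels, labels = cv2.connectedComponents(binary)
    out = eye_gray.copy()
    for lab in range(1, n_labels):
        comp = labels == lab
        if comp.sum() <= max_area:        # small white component: a highlight
            # Ring of pixels immediately surrounding the component.
            dilated = cv2.dilate(comp.astype(np.uint8),
                                 np.ones((3, 3), np.uint8)) > 0
            ring = dilated & ~comp
            if ring.any():
                out[comp] = int(eye_gray[ring].mean())
    return out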
The eye center localization is performed in three
steps, each step refining the results obtained in the
previous one. By inspecting the eye images used for
the extraction of the mean vector maps, one can ob-
serve that the eyes reside at the lower central part of
the detected eye area. Thus, the eye center is searched
within an area that covers the lower 60% of the eye
region and excludes the right and left parts of this re-
gion. The information in this area comes from the eye
itself and not from the eyebrow or the eyeglasses.
Since, at the actual eye center position, there is
significant luminance variation along the horizontal
and vertical axes, the images $D_x(x,y)$ and $D_y(x,y)$ of the absolute discrete intensity derivatives along the horizontal and vertical directions are evaluated:

$$D_x(x,y) = |I(x,y) - I(x-1,y)| \quad (4)$$

$$D_y(x,y) = |I(x,y) - I(x,y-1)| \quad (5)$$
The contents of the horizontal derivative image are subsequently projected on the vertical axis, and the contents of the vertical derivative image are projected on the horizontal axis. The 4 vertical and 4 horizontal lines corresponding to the 4 largest vertical and horizontal projections (i.e., the lines crossing the strongest edges) are selected. The point whose x and y coordinates are the medians of the coordinates of the vertical and horizontal lines, respectively, defines an initial estimate of the eye center (Figure 5(a)).
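A minimal sketch of this initial estimate follows; the projection convention (per-column sums of $D_x$ giving vertical lines, per-row sums of $D_y$ giving horizontal lines) is one plausible reading of the text:

import numpy as np

def initial_eye_center(eye_gray):
    """Initial eye-center estimate from derivative projections (eqs. 4-5)."""
    img = eye_gray.astype(np.float32)
    d_x = np.abs(np.diff(img, axis=1))   # |I(x,y) - I(x-1,y)|
    d_y = np.abs(np.diff(img, axis=0))   # |I(x,y) - I(x,y-1)|
    col_proj = d_x.sum(axis=0)           # one value per column
    row_proj = d_y.sum(axis=1)           # one value per row
    best_cols = np.argsort(col_proj)[-4:]   # 4 strongest vertical lines
    best_rows = np.argsort(row_proj)[-4:]   # 4 strongest horizontal lines
    return int(np.median(best_cols)), int(np.median(best_rows))  # (cx, cy)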
Using the fact that the eye center is in the mid-
dle of the largest dark area in the region, the previous
result can be further refined: the darkest column (defined as the column with the lowest sum of pixel intensities) of an area $0.4N_{eye}$ pixels high and $0.15M_{eye}$ pixels wide around the initial estimate is found, and its position defines the horizontal coordinate of the refined eye center. In a similar way, the darkest row in a $0.15N_{eye} \times 0.4M_{eye}$ area around the initial estimate is used to locate the vertical position of the eye center (Figure 5(b)).
For an even more refined result, in a $0.4N_{eye} \times M_{eye}$ area around the point found at the previous step, the darkest $0.25N_{eye} \times 0.25M_{eye}$ region is searched for, and the eye center is considered to be located at the middle of this region. This point gives the final estimate of the eye center, as can be seen in Figure 5(c).
Figure 5: (a) Initial estimate of the eye center, (b) estimate after first refinement, (c) final eye center localization.
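The first refinement step (darkest column and darkest row) could be sketched as below; window clipping at the region borders is an implementation detail not covered by the paper:

import numpy as np

def refine_eye_center(eye_gray, cx, cy):
    """Darkest-column / darkest-row refinement around the initial estimate."""
    img = eye_gray.astype(np.float32)
    H, W = img.shape                     # N_eye x M_eye

    def window(r0, r1, c0, c1):
        return img[max(r0, 0):min(r1, H), max(c0, 0):min(c1, W)]

    # Darkest column in a 0.4*N_eye high, 0.15*M_eye wide area around (cx, cy).
    dh, dw = int(0.2 * H), int(0.075 * W)
    win = window(cy - dh, cy + dh, cx - dw, cx + dw)
    cx = max(cx - dw, 0) + int(np.argmin(win.sum(axis=0)))
    # Darkest row in a 0.15*N_eye x 0.4*M_eye area, symmetrically.
    dh, dw = int(0.075 * H), int(0.2 * W)
    win = window(cy - dh, cy + dh, cx - dw, cx + dw)
    cy = max(cy - dh, 0) + int(np.argmin(win.sum(axis=1)))
    return cx, cy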
4.2 Mouth Corner Localization
For mouth corner localization, the hue component of
mouth regions can be exploited, since the hue values
of the lips are distinct from those of the surrounding
area. More specifically, the lip color is reddish and, thus, its hue values are concentrated around 0°. In order to detect the mouth corners, the pixels of the hue component are classified into two classes through binarization (Otsu, 1979). The class whose mean value is closer to 0° is declared the lip class. Small components assigned to the lip class (although they are not lip parts) are discarded using a procedure similar to the light reflection removal procedure.
Afterwards, the actual mouth corner localization
is performed by scanning the binary image and look-
ing for the rightmost and leftmost pixels belonging to
the lip class.
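A minimal sketch of the hue-based corner search, assuming OpenCV's 8-bit hue range [0, 180); the circular handling of hue distances near 0° and the omission of the small-component cleanup are simplifications of this reading:

import cv2
import numpy as np

def mouth_corners(mouth_bgr):
    """Leftmost and rightmost lip-class pixels in the binarized hue map."""
    hue = cv2.cvtColor(mouth_bgr, cv2.COLOR_BGR2HSV)[:, :, 0].astype(np.float32)
    # Circular distance of each hue value from 0 degrees (red wraps at 180).
    dist0 = np.minimum(hue, 180.0 - hue).astype(np.uint8)
    _, binary = cv2.threshold(dist0, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    lips = binary == 0          # class with hue closer to 0 degrees
    ys, xs = np.nonzero(lips)
    left = (int(xs.min()), int(ys[np.argmin(xs)]))
    right = (int(xs.max()), int(ys[np.argmax(xs)]))
    return left, right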
5 EXPERIMENTAL EVALUATION
PROCEDURE
The proposed method has been tested on the
XM2VTS database (Messer et al., 1999), which has
been used in many facial feature detection papers.
This database contains 1180 face and shoulders im-
ages. All images were taken under controlled lighting
conditions and the background is uniform. The data-
base contains ground truth data for eye centers and
mouth corners.
Out of a total of 1180 images, only 3 faces failed to be detected. In cases where more than one candidate face region was found in an image, the candidate with the smallest sum of the distance metric (eq. 2) for the left and right eyes and the distance metric (eq. 3) for the detected mouth was retained, in order to reject false alarms.
For eye region detection, success or failure was declared depending on whether the ground truth positions of both eye centers were inside the detected eye regions. Mouth region detection was considered successful if both ground truth mouth corners were inside the detected region. For eye center and mouth corner localization, the correct detection rates were calculated through the following criterion, introduced in (Jesorsky et al., 2001):
$$m_2 = \frac{\max(d_1, d_2)}{s} < T \quad (6)$$
In the previous formula, $d_1$ and $d_2$ are the distances between the ground truth eye centers or mouth corners and the eye centers or mouth corners found by the algorithm, and $s$ is the distance between the two ground truth eye centers or between the two ground truth mouth corners. A successful detection is declared whenever $m_2$ is lower than a threshold $T$.
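The criterion can be evaluated directly; a minimal sketch, assuming points are given as (x, y) pairs:

import numpy as np

def detection_ok(pred_left, pred_right, gt_left, gt_right, T=0.25):
    """True when max(d1, d2) / s < T, as in eq. (6)."""
    d1 = np.linalg.norm(np.subtract(pred_left, gt_left))
    d2 = np.linalg.norm(np.subtract(pred_right, gt_right))
    s = np.linalg.norm(np.subtract(gt_left, gt_right))
    return max(d1, d2) / s < T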
6 EXPERIMENTAL RESULTS
Two types of results were obtained on the images described above: results regarding eye/lip region detection and results on eye center/mouth corner localization. All the results take into account the results of the
face detection step and are described in the following
sections.
6.1 Eye Detection and Eye Center
Localization
Correct eye region detection percentages are listed in the column of Table 1 denoted as "Eye Regions". It is obvious that the detection rates are very good, both for people not wearing eyeglasses and for those who do.
The column labelled "Eye Centers" in the same table presents correct eye center localization results for threshold value T=0.25.
Furthermore, the success rates for various values
of the threshold T, for the whole database are depicted
in Figure 6. From the figure it can be observed that,
even for very small thresholds T (i.e. for very strict
criteria), success rates remain very high. For exam-
ple, the maximum distance of the detected eye centers
from the real ones does not exceed 5% (T=0.05) of
the inter-ocular distance in 93.5% of the cases, which
means that the algorithm can detect eye centers very
accurately.
Figure 6: Eye center localization success rate (%) for various thresholds T.
6.2 Mouth Detection and Mouth Corner
Localization
The mouth was correctly detected in 98.05% of the cases. The mouth corner localization success rate for T=0.25 is 97.6%. Figure 7 shows the success rates of mouth corner localization for various values of T. It is obvious that the method performs very well in detecting the mouth and localizing its corners.
Figure 7: Mouth corner localization success rate (%) for various thresholds T, for the entire database.
6.3 Comparison with Other Methods
The proposed method has been compared with other existing methods that were tested by their authors on the same database for the eye center localization task. Unfortunately, no mouth corner detection method tested on the XM2VTS database was found. For T=0.25, our method achieves an overall detection rate of 99.3%, while Jesorsky et al. (Jesorsky et al., 2001) achieve 98.4%. The superiority of the proposed
method is much more prominent for stricter criteria,
i.e. for smaller values of the threshold T: For T=0.1,
both (Jesorsky et al., 2001) and (Cristinacce et al.,
2004) achieve a success rate of 93%, while the pro-
posed method localizes the eye centers successfully
in 98.4% of the cases. Some results of the proposed
method can be seen in Figure 8.
7 CONCLUSIONS
A novel method for facial feature detection and lo-
calization was proposed in this paper. The method
utilizes the vector field that is formed by assigning to
each pixel a vector pointing to the closest edge, en-
coding, in this way, the geometry of such regions,
in order to detect eye and mouth areas. Luminance
and chromatic information were exploited for accu-
rate localization of characteristic points, namely the
eye centers and mouth corners. The method proved
to give very accurate results, failing only in extreme cases.
Table 1: Results on the XM2VTS database.

                         Eye Regions   Eye Centers for T=0.25
People without glasses   99.2%         99.6%
People with glasses      98.2%         98.7%
Total                    98.85%        99.3%

Figure 8: Some successfully (a)-(d) and some erroneously (e),(f) detected facial features.

ACKNOWLEDGEMENTS
This work has been partially supported by the FP6 European Union Network of Excellence MUSCLE "Multimedia Understanding Through Semantic Computation and Learning" (FP6-507752).
REFERENCES
Cristinacce, D., Cootes, T., and Scott, I. (2004). A multi-
stage approach to facial feature detection. In 15th
British Machine Vision Conference, 231-240.
Jesorsky, O., Kirchberg, K. J., and Frischholz, R. W. (2001). Robust face detection using the Hausdorff distance. In 3rd International Conference on Audio- and Video-based Biometric Person Authentication, 90-95.
Messer, K., Matas, J., Kittler, J., Luettin, J., and Maitre, G. (1999). XM2VTSDB: The extended M2VTS database. In 2nd International Conference on Audio- and Video-based Biometric Person Authentication, 72-77.
Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1):62-66.
Salerno, O., Pardas, M., Vilaplana, V., and Marques, F.
(2004). Object recognition based on binary partition
trees. In IEEE Int. Conference on Image Processing,
929-932.
Viola, P. and Jones, M. (2001). Rapid object detection using
a boosted cascade of simple features. In IEEE Com-
puter Vision and Pattern Recognition, 1, 511-518.
Wu, J. and Zhou, Z. (2003). Efficient face candidates selector for face detection. Pattern Recognition, 36(5):1175-1186.
Zhou, Z. and Geng, X. (2004). Projection functions for eye detection. Pattern Recognition, 37(5):1049-1056.