FACE AND FACIAL FEATURE DETECTION EVALUATION

Performance Evaluation of Public Domain Haar Detectors for Face and Facial

Feature Detection

M. Castrill´on-Santana, O. D´eniz-Su´arez, L. Ant´on-Canal´ıs and J. Lorenzo-Navarro

Institute of Intelligent Systems and Numerical Applications in Engineering

Ediﬁcio Central del Parque Tecnol´ogico, Campus de Taﬁra, University of Las Palmas de Gran Canaria, Spain

Keywords:

Face and facial feature detection, haar wavelets, human computer interaction.

Abstract:

Fast and reliable face and facial feature detection are required abilities for any Human Computer Interaction

approach based on Computer Vision. Since the publication of the Viola-Jones object detection framework and

the more recent open source implementation, an increasing number of applications have appeared, particularly

in the context of facial processing. In this respect, the OpenCV community shares a collection of public domain

classiﬁers for this scenario. However, as far as we know these classiﬁers have never been evaluated and/or

compared. In this paper we analyze the individual performance of all those public classiﬁers getting the best

performance for each target. These results are valid to deﬁne a baseline for future approaches. Additionally

we propose a simple hierarchical combination of those classiﬁers to increase the facial feature detection rate

while reducing the face false detection rate.

1 INTRODUCTION

Fast and reliable face and facial element detection are

topics of great interest to get more natural and com-

fortable Human Computer Interaction (HCI) (Pent-

land, 2000). Therefore, the number of approaches

addressing this problem have increased in the last

years (Li et al., 2002; Schneiderman and Kanade,

2000; Viola and Jones, 2004) providing reliable ap-

proaches to the Computer Vision community. How-

ever, since the recent work by Viola and Jones (Viola

and Jones, 2004) describing a fast multi-stage gen-

eral object classiﬁcation approach, and the release of

an open source implementation (Lienhart and Maydt,

2002), this approach has been extensively used in

Computer Vision research, particularly for detecting

faces and their elements. Different authors have made

their classiﬁers (not their training sets) public. How-

ever, as far as the authors of this paper know, no per-

formance evaluation has been done yet.

In this paper, we compare different public domain

classiﬁers, based on Lienhart’s implementation, re-

lated to face and facial element detection, in order to

provide a baseline for future developments. Section 2

brieﬂy introduces the detection approach. Sections 3

and 4 present respectively the experimental setup and

the conclusions extracted.

2 HAAR-BASED DETECTORS

The general object detector framework described in

(Viola and Jones, 2004), is based on the idea of a

boosted cascade of weak classiﬁers. For each stage

in the cascade, a separate subclassiﬁer is trained to

detect almost all target objects while rejecting a cer-

tain fraction of the non-object patterns (which were

accepted by previous stages).

The resulting detection rate, D, and the false posi-

tive rate, F, of the cascade are given by the combina-

tion of each single stage classiﬁer rates:

D =

∏

i=1

F =

∏

i=1

(1)

Each stage classiﬁer is selected considering a com-

bination of features which are computed on the in-

tegral image. These features are reminiscent of

Haar wavelets and early features of the human visual

167

Castrillón-Santana M., Déniz-Suárez O., Antón-Canalís L. and Lorenzo-Navarro J. (2008).

FACE AND FACIAL FEATURE DETECTION EVALUATION - Performance Evaluation of Public Domain Haar Detectors for Face and Facial Feature

Detection.

In Proceedings of the Third International Conference on Computer Vision Theory and Applications, pages 167-172

DOI: 10.5220/0001073101670172

 SciTePress

pathway such as center-surround and directional re-

sponses. The implementation (Lienhart and Maydt,

2002) integrated in the OpenCV (Intel, 2006) extends

the original feature set (Viola and Jones, 2004).

With this approach, given a 20 stage detector de-

signed for refusing at each stage 50% of the non-

object patterns (target false positive rate) while falsely

eliminating only 0.1% of the object patterns (target

detection rate), its expected overall detection rate is

0.999

≈ 0.98 with a false positive rate of 0.5

≈

0.9∗ 10

−6

. This schema allows a high image process-

ing rate, due to the fact that background regions of

the image are quickly discarded, while spending more

time on promising object-like regions. Thus, the de-

tector designer chooses the desired number of stages,

the target false positive rate and the target detection

rate per stage, achieving a trade-off between accuracy

and speed for the resulting classiﬁer.

3 EXPERIMENTS

3.1 Experimental Setup

The available collection of public domain classiﬁers

that we have been able to locate are described in Ta-

ble 1. Their targets are frontal faces, proﬁle faces,

head and shoulders, eyes, noses and mouths. Refer-

ence information is included as well as the size of the

smallest detectable pattern, the label used belowin the

ﬁgures, and the processing time in seconds needed to

analyze the whole test set.

In order to analyze the performance of the differ-

ent classiﬁers, the CMU dataset (Schneiderman and

Kanade, 2000) has been chosen for the experimental

setup. This dataset contains a collection of hetero-

geneous images, feature that from our point of view

provides a better understanding of the classiﬁer per-

formance. The dataset is divided into four different

subsets test, newtest, test-low and rotated combining

the test sets of Sung and Poggio (Sung and Poggio,

1998) and Rowley, Baluja, and Kanade (Rowley et al.,

1998). The dataset and the annotation data can be ob-

tained at (Carnegie Mellon University, 1999).

3.2 Face Detection

In this Section the detection performance of those

classiﬁers designed to detect the whole face or head

is presented. We have considered the different frontal

face detection classiﬁers provided with the OpenCV

release (Lienhart et al., 2003a), the proﬁle face detec-

tor (Bradley, 2003), and the head and shoulders detec-

tor (Kruppa et al., 2003).

0 500 1000 1500 2000 2500 3000 3500 4000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

False Detections

Detection Rate

ROC

FA1

FAT

FA2

HS1

HS2

Figure 1: Face and head and shoulders detection perfor-

mance.

The criterion to determine if a face is correctly de-

tected requires that all its features are located inside

the detection, and the detection container width must

not be greater than four times the distance between

the annotated eyes.

For each classiﬁer its ROC curve was com-

puted applying the original release and some

variations obtained reducing its number of stages.

Observing the Area Under the Curve (AUC) of

the resulting ROC curves, see Figure 1, FA2

(haarcascade frontal face alt2 in the OpenCV

distribution (Intel, 2006)) offered the best per-

formance closely followed by FA1 and FD

(respectively haarcascade frontal face alt and

haarcascade frontal face def ault). It is, addition-

ally, the fastest among them. Remember that the test

dataset used is composed by four different subsets.

The original FA2 classiﬁer provided for the different

subsets the performance showed in the corresponding

column of Table 2. Note the lower performance

achieved for the rotated subset. These results were

expected as the Viola-Jones’ framework is not able

to accept more than slight variations in rotation with

respect to the positive samples used.

Table 2: Frontal face and proﬁle detection performance for

each subset.

Frontal face Proﬁle

Subset detection performance detection performance

newtest 89.07% 62.30%

test 86.98% 43.20%

rotated 19.28% 10.76%

test-low 83.56% 36.99%

The single proﬁle detector available reported

lower and less homogeneous performance, showing

VISAPP 2008 - International Conference on Computer Vision Theory and Applications

168

Table 1: Public domain classiﬁers available.

Target Authors/Reference Download Size Stages Proc. time (secs.) Label

Frontal faces (Lienhart et al., 2003b; Lienhart et al., 2003a) (Intel, 2006) 24× 24 25 37.6 FD

Frontal faces (Lienhart et al., 2003b; Lienhart et al., 2003a) (Intel, 2006) 20× 20 21 41.8 FA1

Frontal faces (Lienhart et al., 2003b; Lienhart et al., 2003a) (Intel, 2006) 20× 20 46 31.8 FAT

Frontal faces (Lienhart et al., 2003b; Lienhart et al., 2003a) (Intel, 2006) 20× 20 20 35.7 FA2

Proﬁle faces (Bradley, 2003) (Reimondo, 2007) 20× 20 26 44.8 PR

Head and shoulders (Kruppa et al., 2003) (Intel, 2006) 22× 18 30 66.5 HS1

Head and shoulders (Kruppa et al., 2003) TBP 22× 20 19 98.4 HS2

Left eye (Castrill´on Santana et al., 2007) (Reimondo, 2007) 18× 12 16 44.9 LE

Right eye (Castrill´on Santana et al., 2007) (Reimondo, 2007) 18× 12 18 44.8 RE

Eye M. Wimmer (Wimmer, 2004) 25× 15 5 12.1 E1

Eye Urtho (Urtho, 2006) 10× 6 20 22 E2

Eye Ting Shan (Reimondo, 2007) 24× 12 104 20.9 E3

Eye pair (Castrill´on Santana et al., 2007) (Reimondo, 2007) 45× 11 19 14.6 EP1

Eye pair (Castrill´on Santana et al., 2007) (Reimondo, 2007) 22× 5 17 15.4 EP2

Eye pair (Bediz and Akar, 2005) (Reimondo, 2007) 35× 16 19 16.7 EP3

Nose (Castrill´on Santana et al., 2007) (Reimondo, 2007) 25× 15 17 27 N1

Mouth (Castrill´on Santana et al., 2007) (Reimondo, 2007) 25× 15 19 41.4 M1

Mouth (Liang et al., 2002) (Intel, 2006) 32× 18 18 15.6 M2

a better performance for the ﬁrst subset as suggests

the last column of Table 2 for the original classiﬁer.

Paying attention to the head and shoulders detec-

tor, HS1, in Figure 1, its results evidence a rather low

performance, as seen in the Figure. For that reason

we used a more recently trained classiﬁer (HS2) fol-

lowing the same approach but with a larger training

dataset. Its performance was notoriously better, cir-

cumstance that made us to suspect the presence of a

bug in the classiﬁer included in the current OpenCV

release. It must be noticed that this classiﬁer requires

the presence of the local context to react, circum-

stance that does not occur for different images con-

tained in the dataset. It must be also observed that

both classiﬁers provided a higher false detection rate

than those trained to detect frontal faces.

3.3 Facial Element Detection

A similar experimental setup was carried out for the

classiﬁers specialized in facial feature detection such

as eyes, nose and mouth. Figures 2 shows the results

achieved using the different eye detectors for the left

eye (the results obtained for the right eye are quite

similar, and therefore not included here).

The best performance is given by RE and LE, but

requiring more processing time. Both classiﬁers pro-

vide similar performance, providing always a lower

false detection rate (except for E2) and a greater de-

tection rate. For all the classiﬁers, we have not consid-

ered the detection of the opposite eye as a false detec-

tion. For LE and RE the classiﬁers performed worse

detecting the other eye, but they always got greater

detection rate than any other classiﬁer.

A facial feature is considered correctly detected,

if the distance to the annotated location is less than

1/4 of the actual distance between the annotated eyes.

This criterion was used to estimate the eye detection

success originally in (Jesorsky et al., 2001).

0 2000 4000 6000 8000 10000 12000

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

False Detections

Detection Rate

ROC

Figure 2: Left eye detection performance.

The performance achieved by LE for the different

subsets is showed in Table 3. It is observed that this

feature is also detected, with a similar rate, when faces

are rotated, showing even a higher detection rate than

for frontal faces. However, facial feature detectors

perform poorly with low resolution (test-low set), i.e.

they require larger faces to be reliable.

We have also analyzed the eye pair, nose and

mouth detectors, see Figure 3. The best eye pair

detection performance is given by EP3, with similar

processing cost. The nose detector has the lowest per-

formance among the whole set of facial feature de-

FACE AND FACIAL FEATURE DETECTION EVALUATION - Performance Evaluation of Public Domain Haar

Detectors for Face and Facial Feature Detection

169

Table 3: Left eye detection for each subset.

Subset Detection performance

newtest 39.89%

test 29.69%

rotated 32.29%

test-low 10.96%

tectors analyzed. However the best mouth detector,

behaves notoriously better.

As a summary, the best facial element detectors

available in the public domain are: RE, LE, EP3, N

(only one available) and M1.

0 500 1000 1500 2000 2500 3000 3500 4000

0.05

0.1

0.15

0.2

0.25

0.3

0.35

False Detections

Detection Rate

ROC

EP1

EP2

EP3

Figure 3: Eye pair (EP1-3, Nose (N. and Mouth (M1 and

M2) detection performance.

Even when searching facial features in the whole im-

age reported much lower performance than for face

detectors, it is evidenced that in some circumstances

face detection could be tackled by detecting some of

its inner features. An example of this is shown in

Figure 4. The best face detector used was not able

to locate the evident face, likely because it is par-

tially cropped. However some of their inner features

were positively detected. Certainly, some false de-

tections appear, but geometric consistency is a simple

approach to ﬁlter them (Wilson and Fernandez, 2006).

The main drawback is that the processing time will be

larger.

3.4 Combined Detection

As shown in previous sections, facial element detec-

tion seems to behave worse than facial detection fol-

lowing the same searching approach. A reasonable

reason for this is the lower resolution of facial el-

ements in relation to the face, therefore the differ-

ent classiﬁers are trying to locate patterns which are

smaller than those used for training. In this section we

Figure 4: Sample image of the CMU dataset (?) that re-

ported no face detection. However, facial feature detection

provided a set of detections (red and blue for left and right

eye respectively, magenta for nose and green for mouth).

Even when some false detection are present, they could be

in many situations ﬁltered by means of the coherent geomet-

ric location of the different features (Wilson and Fernandez,

2006).

will apply facial feature detection only in those areas

were a face has been detected, in order to analyze if

their performance can be improved, and additionally,

to observe if the no detection of facial elements can be

used as a ﬁlter to remove some false face detections.

To detect faces we used the FA2 detector with 18

stages, see Table 1, which in Section 3.2 evidenced

the best relation between processing speed and detec-

tion ratio. Once a face is given, the individual facial

elements were searched using three different proce-

dures: 1) Searching the facial elements in those areas

or regions of interest (ROIs), where a face has been

detected (C1), 2) Scaling up the ROI image twice (in

order to have bigger facial details) before searching

(C2), and 3) Scaling up the ROI twice and bound-

ing the acceptable detections size according to the de-

tected face size (C3). We would like to remark that

for every approach each feature is searched always

in a ROI coherent with the expected feature location

for a frontal face, similarly to (Wilson and Fernandez,

2006).

Figure 5 compares the results achieved for the LE

classiﬁer using the different approaches: C1, C2 and

C3. With C1, we just restricted the search area to

the face detected, the facial feature detection rate is

reduced in relation to the results presented in Section

3.3.

Scaling up the ROIs, C2 and C3, increases signif-

icantly the facial feature detection rate, but also the

false detection rate, as suggested by Figure 5. How-

ever, the latter keeps being lower than in absence of

ROIs, see Figure 2. In addition, if we relate to the

face size the acceptable size of the facial features to

search, C3, the false detection rate is clearly reduced

VISAPP 2008 - International Conference on Computer Vision Theory and Applications

170

keeping a similar detection rate. Due to the reduced

space available, the results for the other facial features

are summarized in Table 4.

0 100 200 300 400 500 600 700 800 900 1000

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

False Detections

Detection Rate

ROC

LE−C1

LE−C2

LE−C3

Figure 5: LE. performance for the different approaches.

Table 4: Facial feature detection performance with different

approaches (TD: correct detection rate, FD: number of false

detections).

App. Left eye Right eye Nose Mouth

TD FD TD FD TD FD TD FD

C1 17.2% 18 15.1% 18 1.7% 0 17.1% 52

C1 49.1% 154 45.5% 137 22.2% 120 41.5% 196

C3 48.7% 128 45.1% 124 21.9% 104 41.5% 184

We have also applied a soft heuristic to reduce the

number of false face detections, making use of the

results achieved for the facial features. Big enough

face detections, larger than 40 pixels in width, that

report no facial feature detection are not accepted as

face detections. Figure 6 presents the results achieved

for FA2 and different combinations of the facial fea-

tures detectors. The original performance is depicted

with FA, the modiﬁed with FAv. The detection rate is

almost not affected, while achieving an improvement

in false detection reduction. To give an example, for

the most restrictive FA2 classiﬁer, the initial number

of false detections was 47, but applying this approach

was reduced to 35.

Are there false detections with more than one facial

element detected? The answer is positive for four im-

ages. An example of is presented in Figure 7, only the

human faces are annotated, but Peggy is there. These

detections were labeled as incorrect only because the

container is a little bit bigger than it should. Indeed

the facial features could be used to better ﬁt the con-

tainer.

0 50 100 150 200 250 300 350 400 450 500

0.5

0.55

0.6

0.65

0.7

0.75

0.8

False Detections

Detection Rate

ROC

FAv

Figure 6: Face detection results achieved with approaches

FA. and FAv. Note the reduction in false detection rate

achieved by the second.

Figure 7: Example of false face detection that reported at

least two inner facial features corresponding to rot5. im-

age (contained in CMU dataset (Schneiderman and Kanade,

2000)) from the rotated subset. Does the reader consider it

a false detection? Other detections are present with soccer

and Germany images, but not included here due to space

restrictions.

4 CONCLUSIONS

In this paper we have reviewed the possibilities of

current public domain classiﬁers, based on the Viola-

Jones’ general object detection framework (Viola and

Jones, 2004), for face and facial feature detection.

We have observed a similar performance achieved by

three of those frontal face detectors included in the

FACE AND FACIAL FEATURE DETECTION EVALUATION - Performance Evaluation of Public Domain Haar

Detectors for Face and Facial Feature Detection

171

OpenCV release. We have also compared some fa-

cial feature detectors available thanks to the OpenCV

community, observing that they perform worse than

those designed for frontal face detection. However,

this aspect can be justiﬁed by the lower resolution of

the patterns being saerched.

Finally we observed the possibilities provided by

a simple combination of those classiﬁers, applying

coherently facial feature detection only in those ar-

eas where a face has been detected. In this sense,

the facial feature classiﬁers can be applied with more

detail without remarkably increasing the processing

cost. This approach performed better, improving fa-

cial feature detection and reducing the number of false

positives. These results have been achieved with-

out any further restriction in terms of facial feature

coocurrence, relative distances and so on. Therefore,

we consider that further work can be done to get a

more robust cascade approach using the public do-

main available classiﬁers, providing reliable informa-

tion.

We consider also that the combination of face and

facial feature detection can improve face detection,

adding reliability to the traditional face detection ap-

proaches. However, the solution requires more com-

putational power due to the fact that more processing

is performed.

ACKNOWLEDGEMENTS

Work partially funded the Spanish Ministry of Ed-

ucation and Science and FEDER funds (TIN2004-

07087).

REFERENCES

Bediz, Y. and Akar, G. B. (2005). View point tracking for 3d

display systems. In 3th European Signal Processing

Conference, EUSIPCO-2005.

Bradley, D. (2003). Proﬁle face detection.

http://www.davidbradley.info/publications/bradley-

iurac-03.swf. Last accesed 5/11/2007.

Carnegie Mellon University (1999). CMU/VACS

image database: Frontal face images.

http://vasc.ri.cmu.edu/idb/html/face/frontal images/

index.html. Last accesed 5/11/2007.

Castrill´on Santana, M., D´eniz Su´arez, O.,

Hern´andez Tejera, M., and Guerra Artal, C. (2007).

ENCARA2: Real-time detection of multiple faces

at different resolutions in video streams. Journal of

Visual Communication and Image Representation,

pages 130–140.

Intel (2006). Intel Open Source Computer Vision Library,

v1.0. http://sourceforge.net/projects/opencvlibrary/.

Jesorsky, O., Kirchberg, K. J., and Frischholz, R. W. (2001).

Robust face detection using the hausdorff distance.

Lecture Notes in Computer Science. Procs. of the

Third International Conference on Audio- and Video-

Based Person Authentication, 2091:90–95.

Kruppa, H., Castrill´on Santana, M., and Schiele, B. (2003).

Fast and robust face ﬁnding via local context. In Joint

IEEE Internacional Workshop on Visual Surveillance

and Performance Evaluation of Tracking and Surveil-

lance (VS-PETS), pages 157–164.

Li, S. Z., Zhu, L., Zhang, Z., Blake, A., Zhang, H., and

Shum, H. (2002). Statistical learning of multi-view

face detection. In European Conference Computer Vi-

sion, pages 67–81.

Liang, L., Liu, X., Pi, X., Zhao, Y., and Neﬁan, A. V.

(2002). Speaker independent audio-visual continuous

speech recognition. In International Conference on

Multimedia and Expo, pages 25–28.

Lienhart, R., Kuranov, A., and Pisarevsky, V. (2003a). Em-

pirical analysis of detection cascades of boosted clas-

siﬁers for rapid object detection. In DAGM’03, pages

297–304, Magdeburg, Germany.

Lienhart, R., Liang, L., and Kuranov, A. (2003b). A de-

tector tree of boosted classiﬁers for real-time object

detection and tracking. In IEEE ICME2003, pages

277–80.

Lienhart, R. and Maydt, J. (2002). An extended set of haar-

like features for rapid object detection. In IEEE ICIP

2002, volume 1, pages 900–903.

Pentland, A. (2000). Looking at people: Sensing for ubiqui-

tous and wearable computing. IEEE Trans. on Pattern

Analysis and Machine Intelligence, pages 107–119.

Reimondo, A. (2007). Haar cascades repository.

http://alereimondo.no-ip.org/OpenCV/34.

Rowley, H. A., Baluja, S., and Kanade, T. (1998). Neural

network-based face detection. IEEE Trans. on Pattern

Analysis and Machine Intelligence, 20(1):23–38.

Schneiderman, H. and Kanade, T. (2000). A statistical

method for 3d object detection applied to faces and

cars. In IEEE Conference on Computer Vision and

Pattern Recognition, pages 1746–1759.

Sung, K.-K. and Poggio, T. (1998). Example-based learning

for view-based human face detection. IEEE Trans. on

Pattern Analysis and Machine Intelligence, 20(1):39–

51.

Urtho (2006). Eye detector. http://face.urtho.net/. Last ac-

cesed 5/9/2007.

Viola, P. and Jones, M. J. (2004). Robust real-time face

detection. International Journal of Computer Vision,

57(2):151–173.

Wilson, P. I. and Fernandez, J. (2006). Facial feature de-

tection using haar classiﬁers. Journal of Computing

Sciences in Colleges, 21:127–133.

Wimmer, M. (2004). Eyeﬁnder. http://www9.cs.tum.edu/

people/wimmerm/se/project.eyeﬁnder/. Last accesed

5/11/2007.

VISAPP 2008 - International Conference on Computer Vision Theory and Applications

172