2 RELATED WORK
As noted earlier, the breakthrough in face detection
came with the work of Viola and Jones (Viola and Jones,
2001), who combined Haar-like features with a
boosting-based learning scheme. Since 2001, many works
have sought to improve the Viola and Jones detector.
For instance, different feature types have been
considered, such as generic linear features, composite
features, and variations of Haar features. Besides
better features, improvements to the boosting algorithm
have also been proposed, such as different boosting
schemes and the reuse of intermediate results.¹
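To make the notion of a Haar-like feature concrete, the following is a minimal sketch of a two-rectangle feature evaluated through an integral image (summed-area table), the device commonly associated with the Viola and Jones detector. The function names and the list-of-lists image layout are assumptions made for illustration only; nothing here is taken from the systems cited above.

```python
from typing import List


def integral_image(img: List[List[float]]) -> List[List[float]]:
    """Return an (h+1) x (w+1) summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[float]], x: int, y: int, w: int, h: int) -> float:
    """Sum of the pixels in the rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: List[List[float]], x: int, y: int, w: int, h: int) -> float:
    """Horizontal two-rectangle feature: sum of the left half minus sum of the right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Once the table is built, every rectangle sum costs four lookups regardless of its size, which is what makes evaluating many such features per window affordable.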
Today, most reported state-of-the-art systems, such as
DCascade (Xiao et al., 2007) or SCascade (Bourdev and
Brandt, 2005), are still based on the basic paradigm of
Viola and Jones.
The Viola and Jones algorithm is a clear example of a
content-based face detector. Such detectors locate
faces by classifying the content of a detection window,
iterating over all positions and scales of the input
image. As an alternative, context-based object
detectors rely on information about the
environment-object relationship to infer the face
position and scale in the image. The advantage of this
approach is that it is more tolerant of variations in
object appearance, since the object is detected through
its relationship with the environment.
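The exhaustive search performed by a content-based detector can be sketched as follows. This is an illustrative outline only: the base window size, scale step, stride and the `classify` callback are hypothetical parameters chosen for the example, not values used by any of the cited detectors.

```python
from typing import Callable, Iterator, List, Tuple

Window = Tuple[int, int, int]  # (x, y, size) of a square detection window, in pixels


def sliding_windows(width: int, height: int, base_size: int = 24,
                    scale_step: float = 1.25,
                    stride_frac: float = 0.5) -> Iterator[Window]:
    """Yield square windows over all positions and scales that fit in the image."""
    size = float(base_size)
    while size <= min(width, height):
        s = int(size)
        stride = max(1, int(s * stride_frac))  # assumed: stride grows with window size
        for y in range(0, height - s + 1, stride):
            for x in range(0, width - s + 1, stride):
                yield (x, y, s)
        size *= scale_step


def detect(width: int, height: int, classify: Callable[[Window], float],
           threshold: float) -> List[Tuple[Window, float]]:
    """Keep every window whose classifier score reaches the threshold."""
    hits = []
    for win in sliding_windows(width, height):
        score = classify(win)
        if score >= threshold:
            hits.append((win, score))
    return hits
```

In this sketch the classifier sees only the window itself, which is precisely what distinguishes a content-based detector from a context-based one.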
Lately, some interesting algorithms for human de-
tection have been proposed. In (Felzenszwalb et al.,
2010), a latent SVM is used to define a Discrimina-
tively trained Part-based Model (DPM) that can be
applied to person detection. Another interesting ap-
proach for person detection is the Poselet-Based De-
tector (PBD) presented in (Bourdev and Malik, 2009)
where information about specific body parts is used
in a probabilistic Hough framework to detect human
bodies in unconstrained images. All these methods
can be appropriately adapted to perform as context-
based face detectors.
Recently, some authors have used the visual information
surrounding a hypothesis to improve face detectors,
usually as a post-processing procedure. Atanasoaei et
al. (Atanasoaei et al., 2010) present a hierarchical
model built on the distribution of detections around a
target hypothesis to discriminate between false alarms
and true detections. A different approach was proposed
in (H. Takatsuka and Okutomi, 2007), which studies the
score distribution in both location and scale space.
All these methods use the word context as a proxy for
the output of a single face classifier around the
hypothesis. In (Kruppa and Schiele, 2003), a method is
presented that focuses on the role of local context as
a predictive cue for computational face detection.
Local context is implemented by an object detector
trained with instances that contain the entire head,
the neck and part of the upper body of a person. In
experiments on two large data sets, the authors find
that using local context can significantly increase the
number of correct detections, particularly in
low-resolution cases. However, the question of how best
to integrate different detectors is not considered, and
no results are presented on integrating the local
context detector as an auxiliary cue for a
content-based detector.
¹ See (Zhang and Zhang, 2010) for an extensive report
on all these variations.
Both content- and context-based face detectors suffer
from the same problem: the detection task is
asymmetric, and to reach a low false detection rate
(which is a requirement) they lose a non-negligible
fraction of the true detections. The idea behind our
system is that, by integrating several content- and
context-based face detectors, each tuned to work at a
higher true detection rate than when used independently
(see Fig. 1), we can keep the false detection rate at
normal levels.
Figure 1: Content- and context-based detectors working at
high true detection rates produce many false positives,
which can be managed by integrating all detectors.
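The trade-off behind this idea can be illustrated with a small sketch: lowering the acceptance threshold of a single detector recovers more true faces but also admits more false alarms. The scores and labels in `TOY_SCORES` are made-up values for illustration, not results from this work, and the rate is computed only over faces that received a hypothesis at all.

```python
from typing import List, Tuple

Scored = Tuple[float, bool]  # (detector score, whether the hypothesis is a true face)

# Made-up scored hypotheses for illustration; NOT measurements from the paper.
TOY_SCORES: List[Scored] = [
    (0.9, True), (0.8, True), (0.6, False), (0.5, True),
    (0.4, False), (0.3, True), (0.2, False),
]


def rates_at(threshold: float, scored: List[Scored]) -> Tuple[float, int]:
    """Return (true detection rate, number of false alarms) at a given threshold."""
    accepted = [is_face for score, is_face in scored if score >= threshold]
    total_faces = sum(1 for _, is_face in scored if is_face)
    true_hits = sum(accepted)
    false_alarms = len(accepted) - true_hits
    tdr = true_hits / total_faces if total_faces else 0.0
    return tdr, false_alarms


strict = rates_at(0.7, TOY_SCORES)       # few false alarms, but faces are missed
permissive = rates_at(0.25, TOY_SCORES)  # all faces kept, at the cost of false alarms
```

On these toy values the strict setting keeps half of the faces with no false alarms, whereas the permissive one keeps all of them together with two false alarms; removing such false alarms is the role assigned to the integration step.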
3 FACE DETECTOR SYSTEM
The proposed face detection system is divided into two
main blocks. In the first stage, all possible face
hypotheses are generated by the individual face
detectors, and each is assigned a score that represents
the detection confidence.
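One possible representation of the output of this first stage is sketched below. The `Hypothesis` record, the detector callback signature and the detector names are assumptions introduced for illustration; the text only states that each hypothesis carries a location, a scale and a confidence score.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """A candidate face produced by one detector (hypothetical field names)."""
    x: float         # horizontal position of the candidate
    y: float         # vertical position of the candidate
    scale: float     # size of the candidate
    score: float     # detection confidence given by the detector
    detector: str    # name of the detector that produced it


# Assumed detector interface: image -> list of (x, y, scale, score) tuples.
Detector = Callable[[object], List[Tuple[float, float, float, float]]]


def generate_hypotheses(image: object,
                        detectors: Dict[str, Detector]) -> List[Hypothesis]:
    """Run every detector on the image and pool all scored hypotheses."""
    pool: List[Hypothesis] = []
    for name, run in detectors.items():
        for x, y, scale, score in run(image):
            pool.append(Hypothesis(x, y, scale, score, name))
    return pool
```

Keeping the originating detector in each record is a design choice of this sketch, so that a later stage can weigh hypotheses from different detectors differently if it needs to.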
In the second stage, the information from all
hypotheses (their locations, scales and scores) is
used to assign an integrated confidence score