2.1 Face Detection
Finding and tracking a face is based on the wide FOV stereo camera. The face tracking system utilises the OpenCV implementation of the Viola and Jones face detector (Viola and Jones, 2004), which consists of groups of weak classifiers with high detection rates and low rejection rates. Each weak classifier has a correct detection rate only just above chance. Several groups of weak classifiers are then combined to form a cascade. As soon as any stage rejects a candidate, the process exits (the candidate is not in the class). Only when all stages in the cascade of weak classifiers have responded positively is a face detection declared (Viola and Jones, 2004).
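As a purely conceptual sketch of this early-rejection idea (the stage structure and names below are illustrative and not the OpenCV internals), the cascade can be pictured as follows:

```python
# Conceptual sketch of cascade evaluation: each stage is a group of weak
# classifiers whose combined score must exceed the stage threshold, otherwise
# the candidate window is rejected immediately. Illustrative only.

def evaluate_cascade(window, stages):
    """Return True only if every stage accepts the candidate window.

    `stages` is a list of (weak_classifiers, threshold) pairs, where each
    weak classifier maps an image window to a score based on one feature.
    """
    for weak_classifiers, threshold in stages:
        stage_score = sum(clf(window) for clf in weak_classifiers)
        if stage_score < threshold:
            return False   # early rejection: candidate is not a face
    return True            # all stages passed: a face detection is declared
```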
The algorithm is trained for frontal faces as well as left and right profiles. In combination, the system is capable of detecting the face of a panning head. At this point only one face can be tracked at a time, and the system cannot handle head roll or faces looking up or down.
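A minimal sketch of how frontal and profile detection might be combined using OpenCV's stock Haar cascades is shown below; the cascade files and parameters are assumptions for illustration, not necessarily those used in the described system.

```python
import cv2

# Stock OpenCV Haar cascades; the profile cascade covers one side, and the
# other side is handled by mirroring the image. Whether the described system
# uses these exact cascade files is an assumption.
frontal = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
profile = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_profileface.xml")

def detect_face(gray):
    """Return one face bounding box (x, y, w, h) or None, trying frontal,
    one profile, and (via mirroring) the other profile in turn."""
    faces = frontal.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces):
        return tuple(faces[0])
    faces = profile.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces):
        return tuple(faces[0])
    flipped = cv2.flip(gray, 1)                      # mirror for the other profile
    faces = profile.detectMultiScale(flipped, scaleFactor=1.1, minNeighbors=5)
    if len(faces):
        x, y, w, h = faces[0]
        return (gray.shape[1] - x - w, y, w, h)      # map box back to original image
    return None
```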
2.2 The Tracking System
Whenever a face has been detected by the stereo camera with the wide FOV, the centre point of the detected face is geometrically reconstructed in 3D space. The reconstructed centre point is then used as a reference point for rotating and repositioning the PTU and hence the cameras. The positioning system utilises a geometrical model of the physical tracking system, including the two stereo cameras, together with the reference point, to define the movements required to reposition the high resolution cameras so that they are directed towards the reconstructed centre point of the tracked face. Hence, when a face is detected by the wide FOV cameras, the high resolution cameras are directed towards the tracked face. The PTU is controlled by constant speed movement defined by the error $E_{difference}$ between the current orientation $P_{current}$ and the desired orientation $P_{desired}$ of the high resolution cameras, as indicated in equation 1. The position of the cameras is updated with a frequency of approximately 5 Hz. Both the pan and tilt dimensions are included in the current position, the desired position and the error difference.
$$P_{desired} = P_{current} + E_{difference} \qquad (1)$$
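The repositioning step can be sketched roughly as follows; the projection matrices, the simplified PTU geometry and the constant-speed command are assumptions for illustration rather than the actual geometrical model of the rig.

```python
import numpy as np
import cv2

def face_centre_3d(P1, P2, pt_left, pt_right):
    """Triangulate the detected face centre from the wide-FOV stereo pair.
    P1 and P2 are the 3x4 projection matrices of the calibrated cameras;
    pt_left and pt_right are the 2D face centres in pixels."""
    X = cv2.triangulatePoints(P1, P2,
                              np.float32(pt_left).reshape(2, 1),
                              np.float32(pt_right).reshape(2, 1))
    return (X[:3] / X[3]).ravel()                 # homogeneous -> (x, y, z)

def desired_pan_tilt(point, ptu_origin):
    """Pan/tilt orientation P_desired that points the high resolution cameras
    at `point`, using a simplified model with the PTU rotation centre at
    `ptu_origin` (the real geometric model of the rig is more detailed)."""
    x, y, z = point - ptu_origin
    pan = np.arctan2(x, z)
    tilt = np.arctan2(y, np.hypot(x, z))
    return np.array([pan, tilt])

def control_step(p_current, p_desired):
    """Eq. (1): E_difference = P_desired - P_current for both pan and tilt.
    The PTU is then driven at constant speed in the direction of this error;
    the step is repeated at roughly 5 Hz with a fresh face detection."""
    e_difference = p_desired - p_current
    return np.sign(e_difference)                  # constant-speed command per axis
```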
A geometric interpretation of the tracking is shown in Fig. 3. Only the pan dimension is shown, but the principle is the same for the tilt direction.
Figure 3: Principle of the active vision system during face tracking. A face is detected by the wide FOV cameras (marked in red) shown in a), and the PTU rotates and repositions the high resolution cameras (marked in blue), directing them towards the centre of the tracked face as shown in b).
2.3 3D Facial Processing
Face recognition, by means of matching a given face to a database of faces, is a non-intrusive biometric method that dates back several decades. In recent years, there has been renewed interest in developing new methods for automatic face recognition, fuelled by advances in computer vision techniques, computer design, sensor design, and face recognition systems. 3D face recognition algorithms identify faces from the 3D shape of a person's face. Face recognition systems not based on 3D information are affected by changes in lighting (illumination) and pose of the face, which reduce performance. Because the shape of a face is not affected by changes in lighting or pose, 3D face recognition has the potential to improve performance under these conditions (Jafri and Arabnia, 2009).
In our system, we perform the following steps. Firstly, a 3D face model must be obtained; two common approaches are stereo imaging and the use of structured light sensors, e.g. the Microsoft Kinect. Once the 3D model is obtained, invariant measures can be extracted. One approach described in the literature (Mata et al., 2007) computes geodesic distances between sampled points on the facial surface. Based on these distances, the points are then flattened into a low-dimensional Euclidean space, providing a bending-invariant (or isometric-invariant) signature surface that is robust to certain facial expressions. Finally, the signature is compared with a database of signatures.
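A rough sketch of such a bending-invariant signature, approximating geodesic distances by shortest paths along mesh edges and flattening them with classical multidimensional scaling, could look as follows; the mesh representation and the scipy-based approximation are assumptions for illustration, not the exact method of Mata et al.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_distances(vertices, edges):
    """Approximate pairwise geodesic distances on the facial surface by
    shortest paths along mesh edges (vertices: Nx3 array, edges: Mx2 index pairs)."""
    i, j = edges[:, 0], edges[:, 1]
    w = np.linalg.norm(vertices[i] - vertices[j], axis=1)
    n = len(vertices)
    graph = csr_matrix((np.r_[w, w], (np.r_[i, j], np.r_[j, i])), shape=(n, n))
    return dijkstra(graph, directed=False)

def bending_invariant_signature(D, dim=3):
    """Classical MDS: embed the sampled points in a low-dimensional Euclidean
    space so that Euclidean distances approximate the geodesic ones. The
    resulting canonical form changes little under near-isometric deformations
    such as certain facial expressions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:dim]       # keep the largest eigenvalues
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# The resulting signature could then be compared with stored database
# signatures, e.g. after rigid alignment; the matching details are not
# specified in this section.
```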
The high resolution cameras utilised for the 3D reconstruction and recognition part acquire images in continuous mode. For each pair of images, a face detector – again the OpenCV implementation of the Viola and Jones face detector (Viola and Jones, 2004) – checks whether a face is present in the image. If a face is detected in both images of the stereo camera, the position of the face is compared with the face