2.2 Facial Feature Tracking
Once the facial feature points are extracted, their trajectory is found by comparing the real image with a prediction based on the previous history. This tracking process predicts the trajectory from the centroid of the facial feature points and is implemented with the Kanade-Lucas-Tomasi (KLT) feature tracker, which adaptively adjusts the weights of the predictor filter. If we know the target location at times t, t − 1, t − 2, …, then its location at time t + 1 can be predicted. The KLT feature tracker is based on two papers: (Lucas and Kanade, 1981) and (Tomasi and Kanade, 1991).
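The core of the KLT tracker is the Lucas-Kanade least-squares step. The following NumPy sketch estimates the translation of a patch between two frames by solving the usual 2 × 2 normal equations; it is a simplified single-iteration illustration (real trackers iterate over image pyramids), not the implementation used in our system.

```python
import numpy as np

def lucas_kanade_shift(I, J):
    """One Lucas-Kanade step: least-squares translation between patches I and J."""
    Iy, Ix = np.gradient(I)            # spatial gradients (row = y, col = x)
    It = J - I                         # temporal difference between the frames
    # 2x2 normal equations of the least-squares flow problem
    G = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(G, b)       # estimated (dx, dy) displacement
```

For a smooth patch that moves by a sub-pixel amount between frames, a single step already recovers the shift closely; larger motions require the pyramidal, iterated form.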
In our case, the facial feature points are found by AAM and tracked by the KLT tracker. During the tracking procedure, an affine transformation is fitted between the image of the currently tracked feature and its image from a non-consecutive previous frame. If the affine-compensated image is too dissimilar, the feature is dropped. If a feature point is lost in a subsequent frame, the system automatically requests another AAM fitting step, so that the number of features stays constant. Tracking the facial feature points gives better and smoother results than running the AAM fitting on every frame, because small fluctuations of individual feature points are avoided and therefore do not show up later in the global result.
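The drop-on-dissimilarity idea can be sketched as follows. Note that the check described above operates on the feature's image patches; for brevity this illustration fits the affine map to tracked point coordinates and uses the RMS residual as the dissimilarity score, and the threshold value is our own assumption.

```python
import numpy as np

def affine_residual(pts_prev, pts_cur):
    """Fit a 2-D affine map pts_prev -> pts_cur by least squares and
    return the RMS residual as a simple consistency score."""
    P = np.hstack([pts_prev, np.ones((len(pts_prev), 1))])  # homogeneous coords
    A, *_ = np.linalg.lstsq(P, pts_cur, rcond=None)         # 3x2 affine parameters
    return np.sqrt(np.mean(np.sum((P @ A - pts_cur) ** 2, axis=1)))

def keep_feature(pts_prev, pts_cur, threshold=0.1):
    """Drop the tracked configuration when the affine fit is too poor
    (threshold is an illustrative value, not taken from the paper)."""
    return affine_residual(pts_prev, pts_cur) < threshold
```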
2.3 Head Pose Estimation
Computing the position and orientation of an object (the object pose) from feature points whose geometry on the object is known has important applications such as calibration, cartography, tracking, and object recognition. In this section, we describe a method for estimating the head pose from a single image. We assume that we can detect and match in the image four or more non-coplanar feature points on the face (i.e. the landmarks of the AAM shape), and that we also know their relative geometry.
The POSIT algorithm is used in our system to estimate all three continuous rotation angles and the translation vector of the head pose. It was proposed by DeMenthon (DeMenthon and Davis, 1995) and is one of the most effective feature-based algorithms.
POS Algorithm. The method combines two algo-
rithms; the first is the POS (Pose from Orthography
and Scaling). It approximates the perspective projec-
tion with a scaled orthographic projection and finds
the rotation matrix and translation vector of the head
by solving a linear system.
POSIT Algorithm. The second algorithm is POSIT (POS with ITerations). It uses the approximate pose (the result of POS) in its iteration loop to compute better scaled orthographic projections of the feature points, and then applies POS again to these projections. Each subsequent iteration applies exactly the same calculations, but with the corrected image points. The process shifts the feature points of the object, in the pose just found, to their lines of sight (where they would lie if the pose were correct) and obtains a scaled orthographic projection of these shifted points.
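The two steps above can be sketched compactly in NumPy, following the description of POS and the POSIT iteration (scaled orthographic approximation, then iterative correction of the image points). The variable names and convergence tolerance are our own choices, and image coordinates are assumed to be given relative to the image center.

```python
import numpy as np

def posit(model_points, image_points, focal_length, n_iter=20):
    """DeMenthon-Davis POSIT for >= 4 non-coplanar points.

    model_points: (N, 3) object coordinates; the first point is the reference.
    image_points: (N, 2) pixel coordinates relative to the image center.
    Returns (R, t): rotation matrix and translation of the reference point.
    """
    M = np.asarray(model_points, dtype=float)
    p = np.asarray(image_points, dtype=float)
    A = M[1:] - M[0]             # vectors from the reference point
    B = np.linalg.pinv(A)        # pseudoinverse used by the POS linear system
    eps = np.zeros(len(M) - 1)   # perspective corrections; 0 = pure POS
    for _ in range(n_iter):
        # POS step on the corrected (scaled orthographic) image points
        xp = p[1:, 0] * (1 + eps) - p[0, 0]
        yp = p[1:, 1] * (1 + eps) - p[0, 1]
        I, J = B @ xp, B @ yp
        s1, s2 = np.linalg.norm(I), np.linalg.norm(J)
        s = np.sqrt(s1 * s2)                  # scale of the projection
        i_vec, j_vec = I / s1, J / s2
        k_vec = np.cross(i_vec, j_vec)
        k_vec /= np.linalg.norm(k_vec)
        Z0 = focal_length / s                 # depth of the reference point
        eps_new = (A @ k_vec) / Z0            # refined corrections
        if np.allclose(eps_new, eps, atol=1e-8):
            eps = eps_new
            break
        eps = eps_new
    j_vec = np.cross(k_vec, i_vec)            # re-orthogonalize the rotation
    R = np.vstack([i_vec, j_vec, k_vec])
    t = np.array([p[0, 0] * Z0 / focal_length,
                  p[0, 1] * Z0 / focal_length, Z0])
    return R, t
```

With the corrections eps set to zero, the loop body is exactly the POS step; the iteration then refines the image points as described above until the corrections stop changing.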
POSIT converges to an accurate pose within a few iterations, and it can be used with many feature points at once for added insensitivity to measurement errors and image noise. Compared with classic approaches such as Newton's method, POSIT does not require an initial guess and computes the pose with fewer floating-point operations. It is therefore a useful alternative for real-time operation. (Fig. 2 shows the result of POSIT.)
Figure 2: Visualizing the head pose with two perpendicular planes. The distance between the camera sensor and the user is represented by the red line at the bottom of the image. This is not a metric measurement, because we use only one camera; however, it lets us recognize whether the user bends forward or sits back.
3 RESULTS AND CONCLUSIONS
Because the development is not finished yet, we could not run a full performance test, but we used a face database with 50 images to evaluate the pose estimation. We found that the head pose seen in the pictures corresponds, to all intents and purposes, with the head pose computed by the application in more than 90% of the cases. The only limitation of the
GESTURE RECOGNITION - Control of a Computer with Natural Head Movements