[Figure 3: Pixel displacement for three test video sequences. Each panel plots pixel displacement over the frame number (video sequence no. 1, no. 2, and no. 3), comparing our algorithm with the KLT tracker.]
the N = 22 facial feature points:
D = \frac{1}{N} \sum_{i=1}^{N} d_i . (21)
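For concreteness, Eq. (21) can be evaluated directly from the tracked and the manually annotated landmark positions. The following is a minimal sketch in Python/NumPy; the array names `tracked` and `annotated` are illustrative and not taken from the paper:

    import numpy as np

    def mean_pixel_displacement(tracked, annotated):
        """Eq. (21): average Euclidean distance d_i over the N landmarks.

        tracked, annotated: (N, 2) arrays of 2D landmark positions
        for one frame (here N = 22).
        """
        d = np.linalg.norm(tracked - annotated, axis=1)  # per-point distances d_i
        return d.mean()                                  # D = (1/N) * sum_i d_i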
Figure 3 shows the results for the three video sequences. The KLT feature tracker was chosen as the baseline system. Note that even in the first frame there is a small pixel displacement, indicating that part of the displacement is due to the fact that the annotation cannot always be performed unambiguously. For higher frame numbers the KLT feature tracker loses track of some points, and the pixel displacement therefore accumulates. Our algorithm prevents this effect and shows robust performance over the whole sequence. Table 1 lists the pixel displacement averaged over all labeled frames of each sequence. Our algorithm outperforms the KLT tracker. The results are also comparable to those recently reported by other authors. Tong et al. (2007) tested their multi-stage hierarchical models on a dataset of 10 sequences with 100 frames per sequence; considering that their test sequences had half of our image resolution, their pixel displacement is similar to ours. The pixel displacement that Fang et al. (2008) reported for their test database of two challenging video sequences is likewise comparable to ours. However, both methods are computationally considerably more intensive than our tracking scheme.
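As a reference for the baseline, pyramidal Lucas-Kanade (KLT) tracking is available in OpenCV. The sketch below illustrates how such a frame-to-frame baseline accumulates drift; it assumes grayscale frames and does not reflect the exact KLT configuration used in our experiments:

    import cv2
    import numpy as np

    def klt_track(frames, pts):
        # frames: iterable of grayscale uint8 images
        # pts:    (N, 1, 2) float32 array of initial landmark positions
        it = iter(frames)
        prev = next(it)
        trajectory = [pts]
        for frame in it:
            # Each frame is matched against the previous one only,
            # so small per-frame errors accumulate over the sequence.
            pts, status, err = cv2.calcOpticalFlowPyrLK(prev, frame, pts, None)
            trajectory.append(pts)
            prev = frame
        return trajectory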
Table 1: Pixel displacement averaged over a whole video sequence.

video sequence    no. 1    no. 2    no. 3
KLT tracker        8.07     8.45    43.34
our algorithm      5.19     8.03     8.71

3.2 Qualitative Evaluation

Figure 4 shows two sample frames of video sequence no. 1. (The three video sequences are available together in a single file as supplementary material.)
The information box in the lower left corner shows position details. In the left image it can be observed that the rotation about the z-axis is detected correctly, and in the right image the rotation about the y-axis is estimated properly. Overall, it can be qualitatively confirmed that not only are the points tracked reliably in the 2D video sequence, but 3D motion and expressions can also be extracted from the sequence. It is also important to note that our 3D ASM works with a relatively small number of facial feature points, since the 3D faces of the Bosphorus Database are labeled with only 22 landmarks. In contrast, current 2D face databases provide more landmarks, some of them roughly one hundred points. A 3D ASM with more points is expected to further improve the tracking and parameter estimation results.
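For illustration, if the estimated pose is available as a 3x3 rotation matrix R (a hypothetical interface; the paper's pose parameterization may differ), the rotation angles shown in the information box can be recovered by a standard Euler-angle decomposition:

    import numpy as np

    def rotation_angles(R):
        # Assumes the convention R = Rz @ Ry @ Rx and no gimbal lock.
        ry = np.arcsin(-R[2, 0])           # rotation about the y-axis
        rx = np.arctan2(R[2, 1], R[2, 2])  # rotation about the x-axis
        rz = np.arctan2(R[1, 0], R[0, 0])  # rotation about the z-axis
        return np.degrees([rx, ry, rz])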
4 CONCLUSIONS AND FUTURE WORK
A method for the 3D tracking of facial feature points in a monocular video sequence has been presented. The facial feature points are tracked with a simple Gauss-Newton estimation scheme, and the results are linked with a 3D ASM. The efficient Gauss-Newton minimization thus computes the 3D position, rotation, and 3D ASM parameters instead of the shift of each feature point separately. It has also been demonstrated how the amount of computation required per frame can be further reduced. The results show that the algorithm tracks the points reliably under rotations, translations, and facial expressions. It outperforms the KLT feature tracker and delivers results comparable to two other recently published methods, while being computationally less intensive.
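To make the estimation scheme concrete, a single Gauss-Newton step over the stacked 2D residuals might look as follows. This is a schematic sketch with a finite-difference Jacobian; `project` is a placeholder for projecting the posed 3D ASM into the image plane and does not stand for the paper's actual implementation:

    import numpy as np

    def gauss_newton_step(project, p, observed, eps=1e-6):
        # p:        parameter vector (translation, rotation, ASM coefficients)
        # project:  maps p to the (N, 2) projected model points
        # observed: (N, 2) tracked image points
        r = (observed - project(p)).ravel()      # stacked 2D residuals
        J = np.empty((r.size, p.size))
        for j in range(p.size):                  # finite-difference Jacobian
            dp = np.zeros_like(p)
            dp[j] = eps
            J[:, j] = ((observed - project(p + dp)).ravel() - r) / eps
        # Least-squares solve of J * delta = -r (the Gauss-Newton update)
        delta = np.linalg.lstsq(J, -r, rcond=None)[0]
        return p + delta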
In our ongoing research we will analyze the effect of using gradient images and Gabor-filtered images to further improve the tracking results. We also plan to integrate a weighting matrix that depends on the rotation parameters in order to reduce the influence of facial feature points that may disappear from view.