a webcam video. Neither the CCA nor the KCCA algorithm presented any problem when tracking the head in these video sequences, especially because no facial gestures are involved. In the case of the webcam video, we succeeded in correctly tracking rotations in the y plane of up to ±35°. However, when trying to go further, the algorithm could not estimate the correct variation of the angle and got lost.
Another test was performed using part of the talking face video. This video presents slight head pose changes compared with the previously employed videos, but it contains more significant movements due to facial gestures. It shows a person engaged in conversation in front of a camera, and it comes with ground-truth data consisting of characteristic face points annotated semi-automatically. From the 68 annotated points, we chose the 52 points that were closest to the corresponding points of the Candide model. Because these points were not exactly the same as the ones given in the ground-truth database, there was an initial distance between the points. In order to measure the behavior of our algorithm, we calculated the standard deviation of this distance, as shown in Figure 5. We can see that the points showing the highest variance were those on the head contour.
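As an illustration, this per-point statistic can be computed as in the following Python sketch (a minimal example; the array names tracked_pts and gt_pts are hypothetical and stand for the tracked Candide points and the ground-truth annotations, respectively):

import numpy as np

def per_point_distance_stats(tracked_pts, gt_pts):
    # tracked_pts, gt_pts: hypothetical arrays of shape
    # (n_frames, n_points, 2) holding the image coordinates of
    # the 52 tracked Candide points and the corresponding
    # ground-truth annotations for every frame.
    # Euclidean distance between each pair of points, per frame.
    dist = np.linalg.norm(tracked_pts - gt_pts, axis=2)
    # Aggregate over frames: one mean and one standard
    # deviation per tracked point.
    return dist.mean(axis=0), dist.std(axis=0)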
In Figure 6 we can see the result of tracking the talking face over 1720 frames. The importance of this figure is that it shows the evolution of the error during the video. The peaks appearing in this figure correspond to the moments when there was a facial gesture or an important rotation. However, as seen in the frames displayed, we can consider that these peaks do not represent a significant error between the estimated state vector and the real head pose.
The time required to process a frame depends on the video size, as can be seen in Table 1. In that table we also show the comparison between the CCA and the KCCA implementations.
Table 1: Comparison of time per frame.

Algorithm   Video size [pixels]   Time per frame [ms]
CCA         640 × 480             147.6
CCA         720 × 576             179.5
KCCA        320 × 240             2486.7
5 CONCLUSIONS
We have seen that pose tracking is performed well by the two trackers implemented. They managed to follow the head movements in long video sequences of more than 1700 frames. The main advantage of this algorithm is that it is simple and proved to be robust to facial gestures. However, we observed from simulations that the effectiveness of this kind of tracker depends on the mask initialization, i.e., the 3D mask must be correctly initialized, both in pose and in facial features, at the first frame; otherwise, the tracker can get lost, because the model directly affects the texture extraction and consequently the state vector predictor.
The results obtained by means of the CCA and the KCCA did not present a significant difference. However, if we consider the computation time required by the KCCA algorithm, which was 10 times slower than the CCA algorithm, we can conclude that for the type of data we use, the linear approach is preferable.
In future work we will add gesture tracking based on the CCA approach, principally for tracking the mouth and eyebrows, and, following the work of La Cascia et al. (2000), we will include a robust measure in the tracking algorithm.
REFERENCES
Ahlberg, J. (2001). Candide-3 – an updated parameterized face. Technical Report LiTH-ISY-R-2326, Linköping University, Sweden.
Borga, M., Landelius, T., and Knutsson, H. (1997). A unified approach to PCA, PLS, MLR and CCA. Report LiTH-ISY-R-1992, ISY, SE-581 83 Linköping, Sweden.
Davoine, F. and Dornaika, F. (2005). Real-Time Vision for Human Computer Interaction, chapter Head and Facial Animation Tracking using Appearance-Adaptive Models and Particle Filters. Springer-Verlag.
Dehon, C., Filzmoser, P., and Croux, C. (2000). Robust
methods for canonical correlation analysis. In Kiers,
H., Rasson, J., Groenen, P., and Schrader, M., editors,
Data Analysis, Classification, and Related Methods,
pages 321–326. Springer-Verlag.
Hardoon, D., Szedmak, S., and Shawe-Taylor, J. (2004). Canonical correlation analysis: an overview with application to learning methods. Neural Computation, 16:2639–2664.
La Cascia, M., Sclaroff, S., and Athitsos, V. (2000). Fast,
reliable head tracking under varying illumination: an
approach based on registration of texture-mapped 3D
models. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 22(2):322–336.
Melzer, T., Reiter, M., and Bischof, H. (2003). Appearance models based on kernel canonical correlation analysis. Pattern Recognition, 36(9):1961–1973.
Weenink, D. (2003). Canonical correlation analysis. In
Proceedings of the Institute of Phonetic Sciences of
the University of Amsterdam, Netherlands, volume 25,
pages 81–99.