FACE TRACKING ALGORITHM ROBUST TO POSE,
ILLUMINATION AND FACE EXPRESSION CHANGES: A 3D
PARAMETRIC MODEL APPROACH
Marco Anisetti, Valerio Bellandi
University of Milan - Department of Information Technology
via Bramante 65 - 26013, Crema (CR), Italy
Luigi Arnone, Fabrizio Beverina
STMicroelectronics - Advanced System Technology Group
via Olivetti 5 - 20041, Agrate Brianza, Italy
Keywords:
Face tracking, expression changes, FACS, illumination changes.
Abstract:
Considering the face as an object that moves through a scene, both the pose relative to the camera's point of view and the texture may change the appearance of the object considerably. These changes are tightly coupled with alterations in illumination when the subject moves, or even when the illumination conditions themselves change (a light switched on or off, etc.). This paper presents a method for tracking a face in a video sequence by recovering the full motion and the expression deformations of the head using a 3D expressive head model. Taking advantage of a 3D triangle-based face model, we are able to deal with any kind of illumination change and face expression movement. In this parametric model, any change can be defined as a linear combination of a set of weighted basis vectors that can easily be included in a minimization algorithm using a classical Newton optimization approach. The 3D model of the face is created from some characteristic face points given on the first frame. Using a gradient descent approach, the algorithm is able to extract simultaneously the parameters related to the face expression, the 3D posture and the virtual illumination conditions. The algorithm has been tested on the Cohn-Kanade database (Kanade et al., 2000) for expression estimation, and its precision has been compared with a standard multi-camera system for 3D tracking (the Elite2002 system) (Ferrigno and Pedotti, 1985). For the illumination tests, we use synthetic movies created with standard 3D-mesh animation tools and real experimental videos recorded under very extreme illumination conditions. The results are promising in all cases, even with large head movements and changes in expression and illumination. The proposed approach has a twofold application: as part of a facial expression analysis system, and as preprocessing for identification systems (expression, pose and illumination normalization).
1 INTRODUCTION
Three-dimensional face tracking is an emerging and crucial component of many systems for emotional expression analysis, lip reading, identity recognition, surveillance, etc. However, it is still a challenging task because of the variability of faces. This variability arises from changes in pose, from facial expression deformations, and from illumination modifications. Generally speaking, regarding light compensation, most of the proposed methods rely on a set of images in order to compensate for the light effects. Some techniques build the set of images (the basis) geometrically (Ishiyama and Sakamoto, 2004), while others derive it from a collection of pictures taken under different kinds of illumination and then use an eigenfaces technique to create the basis (Dornaika and Ahlberg, 2003), (Cascia et al., 2000). In our proposed method, by contrast, the illumination basis is created directly during the tracking by optimizing the parameters that control the effects of a light on a 3D face model. We developed
an algorithm that can express all the effects, due to both expression and illumination, on a 3D face model as a linear composition of sets of basis vectors: one set describes the changes in expression (these are integrated into the 3D face model) and the other deals with illumination changes, taking advantage of the 3D face model. An inspiring work concerning illumination changes with a basis approach is that of Hager (Hager and Belhumeur, 1998) and (Eisert and Girod, 1997). They incorporated the illumination basis into an efficient 2D tracking algorithm using parametric models. Contrary to this approach, which needs training to create a good basis, our method only uses the information resulting from a 3D model and a fixed set of illumination sources. Using this illumination-compensation approach, we improved the quality of the 3D tracking in realistic environments as well.
Concerning 3D tracking, the literature offers several methods that, like ours, use a 3D template. They can be divided into two categories: one using a geometrical shape (plane or cylinder) and the other using a 3D head wire-mesh model (Anisetti et al., 2005). (Xiao et al., 2002) uses a cylinder to estimate the pose, and an Active Appearance Model (AAM) method is used to map the appearance head model to the face region. This method can handle only limited head motion, because beyond a certain rotation the distortions due to the difference between the head and the cylinder are no longer negligible. Using a 3D head model is a way to improve the tracking, especially if the model can also morph according to certain expression parameters. Many recent works choose this solution (Tao and Huang, 1999), (Matthews et al., 2003) and (Dornaika and Ahlberg, 2004). (Tao and Huang, 1999) uses an explanation-based facial motion tracking algorithm based on a Piecewise Bezier Volume Deformation (PBVD) model. In this way, they can track (in two steps) the posture and the head deformation. (Matthews et al., 2003) and (Dornaika and Ahlberg, 2004) use a method that differs from our warping technique, because theirs is piecewise affine instead of being based on expression shape bases. Our bases control the position changes of every pixel of the image (not only the vertices) according to the expression parameters. This means that instead of the usual sequence of operations for every pixel (finding the triangle it belongs to and performing the affine warp of that triangle), we perform a single all-in-one operation. This simplification is possible because the 3D face model defines the face expression for every triangle, which simplifies the operation. Another difference is that our method does not need a training set (Matthews et al., 2003), (Dornaika and Ahlberg, 2004) to learn the parameters that control the shape modes of the face. Similarly to the (Cootes et al., 2000) approach, we use an optimization algorithm that matches shape and texture simultaneously. The difference is that we perform it inside the warping algorithm, considering at the same time the face movement parameters and the roto-translation ones.
The novelty of our approach regarding expression inference consists in the use of a set of basis vectors that deals with the face deformation directly inside the warping transformation. Concerning illumination compensation, we use a 3D model for a sort of illumination-effect prediction. This implies that, together with the warping parameters, we extract the action unit coding (FACS) and the positions of the 3 virtual light sources, obtaining greater precision in tracking. Many algorithms avoid the issues related to illumination estimation by updating the template frame after frame. As a result, any error in motion estimation is propagated. Besides that, they cannot take into account sudden light variations, like a light turning on abruptly. On top of this robust algorithm, we developed some interesting applications that use the normalized face resulting from the pose and expression inference. For instance, we propose a system for classifying emotional facial expressions in (Andreoni et al., 2004) and one for identity certification of a subject in (Damiani et al., 2005). In the following sections, we explain all the developed techniques in detail.
2 A PARAMETRIZED 3D FACE
MODEL
The literature offers many facial models, both 2D and 3D. Considering our objectives, we chose a 3D model for its precision in tracking estimation, its usefulness in illumination inference, and its support for self-occlusion monitoring. Furthermore, we need to adapt the 3D model to an image by aligning some fiducial points with the corresponding points on the 3D model. Therefore the 3D model must be morphable both for expression and for face adaptation on these fiducial points. Summarizing, we need a model with these characteristics:
1. Triangle based. This feature makes the model suited for the affine transformation, which, by definition, maps triangles into triangles. It then becomes useful in expression morphing;
2. Animation and shape parametrization. This makes
it possible to describe shape and face expression
(Animation Unit) by a simple linear formula:
g = \bar{g} + S\sigma + A\alpha    (1)

where g is the resulting vector, \bar{g} contains the standard shape model vertex coordinates, and S and A are the Shape and Animation Unit bases (AUV, the basis for animation). The Shape Units are controlled by the parameter \sigma and are used to determine the head's specific individual shape; the Animation Units are controlled by the parameter \alpha.
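A minimal sketch of Eq. (1) in NumPy; the vertex count and the numbers of Shape and Animation Units are hypothetical placeholders, not values from our model:

    import numpy as np

    V, ns, na = 113, 14, 11           # hypothetical: vertices, shape units, animation units
    g_bar = np.zeros(3 * V)           # standard shape: stacked vertex coordinates
    S = np.random.randn(3 * V, ns)    # Shape Unit basis (placeholder values)
    A = np.random.randn(3 * V, na)    # Animation Unit basis (placeholder values)

    def deform(sigma, alpha):
        """Eq. (1): vertex coordinates for shape parameters sigma
        and animation (expression) parameters alpha."""
        return g_bar + S @ sigma + A @ alpha

    g_neutral = deform(np.zeros(ns), np.zeros(na))   # reproduces the standard shape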
Considering these characteristics, a 3D morphable model like (Blanz and Vetter, 2003) could be an optimal choice. Yet our aim is to use a free 3D model that does not depend on training faces, and to demonstrate that with only a few adaptations we can use a general-purpose 3D face model. In our previous works (Bellandi et al., 2005), (Damiani et al., 2005) we chose the Candide-3 model for our tests, obtaining good results. We also used Candide-3 in this work for certain tests, but we decided to develop a more realistic model to deal better with expressions and especially with illumination changes.
2.1 3D Individual Shape
Parametrization
Following these main characteristics, we can use the shape parameters \sigma of the 3D model to compute a 3D individual shape model and texture on a single frame representing a frontal and neutral view of the subject. This represents the best-fitting 3D model template for the subject's face and is specific to each individual. The fit is performed manually on the frontal frame, choosing some fiducial points (they could be selected by any automatic selection process) corresponding to some relevant features (eyebrows, eyes, nose, mouth). Then, with a constrained optimization algorithm, we compute the model's shape parameters in order to minimize the error between the corresponding points p on the model and the manually chosen ones. We define the error as follows:

\epsilon(\sigma, t, R) = \| w \, (g(\sigma, t, R) - p) \|^2    (2)
We weight the error on the model's points according to the intrinsic uncertainty of each point selection. This method showed great robustness to noisy coordinates of the picked points, always managing to reach a 3D template good enough for the tracking. Figure (1) shows an example of the creation of an individual template.
(a) Feature Point (b) Template 3D
Figure 1: Example of 3D template extraction process.
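A hedged sketch of the constrained fit of Eq. (2) with SciPy. The names fiducial_3d, S_fid, picked_2d and weights are hypothetical stand-ins for the model's fiducial vertices, their Shape Unit displacements, the manually picked image points and the per-point confidence weights; the projection is a simple frontal one, consistent with the 2D fit described below:

    import numpy as np
    from scipy.optimize import least_squares

    K, ns = 12, 14                            # hypothetical counts of fiducials / shape units
    fiducial_3d = np.random.randn(K, 3)       # fiducial vertices of the neutral model
    S_fid = np.random.randn(K, 3, ns)         # their Shape Unit displacements
    picked_2d = np.random.randn(K, 2)         # manually picked image points
    weights = np.ones(K)                      # lower values for uncertain picks

    def residual(params):
        sigma, t, rot = params[:ns], params[ns:ns + 2], params[ns + 2]
        c, s = np.cos(rot), np.sin(rot)
        R = np.array([[c, -s], [s, c]])       # in-plane rotation only (2D fit)
        pts = fiducial_3d + S_fid @ sigma     # morphed fiducial vertices
        proj = pts[:, :2] @ R.T + t           # frontal orthographic projection
        return (weights[:, None] * (proj - picked_2d)).ravel()

    fit = least_squares(residual, np.zeros(ns + 3),
                        bounds=(-3, 3))       # box constraints make the fit 'constrained'
    sigma_hat = fit.x[:ns]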
Obviously, with this shape minimization we do not deal with the third dimension of the face: we only consider the 2D deformations for the best-fit shape. Nevertheless, with this approximate shape parametrization the tracking still produces good results, thanks to the error correction phase explained in the following sections. In order to test the difference between this rough shape model and a more realistic one, we also used a real 3D mesh of a subject's face extracted by a stereo camera system.
2.2 3D Model Morphing for
Expression Recognition
Another important feature of the 3D model is its morphability. During the tracking process, for expression inference, we deduce the expression parameters \alpha, while the shape parameters \sigma remain constant because they represent an intrinsic property of the subject. In order to do this, we must find a set of basis matrices B that expresses linearly the changes of the expression shapes of our 3D template T starting from the neutral template T_0, obtaining a formula such as:

T = T_0 + \sum_{i=1}^{n} \alpha_i B_i    (3)
We know that the final positions of the vertices of every triangle of the 3D model are:

V_f = V_i + \sum_{j=1}^{N} \alpha_j A_j    (4)

where V_f is the matrix containing the coordinates of the vertices in the final position, \alpha_j are the parameters that account for each of the N possible movements, and A_j is the matrix that contains the displacement components for the j-th expression. A is a sparse matrix where the nonzero values of a column j are the ones related to the vertices involved in that j-th movement. Considering the 3 vertices of the k-th triangle, we can find the transformation matrix M_k that brings a point belonging to the first triangle into a point on the second in this way:

\begin{bmatrix} V_{fk1} \\ V_{fk2} \\ V_{fk3} \end{bmatrix} = \begin{bmatrix} V_{ik1} \\ V_{ik2} \\ V_{ik3} \end{bmatrix} M_k    (5)
From this it follows that the transformation for each triangle can be written as:

M_k = \begin{bmatrix} V_{ik1} \\ V_{ik2} \\ V_{ik3} \end{bmatrix}^{-1} \begin{bmatrix} V_{fk1} \\ V_{fk2} \\ V_{fk3} \end{bmatrix}
    = I + \begin{bmatrix} V_{ik1} \\ V_{ik2} \\ V_{ik3} \end{bmatrix}^{-1} \Big( \sum_{j=1}^{N} \alpha_j A_{jk} \Big)
    = I + \sum_{j=1}^{N} \alpha_j \tilde{B}_{jk}    (6)
where A_{jk} is the matrix of the displacements related to the j-th expression for the k-th triple of points, and \tilde{B}_{jk} is the transformation matrix for the j-th expression and the k-th triple. In this way, every i-th point of the template T, according to the triangle it belongs to, can be derived from the template T_0 as:

T_{ik} = T_{0ik} M_k    (7)
       = T_{0ik} + \sum_{j=1}^{N} \alpha_j T_{0ik} \tilde{B}_{jk}    (8)
       = T_{0ik} + \sum_{j=1}^{N} \alpha_j B_{jk}    (9)
In this way we obtain the set of basis matrices of the desired formula (3). In fact, the B_{jk} depend only on the definition of the Animation Units and on the initial template. We can now write any expression deformation as a weighted sum of a set of fixed bases. This characteristic of the model will be exploited in the tracking phase, making the algorithm able to estimate the set of \alpha directly.
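The following sketch illustrates Eqs. (5)-(9) in NumPy, under the convention that each vertex is a row vector [x, y, z] and the k-th triangle is the 3x3 matrix of its stacked vertices; V_i and A_list are hypothetical stand-ins for the neutral vertices and the Animation Unit displacement matrices of Eq. (4):

    import numpy as np

    def triangle_basis(V_i, A_list, tri):
        """Eq. (6): return the matrices B~_jk such that
        M_k = I + sum_j alpha_j * B~_jk for triangle `tri` (3 vertex indices)."""
        Vi_inv = np.linalg.inv(V_i[tri])          # inverse of the initial 3x3 triangle
        return [Vi_inv @ A_j[tri] for A_j in A_list]

    def warp_point(T0_ik, alphas, B_tilde_k):
        """Eqs. (7)-(9): T_ik = T_0ik + sum_j alpha_j * (T_0ik @ B~_jk)
                              = T_0ik + sum_j alpha_j * B_jk."""
        B_jk = [T0_ik @ Bt for Bt in B_tilde_k]   # fixed basis, depends only on T_0
        return T0_ik + sum(a * B for a, B in zip(alphas, B_jk))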
2.3 3D Illumination Basis
We created a set of basis vectors in order to compensate for the effects of light changes. Since the light sources are "distant", all the points of a facet share the same orientation with respect to the light-source direction. This means that the same light intensity is reflected back to the viewer from all the points of the facet. We can then compute the intensity of that facet, depending on the intensity and direction of the light source and on the intensity of the ambient light, and store it into a matrix. For the Lambertian case, there is no problem with multiple (distant) light sources: it is just a matter of adding up their individual contributions. It has been demonstrated that, in the case of Lambertian surfaces and in the absence of self-shadowing, given the 3 basis images of 3 linearly independent light-source directions, one can reconstruct the image of the surface under a new light direction as a linear combination of the 3 basis images. In fact, the irradiance at a point x is given by

I = \alpha \, n \cdot L    (10)

where n is the normal vector to the surface, \alpha is the albedo coefficient, and L encodes the power and direction of the incident light rays. So, we first calculate the normal vector of each facet. Then we create a matrix that associates to every point of the template its normal vector. Naming B_x, B_y and B_z the vectors of each direction component, the new template T can be written starting from the previous template T_0 as:
T = T_0 + \sum_{l=1}^{3} \lambda_l \big( B_x \cos(\theta_l + rot_x) + B_y \cos(\phi_l + rot_y) + B_z \cos(\psi_l + rot_z) \big)    (11)
where \theta_l, \phi_l, \psi_l are the direction angles of the l-th light and rot_x, rot_y, rot_z are the estimates of the rotation of the template. The \lambda_l parameters are the intensities of the lights, and are the parameters that will be estimated by the algorithm. Furthermore, this set of normal-vector bases is useful for managing the hidden triangles in the tracking algorithm: thanks to it, we know which facets are no longer visible, and we do not use them in the tracking algorithm.
3 3D MOTION, EXPRESSION,
AND ILLUMINATION
RECOVERY
Our goal is to obtain the posture estimation parameters, the AUV deformation parameters and the illumination condition parameters in one minimization process between two frames, using the morphing and illumination basis techniques explained in the previous sections. First of all, for clarity, we describe the steepest descent algorithm for 3D posture and morphing estimation. It is based on the idea that the 2D face template T(x) (extracted by projecting the 3D template T onto the image plane) appears in the next frame I(x), albeit warped by W(x; p), where p = (p_1, \dots, p_n, \alpha_1, \dots, \alpha_m) is the vector of parameters of the 3D face model, with m Candide-3 Animation Unit movement parameters, and x are the pixel coordinates in the image plane. We can obtain the movement and expression parameters p by minimizing the function (12): in fact, if T(x) is the template at time t with the correct pose and expression p, and I(x) is the frame at time t+1, then, assuming that the illumination conditions do not change much, the correct pose and expression p at time t+1 is obtained by minimizing the sum of squared errors between T(x) and I(W(x; p)):

\sum_x \big[ I(W(x; p)) - T(x) \big]^2    (12)
For this minimization we use an approach like (Lucas and Kanade, 1981) with a forward additive implementation, which assumes that the current estimate of p is known and iteratively solves for increments \Delta p to the parameters. After some well-known passages, Equation (12) yields:

\Delta p = H^{-1} \sum_x \Big[ \nabla I \frac{\partial W}{\partial p} \Big]^T \big[ T(x) - I(W(x; p)) \big]

H = \sum_x \Big[ \nabla I \frac{\partial W}{\partial p} \Big]^T \Big[ \nabla I \frac{\partial W}{\partial p} \Big]    (13)
where \nabla I is the image gradient of I evaluated at W(x; p), \partial W / \partial p is the Jacobian of the warp, and \Delta p is the vector of incremental warp parameters. Because we need to recover the 3D posture and the expression morphing parameters, we consider that the motion of a head point X = [x, y, z, 1]^T between time t and t+1 is X(t+1) = M \cdot X(t), and that the expression morphing of the same point is X(t+1) = X(t) + \sum_{i=1}^{m} \alpha_i \cdot B_i,
where \alpha_i and B_i follow the expression-based representation described in the previous section, and the matrix M follows Bregler (Bregler and Malik, 1998) and the twist representation of (Murray et al., 1992). With these matrices, the motion parameter vector becomes p = (\omega_x, \omega_y, \omega_z, t_x, t_y, t_z, \alpha_1, \dots, \alpha_m). With these considerations, the warping W(x; p) in (12) becomes:
consideration the warping W (x; p) in (12) becomes:
W (x; p)=M (X +
m
i=1
(α
i
· B
i
)) (14)
In the case of perspective projection, assuming the camera projection matrix depends only on the focal length f_L, the image plane coordinate vector x is obtained as:

x(t+1) = \begin{bmatrix} x - y\,\omega_z + z\,\omega_y + t_x + B_x \\ x\,\omega_z + y - z\,\omega_x + t_y + B_y \end{bmatrix} \cdot \frac{f_L}{y\,\omega_x - x\,\omega_y + z + t_z + B_z}(t)    (15)
where a_i, b_i and c_i are the components of the Animation Unit displacement B_i at the considered point, and:

B_x = \sum_{i=1}^{m} \alpha_i (a_i - b_i\,\omega_z + c_i\,\omega_y)
B_y = \sum_{i=1}^{m} \alpha_i (a_i\,\omega_z + b_i - c_i\,\omega_x)
B_z = \sum_{i=1}^{m} \alpha_i (-a_i\,\omega_y + b_i\,\omega_x + c_i)    (16)
This function maps the 3D motion and morphing onto the image plane. Following the Lucas-Kanade algorithm, the Jacobian matrix \partial W / \partial p at p = 0 becomes:

\frac{\partial W}{\partial p} = \begin{bmatrix} -xy & x^2 + z^2 & -yz & z & 0 & -x & DB_x \\ -(y^2 + z^2) & xy & xz & 0 & z & -y & DB_y \end{bmatrix} \cdot \frac{f_L}{z^2}(t)    (17)
where DB_x and DB_y collect one column per \alpha_i:

DB_x = (a_1 z - c_1 x, \; \dots, \; a_m z - c_m x)
DB_y = (b_1 z - c_1 y, \; \dots, \; b_m z - c_m y)    (18)
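A sketch of Eqs. (15)-(18) for a single template point, assuming a, b and c are length-m arrays holding the per-point Animation Unit displacement components (illustrative names, not the paper's code):

    import numpy as np

    def warp(X, p, a, b, c, fL):
        """Eq. (15): project point X = (x, y, z) under the twist motion and
        morphing parameters p = (wx, wy, wz, tx, ty, tz, alpha_1..alpha_m)."""
        x, y, z = X
        wx, wy, wz, tx, ty, tz = p[:6]
        al = p[6:]
        Bx = np.sum(al * (a - b * wz + c * wy))       # Eq. (16)
        By = np.sum(al * (a * wz + b - c * wx))
        Bz = np.sum(al * (-a * wy + b * wx + c))
        den = y * wx - x * wy + z + tz + Bz
        return np.array([x - y * wz + z * wy + tx + Bx,
                         x * wz + y - z * wx + ty + By]) * fL / den

    def jacobian_at_zero(X, a, b, c, fL):
        """Eq. (17): dW/dp evaluated at p = 0 (one 2 x (6+m) block per point)."""
        x, y, z = X
        DBx = a * z - c * x                           # Eq. (18)
        DBy = b * z - c * y
        row_u = np.concatenate(([-x*y, x*x + z*z, -y*z, z, 0.0, -x], DBx))
        row_v = np.concatenate(([-(y*y + z*z), x*y, x*z, 0.0, z, -y], DBy))
        return np.vstack([row_u, row_v]) * fL / z**2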
Using a forward additive parameter estimation approach, we are able to obtain the correct 3D motion posture and morphing parameters of the template between two frames in one minimization phase. In order to obtain more robustness to global and local illumination changes, we also introduce into our minimization algorithm another five parameters using linear appearance variations, considering the image template T(x) as:

T(x) + \sum_{i=1}^{5} \lambda_i B_i(x)    (19)
where B_i, i = 1, \dots, 5 is a set of known appearance variation images and \lambda_i, i = 1, \dots, 5 are the appearance parameters. Global illumination changes can be modelled as an arbitrary change in gain and bias between the template and the input image by setting B_1 to be the template T and B_2 to be the unitary "all ones" image. For lateral illumination we use the illumination basis images B_i, i = 3, \dots, 5 explained in the previous section. Using equation (19) instead of T(x) in (12), we obtain the following function to minimize:

\min \sum_x \Big[ I(W(x; p)) - T(x) - \sum_{i=1}^{5} \lambda_i B_i(x) \Big]^2    (20)
In accordance with the linear appearance variation technique, this can be minimized using the steepest descent approach. Figure (2) shows some examples of tracking experiments with illumination changes in a realistic environment with a standard low-quality webcam. We demonstrate the improvement in accuracy of posture and expression estimation with this one-step minimization process, and the quality of the 3D model AUV tracking, in the results section.
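A hedged sketch of one forward-additive iteration of Eq. (20): the steepest-descent images for the warp parameters are stacked with the appearance images, so the increment \Delta p and the intensities \lambda are solved for simultaneously. Here sd (n_pix x n_p) collects \nabla I \, \partial W/\partial p per template pixel, B (n_pix x 5) the appearance basis images, and err = I(W(x; p)) - T(x); all names are illustrative:

    import numpy as np

    def update_step(sd, B, err):
        """One joint Gauss-Newton step for (dp, lambda) of Eq. (20)."""
        J = np.hstack([sd, -B])                 # -B so the solve returns lambda directly
        H = J.T @ J                             # augmented Hessian
        delta = np.linalg.solve(H, J.T @ -err)  # normal equations
        n_p = sd.shape[1]
        return delta[:n_p], delta[n_p:]         # warp increment, light intensities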
Figure 2: Example of tracking under extreme illumination condition variations and with variations in pose and expression. The tracking mask is shown in black.
4 TEMPLATE MANAGEMENT
Recovering the head position and the AUV face morphing under different illumination environments, as explained before, is a difficult task for a 3D algorithm, both because the face has a complex 3D mesh that produces, when moving, frequent illumination changes and self-occlusions, and because of the difference between the real face and the adapted 3D model. To avoid these problems we have developed some techniques that we collectively call "template management". One of these techniques concerns the selection of the significant part of the 3D template mask for performance improvement. In this context, we perform this optimisation by thresholding the gradient images in order to reduce the number of points in the face template. In our experience, the quality of the tracking does not decrease much when the number of points is reduced, while the speed of the algorithm increases by an order of magnitude.
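A minimal sketch of this point selection, assuming we keep a fixed fraction of the highest-gradient template pixels (the fraction is a hypothetical tuning knob, not a value from the paper):

    import numpy as np

    def select_points(template, frac=0.2):
        """Indices of the template pixels with the strongest gradient."""
        gy, gx = np.gradient(template.astype(float))
        mag = np.hypot(gx, gy)                      # gradient magnitude image
        thr = np.quantile(mag, 1.0 - frac)          # threshold keeping `frac` of pixels
        return np.flatnonzero(mag.ravel() >= thr)   # points used by the minimization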
4.1 Dissimilarity Analysis
Our principal goal is to estimate the posture, expression and illumination parameters of the subject, in order to reconstruct the normalized neutral frontal view of the face (reported face). To analyze the quality of our tracking process, we have developed a technique called "dissimilarity analysis" that works on the retrieved face. In fact, if the estimated posture, illumination and morphing parameters are correct, the frontal image will be consistent with the face template (Figure (3)). With this approach, good tracking results in a good normalization reconstruction. If there is any approximation error in the posture evaluation, the normalized face suffers from distortion effects whose strength depends on the amount of the error. For this reason, by analyzing the retrieved images we can estimate the quality of the tracking, which we call the "dissimilarity level". This analysis is performed by a 2D tracking algorithm (with an inverse compositional implementation) applied to some fiducial points of the normalized face (an extended set is represented with "+" in Figure (3)). If the tracking of these fiducial points shows a translation bigger than a threshold, we label the tracking with a low confidence level, i.e., with high dissimilarity. Dissimilarity is useful when we do not know the correct posture or expression, but we need to know how accurate our approximations are.

Figure 3: Example of posture and morphing estimation and frontal view normalization.
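A hedged sketch of the dissimilarity test; track_patch stands for the inverse compositional 2D tracker and returns the translation of one fiducial patch between the reference and the current normalized face (the threshold value is illustrative):

    import numpy as np

    def dissimilarity(ref_face, norm_face, fiducials, track_patch, thr=2.0):
        """Largest fiducial drift on the normalized face and a low-confidence flag."""
        shifts = [track_patch(ref_face, norm_face, pt) for pt in fiducials]
        level = max(np.hypot(dx, dy) for dx, dy in shifts)
        return level, level > thr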
4.2 Dynamic Template Update and Mosaic Technique
After the head pose, the AUV morphing and the illumination parameters have been obtained by the 3D motion technique, we update our 3D template with the one recovered from the current frame if the dissimilarity confidence level is sufficient. Otherwise, the template is not modified. With this template updating strategy, we can track the head and the face movements and partially deal with the drift problems deriving from the dynamic update. To maintain good performance, we continue to update every parameter except the texture, so that we do not introduce errors into the template image, which is the main cause of the drift effect. However, this solution alone is not enough for a long-term tracker. For that reason, in a previous work we introduced a technique called "mosaic template" (Anisetti et al., 2005). This technique consists in creating and dynamically updating a collection of templates according to the position the subject is in. Practically speaking, it is the same as storing some head poses and the relative templates. When the estimated head pose is close to that of a registered template, we use the latter to correct the drifting problem by re-alignment. In this way, the drifting effect is strongly limited without many correction steps. This also permits adapting the correction to the dynamic changes of the environment and of the subject himself (Figure (4)). During the posture correction phases of the mosaic technique, the expression parameters are also re-corrected.
Figure 4: Example of 3D tracking with mosaic correction (right) during an ELITE2002 experiment session.
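A sketch of the mosaic-template bookkeeping, with illustrative names and an illustrative pose tolerance:

    import numpy as np

    class MosaicTemplates:
        """Store one template per visited head pose; return a stored
        template when the current pose estimate comes close to it."""
        def __init__(self, pose_tol=np.deg2rad(10)):
            self.entries = []                       # list of (pose, template)
            self.pose_tol = pose_tol

        def nearest(self, pose):
            best = None
            for p, t in self.entries:
                d = np.linalg.norm(p - np.asarray(pose))
                if d < self.pose_tol and (best is None or d < best[0]):
                    best = (d, t)
            return None if best is None else best[1]

        def add(self, pose, template):
            if self.nearest(pose) is None:          # register only new poses
                self.entries.append((np.asarray(pose), template))

When nearest() returns a template for the current pose, the tracker re-aligns against it, cancelling the drift accumulated by the dynamic update.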
4.3 Occlusion Management
Another important problem in face tracking is face occlusion. There are two types of occlusion: self-occlusion (posture occlusion), and occlusion by objects that do not belong to the face. Because of these "external factors", some pixels in the face template should contribute less (or not at all) to the motion estimation. To achieve this, we apply a well-known IRLS technique with the modified compensated approach used by (Xiao et al., 2002). Regarding self-occlusion, our occlusion manager determines the hidden facets by posture analysis. These techniques, combined with the mosaic and dissimilarity analyses, prevent the drift errors and the wrong mosaic-template registrations that may occur with some other techniques presented in the literature.
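A minimal sketch of IRLS-style weighting under these assumptions (a Huber-type weight is one common choice here; the paper's exact weighting follows (Xiao et al., 2002)):

    import numpy as np

    def irls_weights(err, visible, c=1.345):
        """Down-weight pixels with large residuals (likely occluded);
        `visible` is 0/1 per pixel from the hidden-facet analysis."""
        s = 1.4826 * np.median(np.abs(err)) + 1e-8   # robust scale (MAD)
        r = np.abs(err) / s
        w = np.where(r <= c, 1.0, c / r)             # Huber weight function
        return w * visible                           # hidden facets get weight 0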
5 EXPERIMENTAL RESULTS
We have conducted three types of experiments to evaluate the precision of our system: first, testing the tracking quality improvement of morphing versus no-morphing 3D tracking; secondly, testing our algorithm for extracting the features linked to the AUs (Ekman and Friesen, 1978) on the Cohn-Kanade database; finally, testing our illumination parameter estimation with synthetic model videos and in a realistic environment. For the first experimental evaluation, we used our own database of 10 different subjects who move, change expression and talk, recorded during an ELITE2002 tracking experimental session. For the real movement evaluation, the database was recorded in the laboratory of the Politecnico di Milano with a commercial webcam at a resolution of 640x480, synchronized with ELITE2002. The ELITE2002 system is an optoelectronic device able to track the three-dimensional coordinates of a number of reflecting markers, which we placed on a helmet on the subject's head and on the webcam. This system, thanks to a set of 6 cameras, can track a point with a range precision of 0.3 mm. Thanks to this high posture estimation confidence, we are able to compare our 3D tracking with the real subject movement. Figure (5) shows an example of the estimated tracking values for the yaw rotation (the major rotation in the presented sequence) with and without morphing, compared with ELITE2002. It is clear that when the model can estimate the morphing parameters, the posture evaluation becomes more precise. In our experiments, the error of tracking with expression morphing is at most 2-3% compared to that of the no-morphing tracking. This is an impressive result, considering that the no-morphing tracking with some occlusion-technique improvements already performs well: a maximum error of 5 degrees with respect to ELITE2002. During these experiments, we also monitored the dissimilarity values that describe the quality of the tracking, observing that with morphing the dissimilarity values remain smaller than without morphing. These results confirm the strong correlation between the quality of the tracking and the dissimilarity value, and reinforce the benefit of morphing for tracking purposes.

Figure 5: Comparison between the ELITE2002 pose estimate (solid line) and the no-morphing (left) and morphing (right) tracking posture estimates. Some cropped frames are also shown.

Regarding the second type of experiments, Table (1) shows the link between the AUVs of the Candide-3 model extracted during the tracking and the AUs of the FACS coding, used for comparing our results on the Cohn-Kanade database. The same could be done with the morphing bases of any 3D morphable face model.
Table 1: Link between the AUVs of the Candide-3 model, extracted directly during the tracking, and the classical AUs.

AUV  AU              AUV  AU
 1   10               7   42 43 44 45
 2   25 26 27 17      8   7
 3   20               9   9
 4   41              10   23 24 28
 5   12 13 15        11   5
 6   2
Our results on this database are very promising, even using a simple fuzzy classifier.
In the last type of experiments, we tested the quality of the tracking under illumination changes. To do this, we created synthetic scenes where the subject and the light source rotate in different configurations (Figure (6)). We obtain better quality using the illumination technique than using IRLS or other weighting techniques. Further tests were performed in situations characterized by extreme and localized illumination conditions (Figure (2)), which become trackable thanks to the illumination basis. Experiments were also made in realistic situations with different light sources together with expression changes (Figure (2)), and with a comparison against the ELITE2002 system under illumination changes. Summarizing, we obtain better precision and extended trackability in all cases of strong illumination changes.
Figure 6: Example of a synthetic test for illumination change. (a) shows the tracked face; (b) and (c) show the normalized face without and with illumination adjustment.
6 CONCLUSION
In conclusion, we developed a robust, expression-analysis-oriented face tracker with posture confidence evaluation, whose tracking is very close to the ELITE2002 estimates. The proposed algorithm is robust to face morphing and illumination changes in spite of the difference between the 3D face model and the real subject's face. Our system achieves good results thanks to correction techniques like the mosaic one and the dissimilarity analysis. We also showed that this method permits extracting many measures linked to the AUs that can be used for facial expression detection.
REFERENCES
Andreoni, C., Anisetti, M., Apolloni, B., Bellandi, V., Balzarotti, S., Beverina, F., Campadelli, P., Ciceri, M. R., Colombo, P., Fumagalli, F., Palmas, G., and Piccini, L. (2004). E(motional) learning. In Technology Enhanced Learning 2004 (TEL04), Milan, Italy.
Anisetti, M., Bellandi, V., and Beverina, F. (Sept. 2005). Accurate 3d model based face tracking for facial expression recognition. In Proc. of International Conference on Visualization, Imaging, and Image Processing (VIIP05), pages 93–98.
Bellandi, V., Anisetti, M., and Beverina, F. (Sept. 2005).
Upper-face expression features extraction system for
video sequences. In Proc. of International Confer-
ence on Visualization, Imaging, and Image Processing
(VIIP05), pages 83–88.
Blanz, V. and Vetter, T. (2003). Face recognition based
on fitting a 3d morphable model. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence,
25(9):1063 – 1074.
Bregler, C. and Malik, J. (1998). Tracking people with
twists and exponential maps. In CVPR98, pages 8–
15.
Cascia, M. L., Sclaroff, S., and Athitsos, V. (2000). Fast, reliable head tracking under varying illumination: An approach based on registration of texture-mapped 3d models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(4):322–336.
Cootes, T., Edwards, G., and Taylor, C. (Jun. 2000). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):681–685.
Damiani, E., Anisetti, M., Bellandi, V., and Beverina, F.
(2005). Facial identification problem: A tracking
based approach. In IEEE International Symposium on
Signal-Image Technology and InternetBased Systems
(IEEE SITIS’05).
Dornaika, F. and Ahlberg, J. (2003). Face and facial fea-
ture tracking using deformable models. International
Journal of Image and Graphics.
Dornaika, F. and Ahlberg, J. (Aug. 2004). Fast and reliable
active appearance model search for 3-d face tracking.
IEEE Transactions on Systems, Man and Cybernetics,
34(4):1838 – 1853.
Eisert, P. and Girod, B. (July 1997). Model-based 3d-motion estimation with illumination compensation. In Conference Publication.
Ekman, P. and Friesen., W. (1978). Facial action coding sys-
tem: A technique for the measurement of facial move-
ment. Consulting Psychologists Press.
Ferrigno, G. and Pedotti, A. (1985). ELITE: a digital dedicated hardware system for movement analysis via real-time TV signal processing. IEEE Transactions on Biomedical Engineering, pages 943–950.
Hager, G. D. and Belhumeur, P. N. (1998). Efficient region tracking with parametric models of geometry and illumination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(10):1025–1039.
Ishiyama, R. and Sakamoto, S. (2004). Fast and accurate
facial pose estimation by aligning a 3d appearance
model. In Proc. of 17th international conference on
pattern recognition (ICPR’04).
Kanade, T., Cohn, J., and Tian, Y. (2000). Comprehen-
sive database for facial expression analysis. Proc.
4th IEEE International Conference on Automatic Face
and Gesture Recognition (FG’00), pages 46–53.
Lucas, B. and Kanade, T. (1981). An iterative image reg-
istration technique with an application to stereo vi-
sion. Proc. Int. Joint Conf. Artificial Intelligence,
pages 674–679.
Matthews, I., Ishikawa, T., and Baker, S. (2003). The tem-
plate update problem. In Proc. of the British Machine
Vision Conference.
Murray, R., Li, Z., and Sastry, S. (1992). A Mathematical Introduction to Robotic Manipulation. CRC Press.
Tao, H. and Huang, T. (1999). Explanation-based facial motion tracking using a piecewise Bezier volume deformation model. In CVPR99.
Xiao, J., Kanade, T., and Cohn, J. (2002). Robust full-motion recovery of head by dynamic templates and re-registration techniques. In Proc. of the Conference on Automatic Face and Gesture Recognition.