AAM search algorithm is restricted to a local region
of interest around the estimate from active contours.
The second technique, presented in Section 4, uses
a combination of a particle filter and an AAM to pro-
vide more robustness. Here temporal filtering predicts
the parameters of the AAM so that the history of the
object’s movement and position enhances the AAM
searches. Simple adaptions to handle occlusions are
suggested for each combined tracker.
Since AAMs and particle filters are central to both
techniques, we summarise these concepts in the next
section, before presenting the two tracking techniques
in Sections 3 and 4. Experimental results of both tech-
niques are shown in Section 5 and we conclude in
Section 6.
2 BACKGROUND
Here we give a brief overview of AAMs and particle
filters.
2.1 Active Appearance Models (AAMs)
Active appearance models (Cootes et al., 1998;
Stegmann, 2000) are template based and employ both
shape and texture (colour information). Setting up the
model involves training the model parameters, as dis-
cussed next.
In the training phase of an AAM, the features on the outline of the object are recorded and then normalised with respect to translation, scale and rotation. This normalisation is done by setting up a common coordinate frame: the features are aligned to this normalised frame, and the translation, scale and rotation parameters are recorded in a vector called the pose $p$. Using the pose and the normalised coordinates, features in normalised coordinates can be translated, scaled and rotated back to their original positions in absolute coordinates. The shape is then defined by the vector of features in normalised coordinates, $s_i = [x_{i1}, \cdots, x_{in}, y_{i1}, \cdots, y_{in}]^T$, $i = 1, \ldots, l$, where $l$ is the number of training images and $n$ the number of features.
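As a concrete illustration of this alignment step, the following is a minimal sketch (not the authors' implementation), assuming each training shape arrives as an (n, 2) array of points; in practice the rotation angle would be found by a Procrustes fit to a reference shape, and is simply passed in here.

```python
import numpy as np

def normalise_shape(xy, theta=0.0):
    """Normalise one training shape: remove translation, scale and
    rotation, and record them as the pose p = (t, s, theta).
    xy: (n, 2) array of feature points in absolute coordinates."""
    t = xy.mean(axis=0)                  # translation: the centroid
    centred = xy - t
    s = np.sqrt((centred ** 2).sum())    # scale: Frobenius norm
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    unit = (centred / s) @ R             # remove the rotation
    pose = (t, s, theta)
    # shape vector [x_1, ..., x_n, y_1, ..., y_n]^T
    return np.concatenate([unit[:, 0], unit[:, 1]]), pose

def apply_pose(shape_vec, pose):
    """Translate, scale and rotate a normalised shape back to
    absolute coordinates; returns an (n, 2) array of points."""
    t, s, theta = pose
    n = shape_vec.size // 2
    xy = np.stack([shape_vec[:n], shape_vec[n:]], axis=1)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return s * (xy @ R.T) + t
```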
The texture of the object in question is described by a vector, $g_i = [g_{i1}, g_{i2}, \ldots, g_{im}]^T$, $i = 1, \ldots, l$, with $m$ the number of texture points. Typically a piecewise affine warp using a Delaunay triangulation of the shape vector is performed and the underlying pixel values are then sampled; see for example (Stegmann, 2000).
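To make this sampling step concrete, here is a compact sketch of one possible piecewise affine warp (an assumption, not code from the paper): the pixel grid inside the mean shape is located in the Delaunay triangulation, expressed in barycentric coordinates, and mapped onto the corresponding triangles of the target shape; nearest-neighbour lookup is used for brevity where a real implementation would interpolate.

```python
import numpy as np
from scipy.spatial import Delaunay

def sample_texture(image, shape_xy, mean_xy):
    """Sample the texture vector g by warping the pixel grid of the
    mean shape onto shape_xy. image: 2-D grey image;
    shape_xy, mean_xy: (n, 2) arrays of feature points."""
    tri = Delaunay(mean_xy)                      # triangulate the mean shape
    xmin, ymin = np.floor(mean_xy.min(axis=0)).astype(int)
    xmax, ymax = np.ceil(mean_xy.max(axis=0)).astype(int)
    xs, ys = np.meshgrid(np.arange(xmin, xmax), np.arange(ymin, ymax))
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    simplex = tri.find_simplex(pts)
    keep = simplex >= 0                          # pixels inside the mean shape
    pts, simplex = pts[keep], simplex[keep]
    # barycentric coordinates of each pixel in its mean-shape triangle
    T = tri.transform[simplex]                   # (k, 3, 2) affine transforms
    b = np.einsum('kij,kj->ki', T[:, :2], pts - T[:, 2])
    bary = np.c_[b, 1.0 - b.sum(axis=1)]
    # same barycentric weights on the target-shape triangles give the warp
    verts = shape_xy[tri.simplices[simplex]]     # (k, 3, 2) triangle corners
    warped = np.einsum('ki,kij->kj', bary, verts)
    col = np.clip(np.round(warped[:, 0]).astype(int), 0, image.shape[1] - 1)
    row = np.clip(np.round(warped[:, 1]).astype(int), 0, image.shape[0] - 1)
    return image[row, col]                       # texture vector g
```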
Principal component analysis (PCA) is performed
on the aligned shapes and textures and this yields
$$s = \bar{s} + \Phi_s b_s, \qquad g = \bar{g} + \Phi_g b_g \qquad (1)$$
where $\bar{s}$ and $\bar{g}$ are the average shape and texture vectors, $\Phi_s$ and $\Phi_g$ are the eigenvector matrices of the shape and texture covariance matrices, and $b_s$ and $b_g$ are the respective PCA projection coefficients.
A combined model parameter $c$ is obtained by combining the PCA scores into $b = [\Psi_s b_s, b_g]^T$, with $\Psi_s$ a weighting matrix between pixel intensities and pixel distances. A third PCA is executed on the combined model parameter to obtain $b = \Phi_c c$, where $\Phi_c$ is the basis for the combined model parameter space. Writing $\Phi_c = [\Phi_{c,s}, \Phi_{c,g}]^T$, it is now possible to generate new shape and texture instances by
$$s = \bar{s} + \Phi_s \Psi_s^{-1} \Phi_{c,s}\, c \qquad (2)$$
$$g = \bar{g} + \Phi_g \Phi_{c,g}\, c. \qquad (3)$$
This concludes the training phase of the AAM.
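The three PCA steps of (1)-(3) fit into a short sketch; the variance threshold, the scalar weight `r` standing in for the matrix $\Psi_s$, and all function names below are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def pca(X, var_keep=0.98):
    """PCA on row-stacked samples: returns the mean, the basis Phi
    (columns are eigenvectors) and the score vectors b per sample."""
    mu = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(S**2) / np.sum(S**2), var_keep)) + 1
    Phi = Vt[:k].T
    return mu, Phi, (X - mu) @ Phi

def train_aam(S_shapes, G_textures, r=1.0):
    """S_shapes: (l, 2n) aligned shape vectors; G_textures: (l, m)
    texture vectors. r plays the role of Psi_s, reduced to a scalar."""
    s_bar, Phi_s, b_s = pca(S_shapes)        # shape part of eq. (1)
    g_bar, Phi_g, b_g = pca(G_textures)      # texture part of eq. (1)
    b = np.hstack([r * b_s, b_g])            # b = [Psi_s b_s, b_g]^T
    _, Phi_c, _ = pca(b)                     # third PCA: b = Phi_c c
    ks = b_s.shape[1]
    return dict(s_bar=s_bar, Phi_s=Phi_s, g_bar=g_bar, Phi_g=Phi_g,
                r=r, Phi_cs=Phi_c[:ks], Phi_cg=Phi_c[ks:])

def instance(model, c):
    """Generate a shape and texture instance from c, cf. (2) and (3)."""
    s = model['s_bar'] + model['Phi_s'] @ (model['Phi_cs'] @ c) / model['r']
    g = model['g_bar'] + model['Phi_g'] @ (model['Phi_cg'] @ c)
    return s, g
```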
In the search phase of AAMs, the model param-
eter c and the pose p are sought that best represent
an object in a new image not contained in the origi-
nal training set. Notice that from (2) and (3) changing
c varies both the shape s and the texture g of an ob-
ject. The idea is to vary c (optimise over c) so that
the shape and texture generated by (2) and (3) fit the
object in the image as well as possible. The objective
function that is minimised is the difference
$$E = \| g_{\mathrm{model}} - g_{\mathrm{image}} \|^2 = \| \delta g \|^2 \qquad (4)$$
between the texture values generated by $c$ and (3), denoted as $g_{\mathrm{model}}$, and the texture values in the image, $g_{\mathrm{image}}$. Note that the image texture values $g_{\mathrm{image}}$ for a specific value of $c$ are the values sampled from the shape generated by (2) and then translated, scaled and rotated using the pose $p$. In summary, the optimisation over $c$ and $p$ minimises (4), i.e. it produces the best fit of texture values.
In the implementation of AAMs one assumes a linear relationship between the difference in texture values, $\delta g = g_{\mathrm{model}} - g_{\mathrm{image}}$, and the optimal updates of the model parameters, hence
$$\delta p = R_p\, \delta g \qquad (5)$$
$$\delta c = R_c\, \delta g \qquad (6)$$
where $R_p$ and $R_c$ are estimated during the training phase. The last step is to fine-tune the parameters $p$ and $c$ by gradient-descent optimisation strategies.
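Combining (2)-(6), one possible search loop looks as follows, reusing `instance()`, `apply_pose()` and `sample_texture()` from the sketches above; the parametrisation of the pose update $\delta p$ and the stopping rule are assumptions made for illustration, not the paper's exact scheme.

```python
import numpy as np

def aam_search(image, model, mean_xy, Rp, Rc, c0, pose0, n_iter=30):
    """Iterative AAM search: predict updates of pose and model
    parameters from the texture residual via eqs. (5) and (6).
    Assumed update layout: dp = [dx, dy, dscale, dtheta]."""
    c, pose, E = c0.copy(), pose0, np.inf
    for _ in range(n_iter):
        s, g_model = instance(model, c)      # model shape/texture, (2)-(3)
        pts = apply_pose(s, pose)            # (n, 2) points in image coords
        g_image = sample_texture(image, pts, mean_xy)
        dg = g_model - g_image               # texture residual delta-g
        E_new = float(dg @ dg)               # objective of eq. (4)
        if E_new >= E:                       # stop when the fit stagnates
            break
        E = E_new
        dp, dc = Rp @ dg, Rc @ dg            # linear updates, (5)-(6)
        t, scale, theta = pose
        pose = (t + dp[:2], scale * (1.0 + dp[2]), theta + dp[3])
        c = c + dc
    return c, pose, E
```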
The optimisation strategy described above requires a good initialisation for the following reasons:
• The shape generated by c and (2) is translated,
scaled and rotated using p—a large space to
search.
• The assumption that there exists a linear relation-
ship between the differences in texture values and
the optimum model parameters’ updates is only
reasonable for small updates.