OCCLUSIONS AND ACTIVE APPEARANCE MODELS
McElory Hoffmann, Ben Herbst and Karin Hunter
Applied Mathematics, University of Stellenbosch, Private Bag X1, Matieland, 7602, South Africa
Keywords:
Model-based Object tracking in Image Sequences, Motion and Tracking, Active Appearance Models, Active
Contours, Particle Filters.
Abstract:
The deterministic active appearance model (AAM) tracker fails to track objects under occlusion. In this paper,
we discuss two approaches to improve this tracker's robustness and tracking results. The first approach
initialises the AAM tracker with a shape estimate obtained from an active contour, incorporating shape history
into the tracker. The second approach combines AAMs and the particle filter, thereby incorporating both
shape and texture history into the tracker. For each approach, a simple occlusion detection method is suggested,
enabling us to address occlusion.
1 INTRODUCTION
Active appearance models (AAMs) (Cootes et al.,
1998) provide an elegant model-based framework for
tracking objects. They incorporate both shape and
texture (grayscale or colour information) into their
formulation, and are therefore able to track the out-
line of an object and its appearance simultaneously.
This makes it easy to use the parameters provided
by an AAM tracker in applications such as lipreading (Matthews et al., 2002). For this reason, we are
interested in ways to track robustly with AAMs.
Stegmann (Stegmann, 2001) demonstrated that
AAMs can be successfully applied to perform ob-
ject tracking. In his deterministic approach, the
AAM search algorithm is applied successively to each
frame. However, this technique is not robust since the
optimisation techniques employed in the AAM search
algorithm only explore a small local region of inter-
est. Also, no history of the object’s movement and
position is used to improve the optimisation. There-
fore the tracker fails in certain situations, e.g. when
the object moves fast or in the presence of occlusion;
see Figure 1. Fast movements lead to a bad initialisation
of the AAM optimisation routines, and occlusion
leaves the objective function without a meaningful local optimum. One
can modify the AAM itself to increase robustness, e.g.
(a) Frame 1 (b) Frame 23 (c) Frame 54
(d) Frame 62 (e) Frame 67 (f) Frame 115
Figure 1: Selection of result frames illustrating the perfor-
mance of the deterministic AAM tracker. In (c) and (d) the
tracker loses its target due to the presence of occlusion and
is not able to recover the target as shown in (e) and (f).
in (Gross et al., 2006) the AAM is modified to handle
occlusions. In this paper, we describe easily imple-
mented techniques as add-ons to the standard AAM
tracker.
Building on work done in (Hoffmann et al., 2006),
two improvements of the deterministic AAM tracker
are discussed here, both intended to increase the ro-
bustness of the tracker. The first technique, presented
in Section 3, initialises the AAM using the shape es-
timate obtained from active contours. In this way, the
AAM search algorithm is restricted to a local region
of interest around the estimate from active contours.

[Hoffmann M., Herbst B. and Hunter K. (2007). Occlusions and Active Appearance Models. In Proceedings of the Second International Conference on Computer Vision Theory and Applications (VISAPP) - IU/MTSV, pages 441-446. © SciTePress.]
The second technique, presented in Section 4, uses
a combination of a particle filter and an AAM to pro-
vide more robustness. Here temporal filtering predicts
the parameters of the AAM so that the history of the
object’s movement and position enhances the AAM
searches. Simple adaptations to handle occlusions are
suggested for each combined tracker.
Since AAMs and particle filters are central to both
techniques, we summarise these concepts in the next
section, before presenting the two tracking techniques
in Sections 3 and 4. Experimental results of both tech-
niques are shown in Section 5 and we conclude in
Section 6.
2 BACKGROUND
Here we give a brief overview of AAMs and particle
filters.
2.1 Active Appearance Models (AAMs)
Active appearance models (Cootes et al., 1998;
Stegmann, 2000) are template based and employ both
shape and texture (colour information). Setting up the
model involves training the model parameters, as dis-
cussed next.
In the training phase of an AAM, the features on
the outline of the object are recorded and then nor-
malised with respect to translation, scale and rota-
tion. This normalisation is done by setting up a com-
mon coordinate frame, the features are then aligned
to this normalised frame, and the translation, scale
and rotation parameters are recorded in a vector called
the pose p. Using the pose and the normalised co-
ordinates, features in normalised coordinates can be
translated, scaled and rotated to their original version
in absolute coordinates. The shape is then defined
by the vector of features in normalised coordinates,
s_i = [x_{i1}, ..., x_{in}, y_{i1}, ..., y_{in}]^T, i = 1, ..., l, where l is
the number of training images and n the number of
features.
The texture of the object in question is described
by a vector g_i = [g_{i1}, g_{i2}, ..., g_{im}]^T, i = 1, ..., l, with
m the number of texture points. Typically a piecewise
affine warp using a Delaunay triangulation of the
shape vector is performed and the underlying pixel values
are then sampled, see for example (Stegmann, 2000).
Principal component analysis (PCA) is performed
on the aligned shapes and textures, yielding

    s = s̄ + Φ_s b_s,    g = ḡ + Φ_g b_g    (1)

where s̄ and ḡ are the average shape and texture vectors,
Φ_s and Φ_g are the eigenvector matrices of the
shape and texture covariance matrices, and b_s and b_g
are the respective PCA projection coefficients.
A combined model parameter c is obtained by
stacking the PCA scores into b = [Ψ_s b_s, b_g]^T, with
Ψ_s a weighting matrix balancing pixel intensities against
pixel distances. A third PCA is performed on the combined
parameter to obtain b = Φ_c c, where Φ_c
is the basis of the combined model parameter space.
Writing Φ_c = [Φ_{c,s}, Φ_{c,g}]^T, it is now possible to generate
new shape and texture instances by

    s = s̄ + Φ_s Ψ_s^{-1} Φ_{c,s} c    (2)
    g = ḡ + Φ_g Φ_{c,g} c.    (3)
This concludes the training phase of the AAM.
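As a concrete illustration, the synthesis equations (2) and (3) can be sketched as follows. The model matrices below are small hypothetical stand-ins, not a real trained AAM; only the structure of the computation matches the paper.

```python
import numpy as np

# Hypothetical toy model: 2 shape points (4 coordinates), 3 texture samples
# and a 2-dimensional combined parameter c. All matrices are illustrative
# stand-ins for trained values.
s_mean = np.array([0.0, 1.0, 0.0, 1.0])   # mean shape  s-bar
g_mean = np.array([0.5, 0.5, 0.5])        # mean texture g-bar
Phi_s = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])            # shape eigenvectors
Phi_g = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.5, 0.5]])            # texture eigenvectors
Psi_s = np.eye(2)                         # shape/texture weighting matrix
Phi_cs = np.eye(2)                        # shape block of Phi_c
Phi_cg = np.eye(2)                        # texture block of Phi_c

def synthesise(c):
    """Generate a shape and texture instance from c, following (2) and (3)."""
    s = s_mean + Phi_s @ np.linalg.inv(Psi_s) @ Phi_cs @ c
    g = g_mean + Phi_g @ Phi_cg @ c
    return s, g

s, g = synthesise(np.array([0.1, -0.2]))
```

Varying c moves both the shape and the texture away from their means simultaneously, which is exactly the coupling the search phase exploits.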
In the search phase of AAMs, the model param-
eter c and the pose p are sought that best represent
an object in a new image not contained in the origi-
nal training set. Notice that from (2) and (3) changing
c varies both the shape s and the texture g of an ob-
ject. The idea is to vary c (optimise over c) so that
the shape and texture generated by (2) and (3) fit the
object in the image as well as possible. The objective
function that is minimised is the difference

    E = ||g_model - g_image||² = ||δg||²    (4)

between the texture values generated by c and (3), denoted
g_model, and the texture values in the image,
g_image. Note that for a specific value of c, the image
texture values g_image are sampled from the
shape generated by (2) and then translated, scaled and
rotated using the pose p. In summary, the optimisation
over c and p minimises (4), i.e. produces the best
fit of texture values.
In the implementation of AAMs one assumes a
linear relationship between the texture residual
δg = g_model - g_image and the
optimal parameter updates, hence

    δp = R_p δg    (5)
    δc = R_c δg    (6)

where R_p and R_c are estimated during the training
phase. The last step is to fine-tune the parameters p
and c with gradient-descent optimisation strategies.
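A minimal sketch of this update loop follows. The functions `g_model_fn` and `sample_texture` are hypothetical stand-ins for texture synthesis and image sampling, and the correction is subtracted here so that the residual shrinks; sign conventions differ between implementations.

```python
import numpy as np

# Sketch of the AAM search under the linear-update assumption of (5)/(6).
def aam_search(c, p, R_c, R_p, g_model_fn, sample_texture, n_iter=20):
    for _ in range(n_iter):
        delta_g = g_model_fn(c) - sample_texture(c, p)   # residual, eq. (4)
        c = c - R_c @ delta_g                            # model update, (6)
        p = p - R_p @ delta_g                            # pose update, (5)
    return c, p

# Toy problem: model texture equals c, image texture is fixed at [1, 2];
# the search should drive c towards the image texture.
g_image = np.array([1.0, 2.0])
c, p = aam_search(
    c=np.zeros(2), p=np.zeros(1),
    R_c=0.5 * np.eye(2), R_p=np.zeros((1, 2)),
    g_model_fn=lambda c: c,
    sample_texture=lambda c, p: g_image,
)
```

With a contraction factor of 0.5 per iteration the residual decays geometrically, which illustrates why a good initialisation matters: the linear update only holds near the optimum.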
The optimisation strategy described above requires
a good initialisation for the following reasons:
- The shape generated by c and (2) is translated, scaled and rotated using p, which is a large space to search.
- The assumed linear relationship between the texture residual and the optimal parameter updates is only reasonable for small updates.
We conclude that AAMs provide a general frame-
work to track or segment different types of objects.
Furthermore, no parameters need to be specified by an
expert to use them. On the downside, AAMs require
objects to have distinct features/outlines and there is
a training phase involved. Also, a good initialisation
is required for the search algorithm.
2.2 The Particle Filter
Particle filters have become an important tool to track
objects. Following (Isard and Blake, 1998; Ristic
et al., 2004), we let the state vector x_t ∈ R^{n_x} describe
the object to be tracked at time step t, while
the measurements are given by z_t ∈ R^{n_z}. We denote
all the measurements up to the t-th time step by
Z_t ≜ {z_i, i = 1, ..., t}. In the Bayesian framework,
the goal is to find an estimate of x_t based on all the
observations Z_t. Thus the conceptual Bayes solution
recursively updates the posterior pdf

    p(x_t | Z_t) = p(z_t | x_t, Z_{t-1}) p(x_t | Z_{t-1}) / p(z_t | Z_{t-1})    (7)
as the measurements become available. The essential
idea of the particle filter is to approximate the posterior
pdf in (7) by a set of samples {x_t^i, i = 1, ..., N}
and associated weights {π_t^i, i = 1, ..., N}. We call these
samples particles.
The particle filter algorithm consists of three
phases that are repeated at each time step and are il-
lustrated in Figure 2.
- The selection phase chooses the next set of particles according to the input set's relative probabilities, i.e. particles with larger weights will be chosen several times, and those with smaller weights will be selected fewer times or discarded.
- During the prediction phase the new set of particles is put through the process model to generate a prior pdf p(x_t | x_{t-1}). To account for process noise, the particles are diffused by adding noise to them.
- The measurement phase updates the weights using the new measurement z_t:

    π_t^i = p(z_t | x_t = x_t^i) / Σ_{n=1}^N p(z_t | x_t = x_t^n).    (8)
Note that no assumption is made on the linearity
of the system and the underlying model is not as-
sumed to be Gaussian; hence particle filters can be
viewed as a generalisation of the Kalman filter.
Figure 2: A graphical illustration of one iteration of the particle filter (image courtesy of S. Fleck (Fleck et al., 2005)).
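The three phases above can be sketched for a 1-D random-walk state observed in Gaussian noise. This is a minimal illustrative sketch; the trackers in this paper use richer state and measurement models.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, z, proc_std=0.1, meas_std=0.5):
    N = len(particles)
    # Selection: resample particles according to their relative weights.
    particles = particles[rng.choice(N, size=N, p=weights)]
    # Prediction: random-walk process model plus diffusion noise.
    particles = particles + proc_std * rng.standard_normal(N)
    # Measurement: reweight by the likelihood p(z_t | x_t = x_t^i), eq. (8).
    weights = np.exp(-0.5 * ((z - particles) / meas_std) ** 2)
    return particles, weights / weights.sum()

particles = rng.standard_normal(200)          # initial particle set
weights = np.full(200, 1.0 / 200)
for z in [1.0, 1.1, 0.9, 1.0]:                # four noisy measurements near 1
    particles, weights = particle_filter_step(particles, weights, z)
estimate = np.sum(weights * particles)        # weighted-average estimate
```

No linearity or Gaussianity is assumed anywhere in the step, which is the sense in which the particle filter generalises the Kalman filter.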
3 ACTIVE CONTOURS AND
ACTIVE APPEARANCE
MODELS
The combined active contour and active appearance
model (AC-AAM) proposed by (Sung and Kim,
2006) finds a shape to initialise the AAM algorithm
using active contours. This method rectifies the situ-
ation in which the AAM only works well locally. A
summary of this technique is presented below.
3.1 Contours and Shape Space
Here B-splines are used to model the outline of the
tracked object (Blake and Isard, 1998).
Given a set of control-point coordinates
(x_1, y_1), ..., (x_n, y_n), a B-spline is the curve r(s) =
[x(s), y(s)]^T formed by a parametrisation with parameter
s on the real line,

    r(s) = [B(s)^T 0; 0 B(s)^T] [Q_x; Q_y]    (9)

where B(s) is the n × 1 vector of B-spline basis functions,
and Q_x, Q_y are the vectors of control points consisting
of the x-coordinates and y-coordinates respectively.
We refer to a curve r(s) in (9) as a contour.
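To make the block-matrix structure of (9) concrete, here is a minimal sketch using periodic linear (hat-function) basis functions; real contour trackers typically use cubic B-splines, so this only illustrates the structure of the evaluation.

```python
import numpy as np

def basis(s, n):
    """n-vector B(s) of periodic linear B-spline (hat) basis functions."""
    B = np.zeros(n)
    k = int(np.floor(s)) % n      # index of the active span
    t = s - np.floor(s)           # local parameter in [0, 1)
    B[k] = 1.0 - t
    B[(k + 1) % n] = t
    return B

def contour_point(s, Qx, Qy):
    """Evaluate r(s) = [B(s)^T 0; 0 B(s)^T] [Qx; Qy] as in (9)."""
    B = basis(s, len(Qx))
    return np.array([B @ Qx, B @ Qy])

# A square contour from four control points.
Qx = np.array([0.0, 1.0, 1.0, 0.0])
Qy = np.array([0.0, 0.0, 1.0, 1.0])
r = contour_point(0.5, Qx, Qy)    # halfway along the bottom edge
```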
The dimension of the vector space spanned by r(s)
is N_Q = 2N_B, where N_B is the number of control points
of the B-spline. This implies N_Q degrees of freedom,
meaning the object can deform in N_Q different
ways as we track it over successive frames. So much
freedom leads to many tracking errors,
therefore the deformation is restricted to a lower-dimensional
space, known as the shape space.
The shape space is defined as a linear mapping
from a shape vector X ∈ R^{N_X} to a spline vector
Q = [Q_x, Q_y]^T ∈ R^{N_Q}, and the mapping is given by

    Q = W X + Q_0    (10)
where W is a matrix of rank N_X ≤ N_Q describing
the allowed transformations. By restricting X, Q is
essentially a deformation of the template Q_0, and the
type of deformation allowed is determined by W.
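A standard concrete choice of W in (10), assuming the planar-similarity shape space of Blake and Isard, allows only translation, scaling and rotation of the template, so N_X = 4 regardless of the number of control points:

```python
import numpy as np

# Planar-similarity shape space for a four-point template Q0 (N_Q = 8).
Q0x = np.array([0.0, 1.0, 1.0, 0.0])
Q0y = np.array([0.0, 0.0, 1.0, 1.0])
Q0 = np.concatenate([Q0x, Q0y])
ones, zeros = np.ones(4), np.zeros(4)
W = np.column_stack([
    np.concatenate([ones, zeros]),   # horizontal translation
    np.concatenate([zeros, ones]),   # vertical translation
    np.concatenate([Q0x, Q0y]),      # isotropic scaling
    np.concatenate([-Q0y, Q0x]),     # rotation
])

def to_spline(X):
    """Map a shape vector X (N_X = 4) to spline control points Q via (10)."""
    return W @ X + Q0

Q = to_spline(np.array([1.0, 0.0, 0.0, 0.0]))   # shift the template right
```

Restricting X to these four degrees of freedom rules out implausible deformations and is what makes the contour tracker robust.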
3.2 Active Contours (ACs)
Following (Isard and Blake, 1998), we summarise the
contour based particle filter. We refer to this com-
bination as active contours (ACs). Adapting a particle
filter to a particular implementation requires the
specification of a state vector, a process model and a measurement
model.
3.2.1 The State Vector
The state vector at each time step t is given by the
shape-space vector, thus x_t = X_t. This allows us to
generate a vector of B-spline control points Q for each
particle using (10).
3.2.2 The Process Model
States evolve according to a simple random walk
given by

    x_t^{(i)} = x_{t-1}^{(i)} + S_t^{(i)} u_t^{(i)}    (11)

where S_t^{(i)} is the process noise covariance and u_t^{(i)} is
a vector of normally distributed random variables. One
can use more sophisticated process models; the
reader is referred to (Blake and Isard, 1998) for a detailed
discussion.
3.2.3 The Measurement Model
The binary edge map for the current frame is passed to
the algorithm to estimate the weight associated with
each particle. To calculate the weight for the i-th particle,
the following procedure is followed:
- Using (10) and the state vector, calculate the vector of control points Q_i.
- Calculate the normal line at each control point.
- For each control point, search along the normal line until an edge is found. Let the distance from the control point to the edge be d_j, j = 1, ..., n, where n is the number of control points. If no edge is found, set the distance equal to the length of the normal line. Calculate the total distance d_i = Σ_{j=1}^n d_j.
The weight for the i-th particle is then given by

    π_i = exp(-d_i² / σ²).    (12)

The weights are normalised afterwards to sum to
unity. From (12) we see that a small value
of d_i results in a larger value of π_i and vice versa.
The parameter σ determines how much preference we
give to particles with a smaller distance d_i.
We need to find a representative estimate x_t^e from
the many hypothesis curves; typically the particle
with the highest weight or a weighted average is chosen.

Furthermore, every time no edge is found for a
particular particle, this is recorded and denoted by n_d^i. If
Σ_{i=1}^N n_d^i is larger than a certain pre-set threshold, occlusion
is declared. This method for the detection of
occlusion provides adequate results; for more sophisticated
techniques see e.g. (MacCormick, 2002).
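The weighting and occlusion test above can be sketched with precomputed normal-line distances. This assumes the edge search has already been run; `None` marks a control point whose normal line found no edge.

```python
import numpy as np

def weights_and_occlusion(distances, line_len=10.0, sigma=5.0, miss_threshold=3):
    """distances[i][j]: edge distance for control point j of particle i,
    or None when no edge was found along the normal line."""
    pis, misses = [], 0
    for d in distances:
        misses += sum(1 for dj in d if dj is None)
        # Missed edges contribute the full normal-line length to d_i.
        d_i = sum(dj if dj is not None else line_len for dj in d)
        pis.append(np.exp(-d_i**2 / sigma**2))    # eq. (12)
    pis = np.array(pis)
    pis /= pis.sum()                              # normalise to sum to unity
    occluded = misses > miss_threshold            # Sum of n_d^i over threshold
    return pis, occluded

# Particle 0 lies close to edges; particle 1 finds no edges at all.
pis, occluded = weights_and_occlusion([[1.0, 2.0], [None, None]])
```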
3.3 Initialisation of AAM with AC
The AC-AAM tracker consists of two parts. At time
step t, the first part performs standard AC tracking,
with the estimate from the tracker denoted by x_t^e. In
the second part, (10) is used together with x_t^e to generate
a shape estimate

    Q_t^e = W x_t^e + Q_0.    (13)

Note that Q_t^e and the AAM shape representation s (not
normalised with respect to the pose) are equivalent.
This shape Q_t^e is then used to initialise the standard AAM
optimisation and the result is output as the best-fitted
AAM. The result of the AAM is then used to re-initialise
the AC tracker.
In the presence of occlusion the AAM optimisa-
tion fails since there is no optimum. Therefore, when
occlusion is detected by the AC part, the tracker will
output no estimate for the resulting AAM, i.e. the
AAM is “switched off”.
Note that this technique uses the particle filter in-
directly. The particle filter is an integral part of the
AC tracker, but the AAM part of the AC-AAM tracker
does not utilise the particle filter.
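The coupling of the two trackers can be sketched as follows; `track_ac` and `aam_optimise` are hypothetical stand-ins for the real AC tracker and AAM optimiser, and the identity stubs below only exercise the control flow, including the occlusion switch-off.

```python
import numpy as np

def ac_aam_step(frame, x_prev, W, Q0, track_ac, aam_optimise):
    """One AC-AAM time step: AC estimate, occlusion test, AAM refinement."""
    x_e, occluded = track_ac(frame, x_prev)   # AC tracking + occlusion test
    if occluded:
        return x_e, None                      # AAM "switched off"
    Q_e = W @ x_e + Q0                        # shape estimate, eq. (13)
    return x_e, aam_optimise(frame, Q_e)      # AAM initialised with Q_e

# Identity stubs standing in for the real AC tracker and AAM optimiser.
x_e, s_fit = ac_aam_step(
    frame=None, x_prev=np.array([1.0, 2.0]),
    W=np.eye(2), Q0=np.zeros(2),
    track_ac=lambda frame, x: (x, False),
    aam_optimise=lambda frame, Q: Q,
)
```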
4 AN ACTIVE APPEARANCE
MODEL BASED PARTICLE
FILTER
This section details the second approach of increasing
the robustness of the AAM tracker. It differs from the
AC-AAM technique in the sense that the AAM is not
initialised by a secondary technique, instead temporal
filtering predicts the parameters of the AAM. Conse-
quently, the history of the object is used to improve
the overall robustness of the AAM. The technique is
an extension of the direct combination of an AAM
with a particle filter as was introduced by (Hamlaoui
and Davoine, 2005).
As in the case of the AC tracker, the adaptation of
the particle filter to work in conjunction with an AAM
requires the specification of a state vector, a process
model and a measurement model.
4.1 The State Vector
From (2) and (3) it is clear that one can synthesise a
shape s and texture g for a particular image from the
model parameters c. The state vector is therefore a
combination of the model parameters c and the pose
p; at time step t it is given by x_t = [p, c]^T.
4.2 The Process Model
Using the updates of the model parameters (5) and (6),
the states evolve according to

    x_t^{(i)} = x̂_{t-1} + [R_p; R_c] δg_{t-1} + S_t^{(i)} u^{(i)}    (14)

where x̂_{t-1} is the estimate of the state vector at time
step t-1, S_t^{(i)} is the process noise covariance and u^{(i)}
is a vector of normally distributed random variables.
However, if occlusion occurs, (14) is not a good representation
of the process model due to the inclusion
of the AAM optimisation term [R_p; R_c] δg_{t-1}.
Hence, if occlusion is detected (discussed in the next
section), the simpler process model

    x_t^{(i)} = x_{t-1}^{(i)} + S_t^{(i)} u^{(i)}    (15)

is used, where x_{t-1}^{(i)} is the i-th particle at time t-1
and the other parameters remain unchanged.
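The switch between (14) and (15) can be sketched as follows; `R` and `delta_g` are hypothetical stand-ins for the stacked update matrix [R_p; R_c] and the texture residual of the previous frame.

```python
import numpy as np

rng = np.random.default_rng(1)

def propagate(x_est, particles, R, delta_g, S, occluded):
    """Propagate particles with (14), or fall back to the random walk (15)
    once occlusion has been detected."""
    noise = S * rng.standard_normal(particles.shape)
    if occluded:
        return particles + noise              # simple random walk, (15)
    return x_est + R @ delta_g + noise        # AAM-driven update, (14)

x_est = np.array([1.0, 1.0])                  # state estimate at t-1
particles = np.zeros((100, 2))
R = 0.1 * np.eye(2)                           # stand-in for [R_p; R_c]
delta_g = np.array([1.0, 0.0])                # stand-in texture residual
moved = propagate(x_est, particles, R, delta_g, S=0.05, occluded=False)
stayed = propagate(x_est, particles, R, delta_g, S=0.05, occluded=True)
```

Dropping the AAM term under occlusion keeps the particle cloud diffusing from its last reliable position instead of being steered by a meaningless texture residual.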
4.3 The Measurement Model
Since the purpose of the measurement model is to
quantify how well a particular particle fits the underlying
image, the error (4) is measured. However, to
account for outliers we use the Lorentzian norm

    ρ_i = ρ(E_i, σ_s) = log(1 + E_i / (2σ_s²))    (16)

where σ_s is the scale parameter that discards outliers.
Then the pre-normalised weights are given by

    π_t^i = exp(-ρ_i / σ²)    (17)

where σ plays the same role as in the case with the
AC tracker. Furthermore, the number of times n_ρ the
error measurement ρ_i exceeds a pre-set threshold is
recorded. When n_ρ is larger than half the number of
particles, occlusion is declared.
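Equations (16) and (17) together with the occlusion test can be sketched as follows; the value of the ρ threshold is a hypothetical choice, not one reported in the paper.

```python
import numpy as np

def measure(errors, sigma_s=1.0, sigma=1.0, rho_threshold=5.0):
    """Turn per-particle texture errors E_i into normalised weights via the
    Lorentzian norm (16) and weights (17); declare occlusion when more than
    half the particles exceed the (hypothetical) threshold on rho_i."""
    errors = np.asarray(errors, dtype=float)
    rho = np.log(1.0 + errors / (2.0 * sigma_s**2))   # Lorentzian norm (16)
    weights = np.exp(-rho / sigma**2)                 # pre-normalised (17)
    weights /= weights.sum()
    occluded = np.sum(rho > rho_threshold) > len(errors) / 2
    return weights, occluded

weights, occluded = measure([0.1, 0.2, 100.0])        # one outlier particle
```

The Lorentzian norm grows only logarithmically, so a single outlying particle is down-weighted without collapsing the whole weight distribution.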
(a) Frame 1 (b) Frame 23 (c) Frame 54
(d) Frame 62 (e) Frame 67 (f) Frame 115
Figure 3: Selection of result frames to indicate the perfor-
mance of the AC-AAM tracker. The green and white shapes
are the output of active contours and AC-AAM respectively.
Note that the AC-AAM provides no AAM when the object
undergoes occlusion in (c) and (d).
5 EXPERIMENTAL RESULTS
We implemented the AAM-based particle filter
tracker in C++ using the open-source AAM-API
(Stegmann et al., 2003), while the AC implementation
is in MATLAB. To illustrate the effectiveness
of these trackers, they are used to track a head
and shoulders moving against a cluttered background.
The videos containing the full movement are avail-
able for download from (Hoffmann, 2006). The ap-
pearance model consists of 14 feature points and the
AAM was trained using 6 images.
In Figure 3 the results obtained with the AC-AAM
tracker are shown, while the corresponding results ob-
tained with the AAM-based particle filter are illus-
trated in Figure 4. Both trackers are able to track the
movement of the object accurately.
When the AC-AAM approach is used, a simple
implementation for the AC tracker suffices, i.e. a ba-
sic dynamic model and a simplified shape space. This
can be seen in Figure 3 where the output of the AC
tracker is not as accurate as it could be, but the result-
ing AAM fit is good. This illustrates the principal idea
of the AC-AAM approach: the robustness of the AC
tracker is used to initialise the AAM which in turn,
adjusts well to the underlying image. The AC-AAM
tracker detects occlusion as can be seen in Figure 3(c)
and (d), and the AAM optimisation is disabled for
these frames.
The AAM-based particle filter successfully tracks
the head and shoulders in the presence of occlusion, as
illustrated in Figure 4(c) and (d). It could be argued
that this tracker handles occlusion more successfully
than the AC-AAM tracker, since it provides an estimate
for all frames (including those with occlusion) and
the estimate is consistent with our intuition.