AAM search algorithm is restricted to a local region
of interest around the estimate from active contours.
The second technique, presented in Section 4, uses
a combination of a particle filter and an AAM to pro-
vide more robustness. Here temporal filtering predicts
the parameters of the AAM so that the history of the
object’s movement and position enhances the AAM
searches. Simple adaptions to handle occlusions are
suggested for each combined tracker.
Since AAMs and particle filters are central to both
techniques, we summarise these concepts in the next
section, before presenting the two tracking techniques
in Sections 3 and 4. Experimental results of both tech-
niques are shown in Section 5 and we conclude in
Section 6.
2 BACKGROUND
Here we give a brief overview of AAMs and particle
filters.
2.1 Active Appearance Models (AAMs)
Active appearance models (Cootes et al., 1998;
Stegmann, 2000) are template based and employ both
shape and texture (colour information). Setting up the
model involves training the model parameters, as dis-
cussed next.
In the training phase of an AAM, the features on the outline of the object are recorded and then normalised with respect to translation, scale and rotation. This normalisation is done by setting up a common coordinate frame: the features are aligned to this normalised frame, and the translation, scale and rotation parameters are recorded in a vector called the pose $p$. Using the pose and the normalised coordinates, features in normalised coordinates can be translated, scaled and rotated back to their original positions in absolute coordinates. The shape is then defined by the vector of features in normalised coordinates, $s_i = [x_{i1}, \cdots, x_{in}, y_{i1}, \cdots, y_{in}]^T$, $i = 1, \ldots, l$, where $l$ is the number of training images and $n$ the number of features.
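As a concrete illustration of this alignment step, the following is a minimal sketch (not the authors' implementation), assuming each training shape arrives as an (n, 2) array of points; in practice the rotation angle would be found by a Procrustes fit to a reference shape, and is simply passed in here.

```python
import numpy as np

def normalise_shape(xy, theta=0.0):
    """Normalise one training shape: remove translation, scale and
    rotation, and record them as the pose p = (t, s, theta).
    xy: (n, 2) array of feature points in absolute coordinates."""
    t = xy.mean(axis=0)                  # translation: the centroid
    centred = xy - t
    s = np.sqrt((centred ** 2).sum())    # scale: Frobenius norm
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    unit = (centred / s) @ R             # remove the rotation
    pose = (t, s, theta)
    # shape vector [x_1, ..., x_n, y_1, ..., y_n]^T
    return np.concatenate([unit[:, 0], unit[:, 1]]), pose

def apply_pose(shape_vec, pose):
    """Translate, scale and rotate a normalised shape back to
    absolute coordinates; returns an (n, 2) array of points."""
    t, s, theta = pose
    n = shape_vec.size // 2
    xy = np.stack([shape_vec[:n], shape_vec[n:]], axis=1)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return s * (xy @ R.T) + t
```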
The texture of the object in question is described by a vector, $g_i = [g_{i1}, g_{i2}, \ldots, g_{im}]^T$, $i = 1, \ldots, l$, with $m$ the number of texture points. Typically a piecewise affine warp using a Delaunay triangulation of the shape vector is performed and the underlying pixel values are then sampled; see for example (Stegmann, 2000).
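To make this sampling step concrete, here is a compact sketch of one possible piecewise affine warp (an assumption, not code from the paper): the pixel grid inside the mean shape is located in the Delaunay triangulation, expressed in barycentric coordinates, and mapped onto the corresponding triangles of the target shape; nearest-neighbour lookup is used for brevity where a real implementation would interpolate.

```python
import numpy as np
from scipy.spatial import Delaunay

def sample_texture(image, shape_xy, mean_xy):
    """Sample the texture vector g by warping the pixel grid of the
    mean shape onto shape_xy. image: 2-D grey image;
    shape_xy, mean_xy: (n, 2) arrays of feature points."""
    tri = Delaunay(mean_xy)                      # triangulate the mean shape
    xmin, ymin = np.floor(mean_xy.min(axis=0)).astype(int)
    xmax, ymax = np.ceil(mean_xy.max(axis=0)).astype(int)
    xs, ys = np.meshgrid(np.arange(xmin, xmax), np.arange(ymin, ymax))
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    simplex = tri.find_simplex(pts)
    keep = simplex >= 0                          # pixels inside the mean shape
    pts, simplex = pts[keep], simplex[keep]
    # barycentric coordinates of each pixel in its mean-shape triangle
    T = tri.transform[simplex]                   # (k, 3, 2) affine transforms
    b = np.einsum('kij,kj->ki', T[:, :2], pts - T[:, 2])
    bary = np.c_[b, 1.0 - b.sum(axis=1)]
    # same barycentric weights on the target-shape triangles give the warp
    verts = shape_xy[tri.simplices[simplex]]     # (k, 3, 2) triangle corners
    warped = np.einsum('ki,kij->kj', bary, verts)
    col = np.clip(np.round(warped[:, 0]).astype(int), 0, image.shape[1] - 1)
    row = np.clip(np.round(warped[:, 1]).astype(int), 0, image.shape[0] - 1)
    return image[row, col]                       # texture vector g
```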
Principal component analysis (PCA) is performed
on the aligned shapes and textures and this yields
$$s = \bar{s} + \Phi_s b_s, \qquad g = \bar{g} + \Phi_g b_g \qquad (1)$$
where $\bar{s}$ and $\bar{g}$ are the average shape and texture vectors, $\Phi_s$ and $\Phi_g$ are the eigenvector matrices of the shape and texture covariance matrices, and $b_s$ and $b_g$ are the respective PCA projection coefficients.
A combined model parameter $c$ is obtained by combining the PCA scores into $b = [\Psi_s b_s, b_g]^T$, with $\Psi_s$ a weighting matrix between pixel intensities and pixel distances. A third PCA is executed on the combined model parameter to obtain $b = \Phi_c c$, where $\Phi_c$ is the basis for the combined model parameter space. Writing $\Phi_c = [\Phi_{c,s}, \Phi_{c,g}]^T$, it is now possible to generate new shape and texture instances by
$$s = \bar{s} + \Phi_s \Psi_s^{-1} \Phi_{c,s}\, c \qquad (2)$$
$$g = \bar{g} + \Phi_g \Phi_{c,g}\, c. \qquad (3)$$
This concludes the training phase of the AAM.
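The three PCA steps of (1)-(3) fit into a short sketch; the variance threshold, the scalar weight `r` standing in for the matrix $\Psi_s$, and all function names below are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def pca(X, var_keep=0.98):
    """PCA on row-stacked samples: returns the mean, the basis Phi
    (columns are eigenvectors) and the score vectors b per sample."""
    mu = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(S**2) / np.sum(S**2), var_keep)) + 1
    Phi = Vt[:k].T
    return mu, Phi, (X - mu) @ Phi

def train_aam(S_shapes, G_textures, r=1.0):
    """S_shapes: (l, 2n) aligned shape vectors; G_textures: (l, m)
    texture vectors. r plays the role of Psi_s, reduced to a scalar."""
    s_bar, Phi_s, b_s = pca(S_shapes)        # shape part of eq. (1)
    g_bar, Phi_g, b_g = pca(G_textures)      # texture part of eq. (1)
    b = np.hstack([r * b_s, b_g])            # b = [Psi_s b_s, b_g]^T
    _, Phi_c, _ = pca(b)                     # third PCA: b = Phi_c c
    ks = b_s.shape[1]
    return dict(s_bar=s_bar, Phi_s=Phi_s, g_bar=g_bar, Phi_g=Phi_g,
                r=r, Phi_cs=Phi_c[:ks], Phi_cg=Phi_c[ks:])

def instance(model, c):
    """Generate a shape and texture instance from c, cf. (2) and (3)."""
    s = model['s_bar'] + model['Phi_s'] @ (model['Phi_cs'] @ c) / model['r']
    g = model['g_bar'] + model['Phi_g'] @ (model['Phi_cg'] @ c)
    return s, g
```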
In the search phase of AAMs, the model param-
eter c and the pose p are sought that best represent
an object in a new image not contained in the origi-
nal training set. Notice that from (2) and (3) changing
c varies both the shape s and the texture g of an ob-
ject. The idea is to vary c (optimise over c) so that
the shape and texture generated by (2) and (3) fit the
object in the image as well as possible. The objective
function that is minimised is the difference
$$E = \| g_{\mathrm{model}} - g_{\mathrm{image}} \|^2 = \| \delta g \|^2 \qquad (4)$$
between the texture values generated by $c$ and (3), denoted as $g_{\mathrm{model}}$, and the texture values in the image, $g_{\mathrm{image}}$. Note that the image texture values $g_{\mathrm{image}}$ for a specific value of $c$ are the values sampled from the shape generated by (2) and then translated, scaled and rotated using the pose $p$. In summary, the optimisation over $c$ and $p$ minimises (4), i.e. it produces the best fit of texture values.
In the implementation of AAMs one assumes a linear relationship between the difference in texture values, $\delta g = g_{\mathrm{model}} - g_{\mathrm{image}}$, and the optimal updates of the model parameters, hence
$$\delta p = R_p\, \delta g \qquad (5)$$
$$\delta c = R_c\, \delta g \qquad (6)$$
where $R_p$ and $R_c$ are estimated during the training phase. The last step is to fine-tune the parameters $p$ and $c$ by gradient-descent optimisation strategies.
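Combining (2)-(6), one possible search loop looks as follows, reusing `instance()`, `apply_pose()` and `sample_texture()` from the sketches above; the parametrisation of the pose update $\delta p$ and the stopping rule are assumptions made for illustration, not the paper's exact scheme.

```python
import numpy as np

def aam_search(image, model, mean_xy, Rp, Rc, c0, pose0, n_iter=30):
    """Iterative AAM search: predict updates of pose and model
    parameters from the texture residual via eqs. (5) and (6).
    Assumed update layout: dp = [dx, dy, dscale, dtheta]."""
    c, pose, E = c0.copy(), pose0, np.inf
    for _ in range(n_iter):
        s, g_model = instance(model, c)      # model shape/texture, (2)-(3)
        pts = apply_pose(s, pose)            # (n, 2) points in image coords
        g_image = sample_texture(image, pts, mean_xy)
        dg = g_model - g_image               # texture residual delta-g
        E_new = float(dg @ dg)               # objective of eq. (4)
        if E_new >= E:                       # stop when the fit stagnates
            break
        E = E_new
        dp, dc = Rp @ dg, Rc @ dg            # linear updates, (5)-(6)
        t, scale, theta = pose
        pose = (t + dp[:2], scale * (1.0 + dp[2]), theta + dp[3])
        c = c + dc
    return c, pose, E
```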
The optimisation strategy described above requires a good initialisation for the following reasons:
• The shape generated by c and (2) is translated,
scaled and rotated using p—a large space to
search.
• The assumption that there exists a linear relation-
ship between the differences in texture values and
the optimum model parameters’ updates is only
reasonable for small updates.