each point. Compared with holistic representations such as the
AAM, working at patch level offers extra flexibility. The
parts-based representation improves the model’s representation
capacity, as it accounts only for local correlations between pixel
values, and it naturally fits unseen appearances well in comparison
with the leading holistic approaches. The CLM uses normalized
correlation response surfaces as patch descriptors.
In (Wang et al., 2008a) (Wang et al., 2008b) the discriminant
descriptor is obtained using machine learning methods, i.e. a
linear Support Vector Machine (SVM), which requires extensive
training and the labeling of many positive and negative samples.
Our approach belongs to the discriminative class of methods and,
like the standard AAM, consists of two separate models: the shape
model and the appearance model. The shape
model is an ordinary PDM that describes the positions of the
landmarks. The appearance model is composed of a set of
descriptors, one for each landmark in the PDM. The descriptors are
covariance matrices of multiple features evaluated in the region
surrounding each landmark. Since covariance matrices are a special
set of tensors that lie on a Riemannian manifold, it is possible to
measure the dissimilarity between two covariances and also to
update them, enforcing temporal appearance consistency. The method
starts from a generic covariance (the average covariance observed
in the training set), which is
then continuously updated. However, as in the previous methods
(D.Cristinacce and T.F.Cootes, 2008) (Wang et al., 2008a) (Wang et
al., 2008b), the patch response maps found by convolution around
the current landmark position suffer from detection ambiguities.
It will be shown that the minimum (in covariance dissimilarity) of
the response map is not always the desired solution. A solution
based on a mean-shift algorithm is proposed: it finds candidate
solutions, and an unsupervised clustering technique (Figueiredo and
Jain, 2002) then locates and groups the candidates. A
Mahalanobis-based metric is used to select the best solution
consistent with the PDM. Finally, the global optimization step,
which solves the PDM, is performed using a weighted least-squares
warp update based on the Lucas and Kanade framework (Baker and
Matthews, 2004). The weights were extracted from landmark matching
score statistics.
This paper is organized as follows: section 2 describes the
required background, namely the basics of Riemannian manifolds and
PDM building. Section 3 presents our approach in detail, section 4
presents experimental results, and section 5 presents the
conclusions.
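
As a rough illustration of the covariance descriptors and the dissimilarity between two covariances mentioned above, the following sketch may help. It is not the implementation used in this paper. The per-pixel feature vector (pixel coordinates, intensity and absolute gradients) and the function names `patch_covariance` and `covariance_dissimilarity` are assumptions made for the example, and the distance shown is the generalized-eigenvalue metric commonly used with region covariance descriptors, which may differ from the exact definition adopted later in the text.

```python
import numpy as np

def patch_covariance(patch):
    """Covariance descriptor of a grayscale patch (2D array).

    Assumed (hypothetical) per-pixel features: [x, y, I, |dI/dx|, |dI/dy|].
    """
    patch = np.asarray(patch, dtype=float)
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = np.gradient(patch)  # derivatives along rows, then columns
    feats = np.stack([xs, ys, patch, np.abs(dx), np.abs(dy)], axis=-1)
    feats = feats.reshape(-1, 5)
    cov = np.cov(feats, rowvar=False)
    # A small ridge keeps the matrix positive definite for flat patches.
    return cov + 1e-6 * np.eye(5)

def covariance_dissimilarity(c1, c2):
    """sqrt(sum_i ln^2(lambda_i)), with lambda_i the generalized
    eigenvalues of the pair (c1, c2)."""
    chol = np.linalg.cholesky(c1)
    chol_inv = np.linalg.inv(chol)
    # Whitening c2 by c1 yields a symmetric matrix whose eigenvalues
    # are the generalized eigenvalues of (c1, c2).
    whitened = chol_inv @ c2 @ chol_inv.T
    lam = np.linalg.eigvalsh(whitened)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```

Under these assumptions the dissimilarity of a covariance with itself is zero up to rounding, and the measure is symmetric in its two arguments.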
2 BACKGROUND
2.1 Shape Model
The shape of a (2D) Point Distribution Model (PDM) is defined by
the vertex locations of a mesh. A single v-point shape is
represented by a 2v vector given by
$s = (x_1, \ldots, x_v, y_1, \ldots, y_v)^T$. The PDM training data
consists of a set of annotated images with the shape mesh marked
(usually by hand). All the shapes are then aligned to a common mean
shape using a Generalised Procrustes Analysis (GPA), removing
location, scale and rotation effects. Principal Components Analysis
(PCA) is then applied to the aligned shapes, resulting in the
linear parametric model $s = s_0 + \Phi p$, where new shapes, $s$,
are synthesized by deforming the mean shape, $s_0$, using a
weighted linear combination of eigenvectors, $\phi_i$,
$i = 1, \ldots, n$. Here $n$ is the number of eigenvectors that
retain a user-defined fraction of the variance, typically 95%, and
$p$ is a vector of shape parameters which represents the weights.
See Figure 1 a), b) and c).

Figure 1: a) Raw shape data. b) Aligned landmarks after GPA.
c) Shape covariance $\Sigma_k$ around each landmark. d) Patches
$P_k$, of size $l \times l$, around each landmark. e) Illustration
of finding the average covariance $C_k$ for a specific patch (left
side of the left eye corner). Each training image provides a
normalized patch. The covariances for the feature vector $f$ are
evaluated and, using eq. 2, $C_k$ is found.

Notice that, because of the GPA,
the PDM does not model the similarity transformation that is
required to place the shape onto the target image. To overcome this
we use the approach proposed by (Matthews and Baker, 2004), i.e.,
we include a special set of 4 eigenvectors
$\psi_1, \ldots, \psi_4$. A full shape is then described by the
linear system
$s = s_0 + \sum_{i=1}^{n} p_i \phi_i + \sum_{j=1}^{4} q_j \psi_j$,
where $q$ represents the 2D pose parameters with
$q_1 = s\cos(\theta) - 1$, $q_2 = s\sin(\theta)$, $q_3 = t_x$,
$q_4 = t_y$, and where $s$, $\theta$ and $(t_x, t_y)$ represent the
scale, rotation and translation w.r.t. the base mesh $s_0$.
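
The shape model of this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in this paper: the training shapes are assumed to be already GPA-aligned, the function names `build_pdm`, `similarity_basis` and `synthesize` are hypothetical, and the four pose eigenvectors are left unnormalized so that the pose parameters keep the direct meaning of $q_1, \ldots, q_4$ given above (a practical implementation would typically orthonormalize them).

```python
import numpy as np

def build_pdm(aligned_shapes, variance=0.95):
    """PCA shape model from GPA-aligned shapes.

    aligned_shapes: (N, 2v) array, each row (x_1..x_v, y_1..y_v).
    Returns the mean shape s0 and Phi with shape (2v, n).
    """
    s0 = aligned_shapes.mean(axis=0)
    centered = aligned_shapes - s0
    # Rows of vt are the PCA eigenvectors of the shape covariance.
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = sing ** 2 / (len(aligned_shapes) - 1)
    ratio = np.cumsum(eigvals) / eigvals.sum()
    # Smallest n whose cumulative variance reaches the requested fraction.
    n = min(int(np.searchsorted(ratio, variance)) + 1, len(eigvals))
    return s0, vt[:n].T

def similarity_basis(s0):
    """Unnormalized pose eigenvectors psi_1..psi_4 as columns of (2v, 4)."""
    v = s0.size // 2
    x, y = s0[:v], s0[v:]
    psi1 = s0.copy()                                   # scaled copy of s0
    psi2 = np.concatenate([-y, x])                     # in-plane rotation
    psi3 = np.concatenate([np.ones(v), np.zeros(v)])   # translation in x
    psi4 = np.concatenate([np.zeros(v), np.ones(v)])   # translation in y
    return np.stack([psi1, psi2, psi3, psi4], axis=1)

def synthesize(s0, phi, psi, p, q):
    """s = s0 + sum_i p_i phi_i + sum_j q_j psi_j."""
    return s0 + phi @ p + psi @ q
```

With `p` set to zero and `q = (s cos(θ) − 1, s sin(θ), t_x, t_y)`, `synthesize` returns the mean shape scaled by `s`, rotated by `θ` and translated by `(t_x, t_y)`.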
2.2 Texture Model - Covariance of Features
The discriminative appearance model used is based
on a descriptor of the texture around each one of the
v landmarks. Inspired by the work of (Porikli et al.,
2006), a quadrangular region P (patch) with size l is
VISAPP 2010 - International Conference on Computer Vision Theory and Applications