FACIAL EXPRESSION RECOGNITION USING LOG-EUCLIDEAN STATISTICAL SHAPE MODELS

Bartlomiej W. Papiez, Bogdan J. Matuszewski, Lik-Kwan Shark and Wei Quan

Applied Digital Signal and Image Processing Research Centre, University of Central Lancashire, PR1 2HE Preston, U.K.

Keywords: Facial expression representation, Facial expression recognition, Vectorial log-Euclidean statistics, Statistical shape modelling.

Abstract: This paper presents a new method for facial expression modelling and recognition based on diffeomorphic image registration parameterised via stationary velocity fields in the Log-Euclidean framework. Validation and comparison are performed using different statistical shape models (SSM) built using the Point Distribution Model (PDM), velocity fields, and deformation fields. The obtained results show that the facial expression representation based on stationary velocity fields can be successfully utilised in facial expression recognition, and that this parameterisation produces a higher recognition rate than the facial expression representation based on deformation fields.

1 INTRODUCTION

The face is an important medium used by humans to communicate; it also reflects a person's emotional and awareness states, cognitive activity, personality and wellbeing. Over the last ten years, automatic facial expression representation and recognition have become an area of significant research interest for the computer vision community, with applications in human-computer interaction (HCI) systems, medical/psychological sciences, and visual communications, to name a few.

Although significant efforts have been undertaken to improve the facial feature extraction process and the recognition performance, automatic facial expression recognition is still a challenging task due to the inherently subjective nature of facial expressions and their variation across different gender, age, and ethnicity groups. A detailed overview of existing methodologies, recent advances and challenges can be found in (Matuszewski et al., 2011; Tian et al., 2011; Fasel and Luettin, 2003; Pantic et al., 2000).

Facial expression representation can be seen as a process of extracting features, which can be generic, such as local binary patterns (Shan et al., 2005) or Gabor coefficients (Bartlett et al., 2003), or more specific, such as landmarks of characteristic points located in areas of major facial changes due to articulation (Kobayashi and Hara, 1997), or a topographic context (TC) that treats the intensity levels of an image as a 3-D terrain surface (Wang and Yin, 2007). Recently, in (Quan et al., 2007b; Quan et al., 2009) the authors postulated that the shape space vectors (SSV) of the statistical shape model (SSM) can constitute a significant feature space for the recognition of facial expressions. The SSM can be constructed in many different ways; it was developed based on the point distribution model originally proposed by (Cootes et al., 1995). In (Quan et al., 2007a), the SSM is built based on the control points of a B-Spline surface fitted to the training data set, and in (Quan et al., 2010) an improved version with multi-resolution correspondence search and multi-level model deformation was proposed. In this paper, the SSM is generated using the stationary velocity fields obtained from diffeomorphic face registration. The idea of using motion fields as features in computer vision and pattern recognition was used previously for face recognition, where the optical flow was computed to robustly recognise faces under different expressions based on a single sample per class in the training set (Hsieh et al., 2010).

In medical image analysis, the parameterisation of diffeomorphic transformations based on the principal logarithm of non-linear geometrical deformations was introduced by (Arsigny et al., 2006). Using this framework, Log-Euclidean vectorial statistics can be performed on the diffeomorphic vector fields via their logarithms, which always preserves the invertibility constraint, contrary to Euclidean statistics on the deformation fields.

Recently, the stationary velocity field parametrisation has been utilised for deformable image registration in different ways, e.g. for the exponential update of the deformation field (Vercauteren et al., 2009), or for producing the principal logarithm directly as an output of image registration, e.g. inverse consistent image registration (Ashburner, 2007; Vercauteren et al., 2008) or symmetric inverse consistent image registration (Han et al., 2010). These algorithms preserve the spatial topology of objects by maintaining diffeomorphism. As the facial shapes (mouth, eyes, eyebrows) have constant intra- and inter-subject topology, it is interesting to check the adequacy of facial expressions represented using the stationary velocity fields resulting from diffeomorphic image registration, and to compare them with the deformation field based facial expression representation in terms of separability in feature space and recognition performance.

The remainder of the paper is organised as follows. Section 2 introduces the concept of the SSM, with a detailed description of the group-wise registration algorithm (Section 2.1). Then, the velocity field based representation of facial expressions is described in Section 2.2, and the Point Distribution Model is presented in Section 2.3. The experimental results of the qualitative and quantitative evaluation are shown in Section 3, with concluding remarks in Section 4.

2 STATISTICAL SHAPE MODEL

The statistical shape model was developed based on the point distribution model originally proposed by (Cootes et al., 1995). The model represents the facial expression variations based on the statistics calculated for corresponding features during the learning process on the training data set. In order to build an SSM, the correspondence of facial features between different faces in the training data set must be established. This is done here first by generating a mean face model from the neutral facial expression data set, to find the mappings from any face to the so-called common face space. Then, by transferring the subject-specific facial expression data sets into the common face space, the intra-subject facial expression correspondence is estimated. Finally, principal component analysis (PCA) is applied to the training data set aligned in the common face space, to provide a low-dimensional feature space for facial expression representation.

2.1 Log-domain Group-wise Image Registration

Generation of the mean face model is an essential step during the training process because it allows a subject-independent common face space to be established for further analysis.

For a given set of n-dimensional images representing neutral facial expressions, denoted by

$I^{ne} = \{ I^{ne}_k : \Omega \subset \mathbb{R}^n \rightarrow \mathbb{R},\; k = 1, \dots, K \}$   (1)

where K is the number of subjects included in the training data, the objective is to estimate a set of displacement fields $\hat{u}^{ne}$ that map each image taken from $I^{ne}$ to the mean face model $I^{mean}$.

In general, this problem can be formulated as a minimisation problem:

$\hat{u}^{ne} = \arg\min_{u^{ne}} \, \varepsilon(u^{ne}; I^{ne})$   (2)

where $\varepsilon(u^{ne})$ is defined as

$\varepsilon(u^{ne}) = \sum_k \sum_l \int_\Omega \mathrm{Sim}\big(I^{ne}_k(\vec{x}+\vec{u}_k(\vec{x})),\, I^{ne}_l(\vec{x}+\vec{u}_l(\vec{x}))\big)\, d\vec{x} + \alpha \sum_k \int_\Omega \mathrm{Reg}\big(\vec{u}_k(\vec{x})\big)\, d\vec{x}$   (3)

where $\vec{x} = [x_1, \dots, x_n] \in \Omega$ denotes a given voxel position, Sim denotes a similarity measure between each pair of images $I^{ne}_k$ and $I^{ne}_l$ ($l \neq k$) from $I^{ne}$, Reg denotes a regularisation term, and $\alpha$ is the weight of the regularisation term. In this work, the deformation fields are parameterised by the recently proposed stationary velocity fields $\vec{v}(\vec{x})$ via the exponential mapping (Arsigny et al., 2006):

$\varphi(\vec{x}) = \vec{x} + \vec{u}(\vec{x}) = \vec{x} + \exp(\vec{v}(\vec{x}))$   (4)
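In practice, the exponential mapping of a stationary velocity field is usually computed with the scaling-and-squaring method of (Arsigny et al., 2006). The following sketch is illustrative rather than the implementation used here; it assumes a 2-D velocity field stored as a NumPy array of shape (2, H, W) and uses linear interpolation to compose displacement fields.

import numpy as np
from scipy.ndimage import map_coordinates

def compose(u1, u2):
    # Composition of displacement fields: (u1 o u2)(x) = u1(x + u2(x)) + u2(x).
    H, W = u1.shape[1:]
    grid = np.mgrid[0:H, 0:W].astype(float)
    coords = grid + u2  # sampling positions x + u2(x)
    u1_warped = np.stack([map_coordinates(u1[c], coords, order=1, mode='nearest')
                          for c in range(2)])
    return u1_warped + u2

def exp_field(v, n_steps=6):
    # Scaling and squaring: scale v down to v / 2^n_steps, which is small
    # enough to be treated directly as a displacement, then self-compose
    # (square) the result n_steps times to obtain u(x) = exp(v)(x) - x.
    u = v / (2.0 ** n_steps)
    for _ in range(n_steps):
        u = compose(u, u)
    return u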

To minimise Equation 2, the Demon force (Vercauteren et al., 2009) was used in a symmetric manner (Papiez and Matuszewski, 2011) in the following way:

$\vec{du}^{\,i}_{kl} = \dfrac{\big(I^{\varphi^i_k}_k - I^{\varphi^i_l}_l\big)\big(\nabla I^{\varphi^i_k}_k + \nabla I^{\varphi^i_l}_l\big)}{\big\|\nabla I^{\varphi^i_k}_k + \nabla I^{\varphi^i_l}_l\big\|^2 + \big(I^{\varphi^i_k}_k - I^{\varphi^i_l}_l\big)^2}$   (5)

where $I^{\varphi^i_k}_k = I^{ne}_k(\varphi^i_k(\vec{x}))$ and $I^{\varphi^i_l}_l = I^{ne}_l(\varphi^i_l(\vec{x}))$ are the warped images, $\nabla I^{\varphi^i_k}_k$ and $\nabla I^{\varphi^i_l}_l$ are the gradients of those images, and $i$ is the iteration index. The average update of the velocity field is calculated using the Log-Euclidean mean of the vector fields $\vec{du}^{\,i}_{kl}$ (Arsigny et al., 2006):

$\vec{dv}^{\,i}_k = \dfrac{1}{K} \sum_l \log\big(\vec{du}^{\,i}_{kl}\big)$   (6)


and the deformation field $\vec{u}^{\,i+1}_k(\vec{x})$ is calculated via the exponential mapping of the updated velocity field:

$\vec{v}^{\,i+1}_k(\vec{x}) = \vec{v}^{\,i}_k(\vec{x}) + \vec{dv}^{\,i}_k(\vec{x})$   (7)

Although, according to Equation 6, the Log-Euclidean mean requires calculating the logarithm, which is reported to be a time-consuming process (Arsigny et al., 2006; Bossa et al., 2007), the symmetric Log-Domain Diffeomorphic Demons approach (Vercauteren et al., 2008) is used, which produces the principal logarithm of the transformation as an output of image registration, so the logarithm does not have to be calculated directly. Finally, the mean face model is generated by averaging the intensities of all images after registration:

$I^{mean} = \dfrac{1}{K} \sum_{k=1}^{K} I^{ne}_k(\varphi_k(\vec{x}))$   (8)

The procedure for estimating the set of deformation fields that generate the common face space is summarised below (a sketch of the pointwise force computation follows the listing):

repeat
    for k = 1:K
        for l = 1:K and l != k
            Calculate update (Equation 5)
        end
        Calculate average of updates (Equation 6)
        Update velocity field (Equation 7)
        Smooth velocity field using Gaussian filter
    end
    i = i + 1
until (velocity fields do not change) or (i > max_Iteration)
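A minimal NumPy sketch of a single inner-loop update is given below. It evaluates the symmetric Demon force of Equation 5 pointwise for one image pair; the warped images are assumed to have been computed beforehand, and the small epsilon added to the denominator for numerical stability is an implementation choice, not part of the original formulation.

import numpy as np

def demon_force(Ik_warped, Il_warped, eps=1e-10):
    # Symmetric Demon force (Equation 5) for one image pair, computed
    # pointwise over the whole image domain; returns a (n_dim, H, W) field.
    diff = Ik_warped - Il_warped
    grad_sum = np.array(np.gradient(Ik_warped)) + np.array(np.gradient(Il_warped))
    denom = np.sum(grad_sum ** 2, axis=0) + diff ** 2 + eps
    return diff * grad_sum / denom

The Log-Euclidean averaging of Equation 6 would additionally require the vector-field logarithm; the log-domain Demons variant used here avoids computing it explicitly by working on the velocity fields directly.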

As an example, the mean face model estimated by applying this scheme to neutral expressions is illustrated in Figure 1, and the faces with neutral expressions mapped into the common face space are shown in Figure 2.

Figure 1: Grey-level average of the mean face before registration (left) and after registration (right), obtained for 40 images from the training data set.

Figure 2: Examples of images representing the neutral expression (top) and the same faces mapped into the common face space (bottom).

The presented algorithm for generating the mean face model is similar to the work presented in (Geng et al., 2009). The main differences are, firstly, in how the deformation fields are parameterised, with the stationary velocity field used in the proposed method instead of the Fourier series in (Geng et al., 2009), and secondly, in the method of solving Equation 2, with the Demon approach used instead of the linear elastic model. Using the Log-domain parametrisation of the deformation fields is reported to produce smoother deformation fields, and it allows vectorial statistics to be calculated directly on the velocity fields.

2.2 Velocity Field based Facial Expression Model

The next step is to warp all other training faces, representing different facial expressions, to the mean face (the reference face) via the transformations $\varphi_k(\vec{x})$ estimated for the neutral expressions. For a given set of facial expression images from subject k,

$I^{ex}_k = \{ I^{ex}_{km} : \Omega \subset \mathbb{R}^n \rightarrow \mathbb{R},\; m = 1, \dots, M \}$   (9)

where M denotes the number of images, the transformation $\varphi_k(\vec{x})$ is applied to obtain a set of facial expression images in the common face space (the space of the reference image):

$I^{cex}_k = \{ I^{ex}_{km}(\varphi_k(\vec{x})) \}$   (10)

By applying the Log-Domain image registration approach based on the consistent symmetric Demon algorithm (Vercauteren et al., 2008), each image in the set $I^{cex}_k$ is registered to the image of the neutral expression in the common face space, $I^{ne}_k(\varphi_k(\vec{x}))$; the set of velocity fields $v^{ex}_k$ is estimated, and the set of corresponding deformation fields $u^{ex}_k$ is calculated via the exponential mapping as well. Utilising this particular method for image registration has two important advantages. Firstly, the consistency criterion is maintained during the registration process, which helps to keep the transformation smooth, especially for cases such as matching between open-mouth and closed-mouth shapes.


Secondly, the results of the registration are the velocity fields themselves, so there is no need to calculate the principal logarithm of the transformations.
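As an illustration of the warping step in Equation 10, the sketch below resamples a 2-D expression image through a subject-to-mean transformation, under the assumption that $\varphi_k$ is represented by a displacement field stored as a NumPy array; the names are illustrative.

import numpy as np
from scipy.ndimage import map_coordinates

def warp_image(image, u):
    # Resample the image at positions phi(x) = x + u(x), i.e. compute
    # I(phi(x)) for every pixel x of the reference grid.
    # image: (H, W) array; u: (2, H, W) displacement field.
    H, W = image.shape
    grid = np.mgrid[0:H, 0:W].astype(float)
    return map_coordinates(image, grid + u, order=1, mode='nearest')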

2.3 Point Distribution Model

The point distribution model originally proposed by (Cootes et al., 1995) is one of the most often used techniques for representing shapes. This model describes a shape as a set of positions (landmarks) in the reference image. Describing the variations between different shapes requires establishing the correspondence between the points detected in the reference image and the images representing different deformations in the training set. Although this can be achieved relatively reliably during the model training phase by careful, time-consuming, often manual selection of corresponding points, such a task is prone to gross errors during model evaluation, where near real-time performance is often required. Examples of manually selected landmarks for the neutral and happiness expressions are shown in Figure 3. The automatically selected landmarks used later in the experimental section are obtained with the help of the face image registration described in the previous section: the manually selected landmarks in the model face are automatically mapped into the registered faces.
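Under the displacement-field representation used above, mapping the model-face landmarks into a registered face reduces to evaluating the deformation field at the landmark positions. A hypothetical sketch (the array layout and direction convention are assumptions):

import numpy as np
from scipy.ndimage import map_coordinates

def transfer_landmarks(landmarks, u):
    # landmarks: (L, 2) array of (row, col) positions in the model face;
    # u: (2, H, W) displacement field from the model face to the target face.
    # Each landmark x is moved to phi(x) = x + u(x), with u(x) obtained by
    # linear interpolation of the field at the landmark position.
    coords = landmarks.T  # shape (2, L)
    du = np.stack([map_coordinates(u[c], coords, order=1, mode='nearest')
                   for c in range(2)])
    return landmarks + du.T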

Figure 3: Manually selected landmarks for the neutral expression (left) and the happiness expression (right).

2.4 Principal Component Analysis

Using standard principal component analysis (PCA), each face representation in the training data set can be approximately represented in a low-dimensional shape vector space instead of the original high-dimensional data vector space (Bishop, 2006). Figure 4 shows the effect of varying the three largest principal components of the PDM for the automatically selected landmarks, where λ is the eigenvalue of the covariance matrix calculated from the training data set.
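As an illustration of this step, the sketch below builds a PCA model from a set of flattened training features (landmark coordinates, velocity fields, or deformation fields, one row per face) and projects a face onto it to obtain its shape space vector; the retained-energy threshold is an assumed parameter.

import numpy as np

def build_ssm(X, energy=0.95):
    # X: (N, D) data matrix, one flattened feature vector per training face.
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data yields the PCA eigenvectors without forming
    # the (D, D) covariance matrix explicitly.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2 / (X.shape[0] - 1)
    k = np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), energy) + 1
    return mean, Vt[:k], eigvals[:k]

def to_ssv(x, mean, components):
    # Project a single face representation onto the model to obtain its SSV.
    return components @ (x - mean)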

Figure 4: Variations of the first (top row), second (middle row), and third (bottom row) major modes of the Point Distribution Model for the automatically selected landmarks.

3 EXPERIMENTAL RESULTS

The data set used for validation (Yin et al., 2006) consists of 48 subjects with a wide variety of ethnicity, age and gender. Some example faces taken from this database are shown in Figure 5, and Figure 6 shows the range of expression intensities. The data used during the training procedure and the data used for validation are mutually exclusive. The group-wise registration based on the Demon approach minimises the Sum of Squared Differences (SSD) between images; hence, due to different skin patterns, an additional adjustment of image intensity values was performed.

Figure 5: Four sample subjects showing seven expressions (neutral, angry, disgust, fear, happiness, sadness, and surprise).

3.1 Separability Analysis

To assess whether the shape space vectors based on the velocity fields can be used as a feature space for facial expression analysis and recognition, the separability of the SSV-based features has been analysed. The first three principal components of the SSM are used to reveal the clustering characteristics and separability power.


Figure 6: Samples of expressions: sadness (top) and happiness (bottom) at different expression intensities, ranging from low (left), through middle and high, to highest (right).

The SSM for training was built using 24 subjects, each contributing 25 faces; the SSV is based on the automatically selected points (60 landmarks per face), the velocity fields, and the deformation fields (512x512 pixels per image). The test data set was extracted from another 24 subjects; the training data set and the testing data set are mutually exclusive. Examples of some expressions given in Figures 7-9 exhibit good separability even in the low-dimensional space, especially for pairs such as "happiness vs. sadness" or "disgust vs. surprise". Pairs like "anger vs. fear" overlap more, but the clusters can still be identified.

In order to quantitatively assess the separability of the presented facial expression features, appropriate criteria have to be calculated. A computable criterion measuring within-class and between-class distances was computed similarly to (Wang and Yin, 2007; Quan et al., 2009). The within-class scatter matrix $S_W$ is defined as follows:

$S_W = \sum_{i=1}^{c} \dfrac{1}{n} \sum_{k=1}^{n_i} (\vec{x}^{\,i}_k - \vec{m}_i)(\vec{x}^{\,i}_k - \vec{m}_i)^T$   (11)

and the between-class scatter matrix $S_B$ is defined as:

$S_B = \sum_{i=1}^{c} \dfrac{n_i}{n} (\vec{m}_i - \vec{m})(\vec{m}_i - \vec{m})^T$   (12)

where $\vec{x}^{\,i}_k$ is a d-dimensional feature vector, $n_i$ is the number of samples in the ith class, n is the number of samples in all classes, c is the number of classes, and $\vec{m}_i$ is the mean of the samples in the ith class, defined as:

$\vec{m}_i = \dfrac{1}{n_i} \sum_{k=1}^{n_i} \vec{x}^{\,i}_k$   (13)

and $\vec{m}$ is the mean of all the samples:

$\vec{m} = \sum_{i=1}^{c} P_i\, \vec{m}_i$   (14)

where $P_i = n_i/n$ is the prior probability of the ith class.

Figure 7: Separability analysis for the automatically selected landmarks using the first three principal components.

The separability criterion $J_2(\vec{x})$ is defined as the natural logarithm of the ratio of the determinant of the sum of the between-class and within-class scatter matrices to the determinant of the within-class scatter matrix:

$J_2(\vec{x}) = \ln \dfrac{\det(S_B + S_W)}{\det(S_W)}$   (15)
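A direct NumPy transcription of Equations 11, 12 and 15 is sketched below; features is an assumed (n, d) array of shape space vectors and labels holds the corresponding class indices. Working with log-determinants avoids numerical overflow for larger d.

import numpy as np

def separability_J2(features, labels):
    # Equations 11, 12 and 15: within-class scatter, between-class scatter,
    # and the log-determinant-ratio separability criterion.
    n, d = features.shape
    m = features.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = features[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc) / n
        Sb += (len(Xc) / n) * np.outer(mc - m, mc - m)
    _, logdet_total = np.linalg.slogdet(Sb + Sw)
    _, logdet_within = np.linalg.slogdet(Sw)
    return logdet_total - logdet_within  # ln(det(Sb + Sw) / det(Sw))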

This separability criterion is efficient for comparing different feature selections lying in completely different spaces (also with different dimensionalities); it is intrinsically normalised and reflects the degree of separability between the different classes (Wang and Yin, 2007; Quan et al., 2009). A larger value of $J_2(\vec{x})$ means better separability. The separability criterion was evaluated on the different facial expression representations, and the results are shown in Figure 10. For the same ratio of retained energy in the training data, the value of $J_2(\vec{x})$ for the manually selected landmarks is the highest. The automatically selected landmarks in the range above 80% are not significantly worse than the manually selected landmarks. The velocity field and deformation field based facial expression representations perform worst.

Figure 8: Separability analysis for the full deformation field using the first three principal components.

To quantify the between-expression separability, the two-class separability criterion is evaluated (Wang and Yin, 2007). The within-class scatter matrix

$S_W^{ex_i, ex_j}$ for the two-class case (c = 2) is defined as follows:

$S_W^{ex_i, ex_j} = \dfrac{1}{n} \left( \sum_{k=1}^{n_{ex_i}} (\vec{x}^{\,ex_i}_k - \vec{m}_{ex_i})(\vec{x}^{\,ex_i}_k - \vec{m}_{ex_i})^T + \sum_{l=1}^{n_{ex_j}} (\vec{x}^{\,ex_j}_l - \vec{m}_{ex_j})(\vec{x}^{\,ex_j}_l - \vec{m}_{ex_j})^T \right)$   (16)

and the between-class scatter matrix $S_B^{ex_i, ex_j}$ is defined as:

$S_B^{ex_i, ex_j} = \dfrac{n_{ex_i}\, n_{ex_j}}{n^2} (\vec{m}_{ex_i} - \vec{m}_{ex_j})(\vec{m}_{ex_i} - \vec{m}_{ex_j})^T$   (17)

where $ex_i$ and $ex_j$ are the analysed expressions, $n_{ex_i}$ and $n_{ex_j}$ are the numbers of samples in the ith and jth classes, and $n = n_{ex_i} + n_{ex_j}$. Then, for each pair of selected expressions, $J_2^{ex_i, ex_j}(\vec{x})$ is calculated for each facial expression representation.

Tables 1-4 show the separability of all pairs of expressions for the different facial expression representations. These results support the visual impression from the qualitative analysis presented in Figures 7-9. Pairs of expressions such as happiness and sadness, or disgust and surprise, obtain higher values of the separability criterion $J_2^{ex_i, ex_j}(\vec{x})$ (the minimum is 2.93), while the pair anger and fear obtains lower values (the maximum is 2.62).

Figure 9: Separability analysis for the full velocity field using the first three principal components.

Figure 10: Separability of expressions for different features in terms of the separability criterion $J_2(\vec{x})$.

Table 1: Expression separability criterion $J_2^{ex_i, ex_j}(\vec{x})$ for all expression pairs, for the manually selected landmarks.

Ang Dis Fea Hap Sad Sur

Ang - 2.09 2.26 3.61 1.61 4.19

Dis - - 2.37 3.65 2.80 3.56

Fea - - - 1.98 2.02 2.66

Hap - - - - 3.93 4.44

Sad - - - - - 3.94

Sur - - - - - -



Table 2: Expression separability criterion $J_2^{ex_i, ex_j}(\vec{x})$ for all expression pairs, for the automatically selected landmarks.

Ang Dis Fea Hap Sad Sur

Ang - 1.91 2.01 3.08 1.44 3.33

Dis - - 2.03 3.08 2.33 2.93

Fea - - - 1.90 1.84 2.33

Hap - - - - 3.27 3.68

Sad - - - - - 3.20

Sur - - - - - -

Table 3: Expression separability criterion $J_2^{ex_i, ex_j}(\vec{x})$ for all expression pairs, for the full deformation fields.

Ang Dis Fea Hap Sad Sur

Ang - 2.15 2.53 2.68 1.74 4.21

Dis - - 2.23 3.39 2.68 3.68

Fea - - - 1.72 2.05 2.67

Hap - - - - 3.25 3.82

Sad - - - - - 4.08

Sur - - - - - -

Table 4: Expression separability criterion $J_2^{ex_i, ex_j}(\vec{x})$ for all expression pairs, for the full velocity fields.

Ang Dis Fea Hap Sad Sur

Ang - 2.23 2.62 2.83 1.78 4.10

Dis - - 2.27 2.44 2.83 3.63

Fea - - - 1.88 2.09 2.73

Hap - - - - 3.53 3.96

Sad - - - - - 4.27

Sur - - - - - -

3.2 Experiments on Facial Expression Recognition

The separability analysis conducted in the previous section indicates that the SSV feature space based on the velocity fields can be used for the classification of facial expressions. The data used for the classification-based validation again consist of 48 subjects and contain the neutral expression and the six basic facial expressions of anger, disgust, fear, happiness, sadness, and surprise, at four different expression intensity ranges. These data were divided into six subsets, each containing 8 subjects with 25 faces per subject representing the different expressions. During the evaluation procedure, one subset is chosen as the testing set and the remaining data are used for training. Four types of facial expression representation have been used for validation: the manually selected landmarks from the database (Yin et al., 2006), the facial landmarks automatically detected using the Log-Domain Demon registration, the full velocity fields, and the full deformation fields.

Table 5: Confusion matrix of the LDA for the manually selected landmarks.

Input/ Ang Dis Fea Hap Sad Sur

Output (%) (%) (%) (%) (%) (%)

Ang 74.5 4.7 3.1 3.1 14.6 0.0

Dis 8.3 81.8 4.7 0.5 3.6 1.0

Fea 7.8 1.6 59.9 11.5 16.1 3.1

Hap 4.2 2.1 8.3 85.4 0.0 0.0

Sad 16.7 1.6 4.2 0.0 77.6 0.0

Sur 1.0 2.1 4.2 0.5 2.6 89.6

Table 6: Confusion matrix of the LDA for the automatically selected landmarks.

Input/ Ang Dis Fea Hap Sad Sur

Output (%) (%) (%) (%) (%) (%)

Ang 68.8 5.2 5.2 2.6 18.2 0.0

Dis 12.5 76.6 5.7 0.5 3.6 1.0

Fea 7.8 2.6 55.2 14.1 19.3 1.0

Hap 4.1 1.6 11.5 82.3 0.0 0.5

Sad 19.8 3.1 4.7 0.0 72.4 0.0

Sur 1.0 3.1 7.8 0.5 2.6 87.0

Table 7: Confusion matrix of the LDA for the full deformation fields.

Input/ Ang Dis Fea Hap Sad Sur

Output (%) (%) (%) (%) (%) (%)

Ang 74.5 9.9 1.0 2.6 10.9 1.0

Dis 9.4 75.5 6.3 5.7 1.6 1.6

Fea 5.7 2.6 56.8 15.6 11.5 7.8

Hap 2.1 6.3 16.1 74.0 1.0 0.5

Sad 12.0 0.5 7.3 2.1 78.1 0.0

Sur 2.6 1.0 2.1 2.1 1.0 91.1

Table 8: Confusion matrix of the LDA for the full velocity fields.

Input/ Ang Dis Fea Hap Sad Sur

Output (%) (%) (%) (%) (%) (%)

Ang 77.6 7.8 0.5 2.1 11.5 0.5

Dis 8.9 77.1 5.2 5.2 2.6 1.0

Fea 4.7 3.6 61.5 9.9 13.0 7.3

Hap 3.1 6.3 14.1 76.0 0.0 0.5

Sad 15.1 0.0 6.8 1.6 76.6 0.0

Sur 1.6 1.6 3.6 1.0 1.6 90.6

Three commonly used classification methods were used for evaluation, namely linear discriminant analysis (LDA), the quadratic discriminant classifier (QDC), and the nearest neighbour classifier (NNC). Detailed descriptions of these methods can be found in most textbooks on pattern recognition, e.g. (Bishop, 2006).
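For reference, an equivalent evaluation could be set up with scikit-learn's implementations of these three classifiers; the data below are synthetic stand-ins for the SSV features and expression labels of one leave-one-subset-out split.

import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(100, 10)), rng.integers(0, 6, 100)
X_test, y_test = rng.normal(size=(40, 10)), rng.integers(0, 6, 40)

classifiers = {
    'LDA': LinearDiscriminantAnalysis(),
    'QDC': QuadraticDiscriminantAnalysis(),
    'NNC': KNeighborsClassifier(n_neighbors=1),  # 1-nearest neighbour
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))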

The average recognition rates and standard deviations over all six experiments for the different facial expression representations are presented in Table 10.


Table 9: Summary of the diagonals of the LDA confusion matrices for the different features.

Fea./ Ang Dis Fea Hap Sad Sur

Exp. (%) (%) (%) (%) (%) (%)

Man. 74.5 81.8 59.9 85.4 77.6 89.6

Aut. 68.8 76.6 55.2 82.3 72.4 87.0

Def. 74.5 75.5 56.8 74.0 78.1 91.1

Vel. 77.6 77.1 61.5 76.0 76.6 90.6

Table 10: Recognition rate for different classiﬁer’s meth-

ods.

Feature/ LDA QDA NNC

classiﬁer (%± SD) (%± SD) (%± SD)

Manually 78.1±4.2 74.0±4.8 61.5±1.1

Automatic 73.4±6.0 69.1±6.0 61.9±4.9

Deformation 75.0±5.2 58.9±2.9 56.2±5.0

Velocity 76.6±4.3 59.3±4.1 57.7±5.1

It can be seen that the LDA classifier achieves the highest recognition rate for every facial expression representation. As shown in Table 10, all facial expression representations achieve a similar recognition rate for the same classifier, with the highest rate for the manually selected landmarks. The manually selected landmarks are included only as a reference for the automatic methods. The recognition rates obtained by the automatic methods are lower (at most 4% less, for the deformation field based representation) than those obtained by manual landmark selection.

The confusion matrices of the LDA for the different data are given in Tables 5-8. From the classification performance, it can be concluded that the surprise, disgust, happiness and sadness expressions can be classified in most cases with above 75% accuracy, and anger with about 70% accuracy, whereas fear is classified correctly in only about 58% of cases. The best recognition rates (about 90%) are found for surprise, similarly to the results reported in (Quan et al., 2009) for data sets taken from the same database.

The misclassification results support the conclusions of the separability analysis conducted in the previous section. Pairs of expressions with a low value of the separability criterion $J_2^{ex_i, ex_j}(\vec{x})$ are more prone to being misclassified (e.g. fear and sadness). The fear expression achieves low values of the separability criterion for each facial expression representation and, as expected, its misclassification error is the highest. Expressions with a high value of the separability criterion achieve high recognition rates (e.g. happiness or surprise).

Table 9 summarises the diagonals of Tables 5-8. Taking into account the "subjective" nature of the ground truth data (Quan et al., 2009), the results can be considered reasonable.

4 CONCLUSIONS

A statistical analysis of different facial expression representations based on Log-Euclidean statistics has been presented in this paper. The proposed method first generates the mean face by the simultaneous registration of the faces with neutral expression included in the training data set, thereby enabling all faces to be mapped to the common face space based on the estimated transformations. The obtained results show that the shape space vectors built from the velocity fields can be considered an effective facial expression representation for the Statistical Shape Model. The performed tests also show that the parameterisation via stationary velocity fields in the Log-Domain produces a slightly higher facial expression recognition rate than that produced by using deformation fields.

ACKNOWLEDGEMENTS

The work has been in part supported by the MEGURATH project (EPSRC project No. EP/D077540/1).

REFERENCES

Arsigny, V., Commowick, O., Pennec, X., and Ayache, N. (2006). A log-Euclidean framework for statistics on diffeomorphisms. Medical Image Computing and Computer Assisted Intervention, 9(Pt 1):924-931.

Ashburner, J. (2007). A fast diffeomorphic image registration algorithm. NeuroImage, 38(1):95-113.

Bartlett, M. S., Littlewort, G., Fasel, I., and Movellan, J. R. (2003). Real time face detection and facial expression recognition: Development and applications to human computer interaction. In CVPR Workshop on CVPR for HCI.

Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA.

Bossa, M., Hernandez, M., and Olmos, S. (2007). Contributions to 3D diffeomorphic atlas estimation: application to brain images. Medical Image Computing and Computer Assisted Intervention, 10(Pt 1):667-674.

Cootes, T. F., Taylor, C. J., Cooper, D. H., and Graham, J. (1995). Active shape models - their training and application. Computer Vision and Image Understanding, 61:38-59.

Fasel, B. and Luettin, J. (2003). Automatic facial expression analysis: A survey. Pattern Recognition, 36(1):259-275.


Geng, X., Christensen, G. E., Gu, H., Ross, T. J., and Yang, Y. (2009). Implicit reference-based group-wise image registration and its application to structural and functional MRI. NeuroImage, 47(4):1341-1351.

Han, X., Hibbard, L. S., and Willcut, V. (2010). An efficient inverse-consistent diffeomorphic image registration method for prostate adaptive radiotherapy. In Proceedings of the 2010 International Conference on Prostate Cancer Imaging: Computer-Aided Diagnosis, Prognosis, and Intervention, MICCAI'10, pages 34-41, Berlin, Heidelberg. Springer-Verlag.

Hsieh, C.-K., Lai, S.-H., and Chen, Y.-C. (2010). An optical flow-based approach to robust face recognition under expression variations. IEEE Transactions on Image Processing, 19(1):233-240.

Kobayashi, H. and Hara, F. (1997). Facial interaction between animated 3D face robot and human beings. In Proc. IEEE Int. Conf. Systems, Man, and Cybernetics: Computational Cybernetics and Simulation, volume 4, pages 3732-3737.

Matuszewski, B. J., Quan, W., and Shark, L.-K. (2011). Biometrics - Unique and Diverse Applications in Nature, Science, and Technology, chapter Facial Expression Recognition. InTech.

Pantic, M. and Rothkrantz, L. J. M. (2000). Automatic analysis of facial expressions: The state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22:1424-1445.

Papiez, B. W. and Matuszewski, B. J. (2011). Direct inverse deformation field approach to pelvic-area symmetric image registration. In Proceedings of Medical Image Understanding and Analysis (MIUA'2011).

Quan, W., Matuszewski, B. J., and Shark, L.-K. (2010). Improved 3-D facial representation through statistical shape model. In 17th IEEE International Conference on Image Processing (ICIP 2010), pages 2433-2436.

Quan, W., Matuszewski, B. J., Shark, L.-K., and Ait-Boudaoud, D. (2007a). 3-D facial expression representation using B-spline statistical shape model. In Proceedings of the Vision, Video and Graphics Workshop.

Quan, W., Matuszewski, B. J., Shark, L.-K., and Ait-Boudaoud, D. (2007b). Low dimensional surface parameterisation with applications in biometrics. In Proceedings of the International Conference on Medical Information Visualisation - BioMedical Visualisation, pages 15-22, Washington, DC, USA. IEEE Computer Society.

Quan, W., Matuszewski, B. J., Shark, L.-K., and Ait-Boudaoud, D. (2009). Facial expression biometrics using statistical shape models. EURASIP Journal on Advances in Signal Processing, 2009:15:4-15:4.

Shan, C., Gong, S., and McOwan, P. W. (2005). Robust facial expression recognition using local binary patterns. In ICIP (2), pages 370-373.

Tian, Y.-L., Kanade, T., and Cohn, J. F. (2011). Handbook of Face Recognition, chapter Facial Expression Analysis. Springer.

Vercauteren, T., Pennec, X., Perchant, A., and Ayache, N. (2008). Symmetric log-domain diffeomorphic registration: a demons-based approach. Medical Image Computing and Computer Assisted Intervention, 11(Pt 1):754-761.

Vercauteren, T., Pennec, X., Perchant, A., and Ayache, N. (2009). Diffeomorphic demons: Efficient non-parametric image registration. NeuroImage, 45(1, Supp. 1):S61-S72.

Wang, J. and Yin, L. (2007). Static topographic modeling for facial expression recognition and analysis. Computer Vision and Image Understanding, 108(1-2):19-34.

Yin, L., Wei, X., Sun, Y., Wang, J., and Rosato, M. J. (2006). A 3D facial expression database for facial behavior research. In Proc. 7th Int. Conf. Automatic Face and Gesture Recognition (FGR 2006), pages 211-216.
