Various Fusion Schemes to Recognize Simulated and Spontaneous Emotions
Sonia Gharsalli¹, Hélène Laurent², Bruno Emile¹ and Xavier Desquesnes¹
¹Univ. Orléans, INSA CVL, PRISME EA 4229, Bourges, France
²On secondment from INSA CVL, Univ. Orléans, PRISME EA 4229, Bourges, France, to the Rector of the Academy of Strasbourg, Strasbourg, France
Keywords:
Facial Emotion Recognition, Posed Expression, Spontaneous Expression, Early Fusion, Late Fusion, SVM,
FEEDTUM Database, CK+ Database.
Abstract:
This paper investigates the performance of combining geometric and appearance features with various fusion strategies in a facial emotion recognition application. Geometric features are extracted by a distance-based method; appearance features are extracted by a set of Gabor filters. Various fusion methods from two principal classes, namely early fusion and late fusion, are proposed. The former combines features in the feature space; the latter fuses both feature types in the decision space by a statistical rule or a classification method. The distance-based method, the Gabor method and the hybrid methods are evaluated on a simulated (CK+) database and a spontaneous (FEEDTUM) database. The comparison between methods shows that late fusion methods achieve better recognition rates than the early fusion method. Moreover, late fusion methods based on statistical rules perform better than the other hybrid methods for simulated emotion recognition. However, for the recognition of spontaneous emotions, the statistical-based methods improve the recognition of positive emotions, while the classification-based method slightly enhances sadness and disgust recognition. A comparison with hybrid methods from the literature is also made.
1 INTRODUCTION
Automatic facial emotion recognition is a challenging topic in machine vision research. Significant progress has been made in recent years in various applications (human/machine interaction, psychiatry, behavioural science, educational software, animation, etc.).
Automatic facial emotion recognition methods can be divided into two main classes: geometric methods and appearance-based methods. Geometric methods detect the shapes and positions of face components. Feature point tracking and face motion trackers are the most widely used geometric techniques to capture emotional expressions from image sequences. Abdat et al. (Abdat et al., 2011) represent each facial muscle motion by the distance variation between a pair of feature points. To recognize the six basic facial emotions and a set of Facial Action Units (FAU), Kotsia et al. (Kotsia and Pitas, 2007) compute the displacements of selected Candide nodes from the first frame to the frame of greatest facial expression intensity.
On the other hand, appearance-based methods extract
facial texture changes such as wrinkles and furrows.
These methods use various techniques to capture the
skin texture changes such as Gabor wavelets (Bartlett
et al., 2003), Local Binary Patterns (LBP) (Shan et al.,
2009), optical flow (Anderson and McOwan, 2006).
Both geometric methods and appearance-based methods have specific weaknesses. Kotsia et al. (Kotsia et al., 2008b) report that using only texture information can lead to confusion between the anger and fear emotions. Conversely, the lack of texture information can lead to the misclassification of subtle facial movements. Combining these two classes could therefore achieve better results. Fasel et al. (Fasel et al., 2002) explain that a hybrid method can be of great interest if the individual approaches produce very different error patterns.
The choice of the appropriate fusion scheme can
also impact the results. The fusion of information
is generally performed at two levels: feature level
and decision level. For emotion recognition applica-
tions, these two levels are highlighted when various modalities are combined, such as speech and facial
expressions (Busso et al., 2004), face and body ges-
tures (Gunes and Piccardi, 2005). For facial expres-
sion recognition applications, the combination of different features is generally done by feature-level fusion. Kotsia et al. (Kotsia et al., 2008b) extract the appearance features with the Discriminant Non-negative Matrix Factorization (DNMF) method, while the shape is computed from the deformed Candide grid; an early fusion method is applied to combine both descriptors. The same fusion scheme is applied by Zhang et al. (Zhang et al., 2012) and Chen et al. (Chen et al., 2012) to obtain robust combined features for facial expression recognition. The geometric features are computed by a distance-based method in (Zhang et al., 2012) and by a displacement-based method in (Chen et al., 2012); both methods additionally use local texture information. Wan et al. (Wan and Aggarwal, 2013) learn a distance metric structure from combined features; feature-level fusion is applied with different weights for the texture and geometric features.
In this paper, various fusion strategies (early fu-
sion, fusion by statistical rules and fusion by classifi-
cation method) are studied and their robustness in the
recognition of posed and spontaneous facial expres-
sions is analysed.
The paper is organised as follows: a description of the feature extraction and fusion strategies is given in the next section, followed by a presentation of the considered databases in section 3. Section 4 reports the experimental results on the CK+ database (Lucey et al., 2010) and the FEEDTUM database (Wallhoff, 2006), together with a discussion. Conclusions and prospects are given in section 5.
2 METHODS DESCRIPTION
Emotion recognition systems are based on three steps: face detection, feature extraction and feature classification. In our work, we chose as a real-time face detector an adapted version of the Viola & Jones method (Viola and Jones, 2001) available in OpenCV (Bradski et al., 2006). In the following section, we present the methods used to extract facial features.
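Before turning to feature extraction, the detection step can be sketched as follows, assuming OpenCV's Python bindings and the frontal-face Haar cascade bundled with the library; the cascade file and parameter values are illustrative, not necessarily those used in our system.

```python
import cv2

def detect_face(gray_image):
    """Return the bounding box (x, y, w, h) of the largest detected face, or None."""
    # Haar cascade bundled with opencv-python (an adapted Viola & Jones detector).
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)
    faces = detector.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Keep the largest detection, assumed to be the subject's face.
    return max(faces, key=lambda box: box[2] * box[3])
```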
2.1 Feature Extraction Methods
Existing emotion recognition methods are mainly
based on two types of features, namely geometric
features and appearance features. For geometric fea-
tures, we chose a distance-based method presented in
(Abdat et al., 2011).
Figure 1: Techniques used to detect the three axes. The first row presents the horizontal projection of the horizontal gradient of the whole face, the second row presents the horizontal projection of the vertical gradient of the lower half of the face. The third row shows the location of the symmetry axis, computed as the horizontal middle of the face.
Thanks to its face measure model, this method locates feature points reliably, independently of illumination and subject changes. Moreover, it works in real time. For appearance features, we chose the Gabor method, which is widely used for texture extraction at different orientations and scales.
2.1.1 Distance-based Method
Abdat et al. (Abdat et al., 2011) developed a distance-based method in which the facial expression is coded by distance variations that reflect the motion of the muscles most relevant to human expressions. These distances are computed between pairs of dynamic and fixed points. The dynamic points are feature points that can move during the expression; they are located on the eyebrows, lips, eyelids and nose. The fixed points are stable with respect to facial expression changes and are located on the face edges, the outer corners of the eyes and the nose root. The location of these points relies on the detection of the horizontal position of the eyes, the horizontal position of the mouth and the facial symmetry axis.
To improve the detection of these three axes, we changed some of the techniques used in (Abdat et al., 2011). For the detection of the eye axis, the horizontal gradient projection is used (see the first row of figure 1). In our case, we use the Sobel mask to compute the horizontal gradient instead of column differences. We also changed the mouth detection technique: instead of using an HSV segmentation, we apply the horizontal projection of the vertical gradient. The second row of figure 1 illustrates the mouth axis detection, while the last row presents the symmetry axis
VariousFusionSchemestoRecognizeSimulatedandSpontaneousEmotions
425
detection, computed as the horizontal middle of the face.
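A minimal sketch of this axis detection, assuming NumPy and OpenCV, is given below; the Sobel kernel size and the use of absolute gradient values are illustrative details not specified in the text.

```python
import cv2
import numpy as np

def detect_axes(face_gray):
    """Estimate the eye row, mouth row and vertical symmetry column of a face image."""
    h, w = face_gray.shape

    # Eye axis: horizontal projection (row sums) of the horizontal (Sobel-x) gradient.
    grad_x = cv2.Sobel(face_gray, cv2.CV_32F, 1, 0, ksize=3)
    eye_row = int(np.argmax(np.abs(grad_x).sum(axis=1)))

    # Mouth axis: same projection applied to the vertical gradient of the lower half.
    grad_y = cv2.Sobel(face_gray[h // 2:, :], cv2.CV_32F, 0, 1, ksize=3)
    mouth_row = h // 2 + int(np.argmax(np.abs(grad_y).sum(axis=1)))

    # Symmetry axis: horizontal middle of the face bounding box.
    return eye_row, mouth_row, w // 2
```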
To refine the position of the feature points, the Shi & Tomasi method (Shi and Tomasi, 1994) is used in a neighbourhood of each point; in our case, an 8×8 block around each detected point. This method is available in the OpenCV library (Bradski et al., 2006). The feature points are localised in the first frame of the image sequence, which corresponds to the neutral face. Afterwards, these points are tracked using the Lucas-Kanade algorithm (Bouguet, 2000). For each image, we obtain a distance feature vector composed of 21 distances.
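The refinement and tracking stage can be sketched with OpenCV as follows, where goodFeaturesToTrack implements the Shi & Tomasi criterion and calcOpticalFlowPyrLK the pyramidal Lucas-Kanade tracker; apart from the 8×8 refinement block, the window sizes are illustrative.

```python
import cv2
import numpy as np

def refine_point(gray, point, half_win=4):
    """Snap an initial feature point to the best Shi & Tomasi corner in an 8x8 block."""
    x, y = int(point[0]), int(point[1])
    patch = gray[y - half_win:y + half_win, x - half_win:x + half_win]
    corners = cv2.goodFeaturesToTrack(patch, maxCorners=1, qualityLevel=0.01, minDistance=1)
    if corners is None:
        return np.float32([x, y])
    cx, cy = corners[0].ravel()
    return np.float32([x - half_win + cx, y - half_win + cy])

def track_points(prev_gray, next_gray, prev_pts):
    """Track the refined feature points between consecutive frames with pyramidal LK."""
    pts = np.asarray(prev_pts, np.float32).reshape(-1, 1, 2)
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None,
                                                   winSize=(15, 15), maxLevel=2)
    return next_pts.reshape(-1, 2), status.ravel()
```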
2.1.2 Gabor Method
Gabor filter-based feature extraction has been suc-
cessfully applied to fingerprint recognition (Lee and
Wang, 1999), face recognition (Vinay and Shreyas,
2006) and facial feature point detection (Vukadinovic
and Pantic, 2005). This is due to its similarity with
the human visual system (Lee and Wang, 1999).
We applied the Gabor method to detect skin
changes in each image. The faces were detected au-
tomatically and normalized to 80 × 60 sub-images
based on the location of the eyes. The face is then
filtered with a filter bank.
The entire filter bank can be generated by chang-
ing the orientation and the scale in the “mother” filter
(1) (Kotsia et al., 2008a)
$$\psi_k(z) = \frac{\|k\|^2}{\sigma^2}\,\exp\!\left(-\frac{\|k\|^2\|z\|^2}{2\sigma^2}\right)\left(\exp(i\,k^{t}z) - \exp\!\left(-\frac{\sigma^2}{2}\right)\right), \qquad (1)$$
where $z = (x, y)$ refers to the pixel and the wave vector $\vec{k}$ is the vector of the plane wave restricted by the Gaussian envelope function, with $k = [k_v\cos\phi_u,\; k_v\sin\phi_u]^{t}$, $k_v = 2^{-\frac{v+2}{2}}\pi$ and $\phi_u = u\,\frac{\pi}{8}$.
The parameter $\sigma$ controls the width of the Gaussian envelope $\frac{\sigma}{\|k\|}$; in our case $\sigma = 2\pi$. The subtraction in the second term of equation (1) makes the Gabor kernels DC-free so that they form a quadrature pair (sine/cosine) (Movellan, 2005). Thus, the Gabor process becomes more similar to the human visual cortex. For our bank, we use three high frequencies, $v = 0, 1, 2$, and four orientations, $0, \frac{\pi}{4}, \frac{\pi}{2}, \frac{3\pi}{4}$.
After convolving the face image with the Gabor bank, the face is again downsampled to 20 × 15. We then obtain a feature vector of 3600 descriptors (20 × 15 × 12).
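Under the parameters stated above (σ = 2π, v = 0, 1, 2, four orientations, 80 × 60 faces downsampled to 20 × 15), the extraction can be sketched as follows; the kernel size and the use of the response magnitude are assumptions of this sketch rather than details given in the paper.

```python
import cv2
import numpy as np

def gabor_kernel(v, mu, size=31, sigma=2 * np.pi):
    """Gabor kernel of equation (1) for scale index v and orientation index mu."""
    k_v = 2 ** (-(v + 2) / 2) * np.pi
    kx, ky = k_v * np.cos(mu * np.pi / 8), k_v * np.sin(mu * np.pi / 8)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = (k_v ** 2 / sigma ** 2) * np.exp(-k_v ** 2 * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)  # DC-free term
    return envelope * carrier

def gabor_features(face):
    """Convolve a normalized 80x60 face with the 12-filter bank; return 3600 descriptors."""
    face = face.astype(np.float32)
    responses = []
    for v in range(3):                # three high frequencies, v = 0, 1, 2
        for mu in (0, 2, 4, 6):       # orientations 0, pi/4, pi/2, 3*pi/4
            kern = gabor_kernel(v, mu)
            real = cv2.filter2D(face, -1, kern.real)
            imag = cv2.filter2D(face, -1, kern.imag)
            mag = cv2.resize(np.sqrt(real ** 2 + imag ** 2), (20, 15))  # downsample
            responses.append(mag.ravel())
    return np.concatenate(responses)  # 20 x 15 x 12 = 3600 values
```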
Figure 2: Early fusion scheme (the geometric and appearance feature vectors are combined into a single vector and classified by an SVM to output the emotion).
2.2 Geometric and Appearance Fusion
Modalities
Geometric techniques and appearance-based approaches have their own strengths and limitations, and combining both feature types may compensate for the limitations of each method. The choice of an adequate fusion technique is also very important to enhance the emotion recognition system. Fusion can be done in the feature space (early fusion) or in the decision space (late fusion). Early fusion combines weighted or equiprobable feature vectors into a single vector, to which a classification method is then applied. In contrast, late fusion first applies a classification step to each feature vector independently and then combines the obtained probabilities. In this paper, we study various fusion methods.
2.2.1 Early Fusion Method
For each face image, the geometric feature vector is extracted by the distance-based method ($X_G \in \mathbb{R}^{d}$ with $d = 21$ features) and the appearance feature vector is extracted by the Gabor method ($X_A \in \mathbb{R}^{d_1}$ with $d_1 = 3600$ features). Both vectors are then normalized to $[0, 1]$ using the min-max technique (Snelick et al., 2005). The minimum ($des_{min}$) and the maximum ($des_{max}$) of each descriptor are identified among all training vectors:

$$des_{norm} = \frac{des - des_{min}}{des_{max} - des_{min}} \qquad (2)$$

A new feature vector containing information from both the geometric and the appearance features is defined as $X = [X_G, X_A]^{T}$. The feature vector $X$, composed of 3621 descriptors, is used as input to a linear Support Vector Machine (SVM). Figure 2 presents the early fusion scheme.
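A minimal sketch of this early fusion scheme, assuming scikit-learn's LinearSVC and NumPy arrays of stacked training vectors, could look as follows; the small epsilon guarding against constant descriptors is an implementation detail added here.

```python
import numpy as np
from sklearn.svm import LinearSVC

def fit_early_fusion(geo_train, app_train, labels):
    """Concatenate geometric (n x 21) and appearance (n x 3600) features, normalize, train."""
    X = np.hstack([geo_train, app_train])             # n x 3621 combined vectors
    des_min, des_max = X.min(axis=0), X.max(axis=0)   # per-descriptor bounds, equation (2)
    X_norm = (X - des_min) / (des_max - des_min + 1e-12)
    clf = LinearSVC().fit(X_norm, labels)
    return clf, des_min, des_max

def predict_early_fusion(clf, des_min, des_max, geo_test, app_test):
    X = np.hstack([geo_test, app_test])
    return clf.predict((X - des_min) / (des_max - des_min + 1e-12))
```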
2.2.2 Late Fusion Methods
Just as in the early fusion method, the geometric and appearance feature vectors are first extracted for each face image. Then, a linear SVM classifier is applied to each feature vector to yield two posterior probability vectors $P(\omega_k|X_G)$ and $P(\omega_k|X_A)$, where $\omega_k$ is the emotion class and $k \in \{1, \dots, n\}$, $n$ being the number of emotions.
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
426
Figure 4: Examples of statistical fusion methods.
Figure 3: Late fusion scheme (each feature vector is classified by an SVM and the local decisions are combined in a decision fusion unit to output the emotion).
The local decision vectors are then combined in a decision fusion step to obtain the final decision. This fusion scheme is illustrated in figure 3.
We evaluated various decision fusion modalities such as mean, product and maximum. A classification-based method (Atrey et al., 2010) has also been applied for this last decision fusion step. The next two sections are devoted to a more detailed presentation of the above-mentioned decision fusion techniques.
2.2.3 Fusion by Statistical Rule
Various statistical rules exist for late fusion such as
average, product, maximum, weighted majority vot-
ing, rank level (Mironica et al., 2013). We chose the
most suitable techniques for our situation where a pri-
ori probabilities are not available.
Fusion by Average Rule
Under the equal prior assumption, the average of the obtained probability vectors is computed for each class. The class with the maximum mean is then selected as the final emotion, as presented in the equation below. An example is shown in figure 4.
Here, $m$ represents the number of classification methods and $X_i \in \{X_G, X_A\}$ is a feature vector. These notations are used in the remainder of the paper.
$$Z \rightarrow \omega_k \;\text{ if }\; \frac{1}{m}\sum_{i=1}^{m} P(\omega_k|X_i) = \max_{k}\left(\frac{1}{m}\sum_{i=1}^{m} P(\omega_k|X_i)\right), \quad k \in \{1, \dots, n\}$$
Fusion by Product Rule
We assume that the joint probability distributions computed by the SVM classifiers on each $X_i$ are independent, which means:

$$P(X_G, X_A|\omega_k) = P(X_G|\omega_k) \times P(X_A|\omega_k)$$
Under this assumption, the product rule is defined as:
$$Z \rightarrow \omega_k \;\text{ if }\; \prod_{i=1}^{m} P(\omega_k|X_i) = \max_{k}\left(\prod_{i=1}^{m} P(\omega_k|X_i)\right), \quad k \in \{1, \dots, n\}$$
Thus, the product of the obtained probabilities is com-
puted for each class and the selected emotion is de-
fined by the maximum product. An example illustrat-
ing this rule is presented in figure 4.
Fusion by Maximum Rule
The emotion is assigned to the maximal probability
obtained in the decision vectors as explained below:
$$Z \rightarrow \omega_k \;\text{ if }\; \max_{i}\, P(\omega_k|X_i) = \max_{k}\left(\max_{i}\, P(\omega_k|X_i)\right), \quad k \in \{1, \dots, n\},\; i \in \{1, \dots, m\}$$
An example is presented in figure 4.
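The three statistical rules reduce to a few array operations; a compact sketch is given below, where prob_geo and prob_app are assumed to hold one posterior vector $P(\omega_k|X_i)$ per sample for the geometric and appearance classifiers respectively.

```python
import numpy as np

def fuse_statistical(prob_geo, prob_app, rule="product"):
    """Combine two (n_samples x n_classes) probability arrays and return class indices."""
    stacked = np.stack([prob_geo, prob_app])   # m x n_samples x n_classes, with m = 2
    if rule == "average":
        fused = stacked.mean(axis=0)
    elif rule == "product":
        fused = stacked.prod(axis=0)
    elif rule == "max":
        fused = stacked.max(axis=0)
    else:
        raise ValueError("unknown rule: " + rule)
    return fused.argmax(axis=1)                # emotion with the highest fused score
```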
2.2.4 Fusion by Classification Methods
Fusion by classification methods is mainly used in the domain of multimedia analysis (Snoek, 2005), (Niaz and Merialdo, 2013). A first learning step is applied to each feature vector to yield emotion scores; these probabilistic scores are then fed into a second learning step to obtain the final emotion, as illustrated in figure 5.
VariousFusionSchemestoRecognizeSimulatedandSpontaneousEmotions
427
Figure 5: Classification-based fusion scheme.
The Support Vector Machine (SVM) classifier is applied in both learning steps, since it has several advantages, namely few parameters to set and fast training.
Two ways of training can be applied to the classification-based fusion methods. The first uses a single training set for both training steps. The second uses two different sets to train the first and the second steps separately; in that case, a large amount of data must be available. In this paper, only the first way is applied, due to the reduced number of images available per emotion in the considered databases.
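A sketch of this single-training-set variant, assuming scikit-learn SVMs calibrated to output class probabilities, is given below; feeding the training scores directly to the second step mirrors the protocol described above, at the cost of some optimism in the stacked classifier.

```python
import numpy as np
from sklearn.svm import SVC

def fit_classification_fusion(geo_train, app_train, labels):
    """Train one SVM per modality, then a second SVM on the concatenated emotion scores."""
    svm_geo = SVC(kernel="linear", probability=True).fit(geo_train, labels)
    svm_app = SVC(kernel="linear", probability=True).fit(app_train, labels)
    scores = np.hstack([svm_geo.predict_proba(geo_train), svm_app.predict_proba(app_train)])
    fusion = SVC(kernel="linear").fit(scores, labels)   # second learning step
    return svm_geo, svm_app, fusion

def predict_classification_fusion(svm_geo, svm_app, fusion, geo_test, app_test):
    scores = np.hstack([svm_geo.predict_proba(geo_test), svm_app.predict_proba(app_test)])
    return fusion.predict(scores)
```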
3 DATABASES
Evaluation and comparison of these methods require the use of one or more databases. There are two types of databases: posed emotion databases and spontaneous ones. Posed emotion databases present forced emotions expressed by actors, while spontaneous databases present emotions stimulated by viewing videos. In the latter case, the emotions are often labelled according to the expected emotion, even if, in some cases, the expressed emotions are barely visible. In this paper, we chose an extended version of the widely used Cohn-Kanade database as the posed expression benchmark and the FEEDTUM database as the spontaneous one.
The extended Cohn-Kanade database (CK+) contains facial expression videos from 123 subjects (26 additional subjects compared to the Cohn-Kanade database) (Lucey et al., 2010). A total of 7 expressions are labelled: anger, contempt, disgust, fear, happiness, sadness and surprise. The images in this database are digitized at 640 × 490 pixels. The sequences vary from the neutral expression to the peak of the expression.
The FEEDTUM database is part of the European Union project FGNET (Face and Gesture Recognition Research Network) (Wallhoff, 2006). It contains face images and videos of 18 subjects performing the six basic emotions, stimulated by viewing videos. Each subject performs the six emotions and the neutral expression three times. The images in this database are digitized at 320 × 240 pixels. In total, it includes 399 sequences.
4 METHODS EVALUATION
Cross-validation is a frequently used approach for performance evaluation. We use five-fold cross-validation, in which the data are randomly split into subsets of approximately equal size, each containing 20% of each emotion class. One set is chosen as the test set, while the remaining sets form the training set. After the classification step, the test set is integrated into the training set and a new test set is considered. This procedure is repeated five times, and an average classification accuracy rate is then computed.
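This protocol corresponds to a stratified five-fold cross-validation; a hedged sketch using scikit-learn is shown below, where fit_predict stands for any of the classifiers or fusion schemes described above.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(X, y, fit_predict):
    """fit_predict(X_train, y_train, X_test) -> predicted labels for the test fold."""
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, test_idx in skf.split(X, y):
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        accuracies.append(np.mean(y_pred == y[test_idx]))
    return float(np.mean(accuracies))   # average classification accuracy over the 5 folds
```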
The cross-validation method is used to evaluate
the hybrid methods in both CK+ and FEEDTUM
databases. In the next section, a comparison between
the performance of the fusion methods we chose and
the performance of two hybrid methods presented in
the literature is made.
4.1 Methods Comparison on the CK+
Database
A five-fold cross-validation is applied to evaluate the recognition of the six emotions (anger (Ang), disgust (Dis), fear (Fea), happiness (Hap), sadness (Sad) and surprise (Surp)) and the neutral expression (Neu) on the CK+ database.
4.1.1 Results Analysis
According to table 1, the distance-based method and the Gabor method have similar mean recognition rates: the distance-based method achieves 90.7%, while the Gabor method reaches 90.4%. However, they do not misclassify the same emotion. For the distance-based method, the most misclassified emotion is sadness; for the appearance-based method, it is the neutral expression. The fusion of both features may correct these misclassifications.
The early fusion method, which combines the geometric and appearance features at the feature level, only reaches a recognition rate of 23% (see row 3 of table 1). We notice that the early fusion method gives worse performance than the distance-based method
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
428
and the Gabor method when they are separately ap-
plied. This is due to the huge dimension of the Ga-
bor vector compared to the geometric vector (21 <<
3600). A feature selection method may be a good so-
lution to improve the recognition rate of the early fu-
sion method.
The late fusion methods based on statistical rules
(average, product and max) are presented respectively
in rows four, five and six of table 1. The recognition
rates of these fusion methods are very similar. The
three methods recognise happiness, sadness and surprise very well but classify fear less accurately. This emotion is the third most misclassified emotion by the distance-based method and the second most misclassified emotion by the Gabor method. The other
emotions have a good recognition rate because one of
the two methods has a good recognition rate. Thus,
the recognition rates of the statistical fusion methods, which are closely linked to the responses of the classifiers, are impacted. We conclude that the misclassification of the fear emotion by the individual classifiers
affects the performance of the statistical fusion meth-
ods. Kuncheva (Kuncheva, 2002) reports that the dif-
ficult parts of the feature space are often the same for
all classifiers. We remark that the statistical-based fusion methods improve the emotion recognition rates, most notably the product-rule-based fusion method: it enhances all emotion recognition rates except anger, which loses 4% compared to the other statistical-based fusion methods.
The classification-based fusion method is presented in the last row of table 1. Its recognition rate exceeds those of the Gabor method and the distance-based method by approximately 3%. We also notice that it misclassifies the neutral expression, like the Gabor method and unlike the distance-based method, which achieves a rate of 100%. The classification-based fusion method also has a poor recognition rate for the fear emotion. On the other hand, it has a good recognition rate for sadness and surprise. We can thus conclude that, like the statistical-based fusion methods, the classification-based method achieves good results when the Gabor method and the distance-based method both have good recognition rates. Similarly, the classification-based method misclassifies an emotion when both methods have poor recognition rates, as for the fear emotion. However, this method is also impacted when only one of the classifiers has a poor recognition rate, as for the neutral expression.
The comparison of the different fusion modalities
shows that the late fusion methods prove to be a better
choice than the early fusion in our task.
We notice also that the best recognition rates are
given by the methods based on statistical rules for fu-
sion. This is probably the reason why simple statistical rules continue to be widely used in fusion approaches. An additional learning step does not necessarily bring the best results for an emotion recognition application.
4.1.2 Comparison with Previous Work
A comparison can also be made between the proposed product-rule fusion method and two methods from the literature that combine geometric and appearance features. We chose the method of Chen et al. (Chen et al., 2012), which was initially intended to recognize seven emotions (happiness, anger, fear, disgust, sadness, surprise and contempt) using an early fusion technique to combine features before passing them to an SVM classifier. Kotsia et al. (Kotsia et al., 2008b) also present an early fusion method with Median Radial Basis Function Neural Networks (MRBF NNs) to recognize six emotions (happiness, anger, fear, disgust, sadness, surprise) and the neutral expression. They evaluate their method on the Cohn-Kanade database, the first version of the CK+ database. The recognition rates of both methods are presented in table 2.
The proposed method exceeds a recognition rate of 97%, while the methods of Chen et al. (Chen et al., 2012) and Kotsia et al. (Kotsia et al., 2008b) only achieve 95% and 92.3% respectively. We also remark that fear is the most misclassified emotion for all methods, probably because of the difficulty of simulating this emotion.
4.2 Spontaneous Expression
Recognition on the FEEDTUM
Database
According to psychologists, the difference between posed and spontaneous emotions is quite apparent. This difference is also highlighted in many computer vision applications such as (Bartlett et al., 2006) and (Zeng et al., 2009). To develop a system for real environments, both emotion categories should be handled. This section is devoted to the evaluation of the previously considered methods on the recognition of spontaneous emotions. To this end, we use the FEEDTUM database, which contains spontaneous and natural expressions. As the expressions were captured under natural circumstances, head motion can be found in some sequences.
Table 3 presents the obtained recognition rates of
the distance-based method, the Gabor method and the
different fusion methods. We notice that the recogni-
tion rate of the Gabor method exceeds the recognition
VariousFusionSchemestoRecognizeSimulatedandSpontaneousEmotions
429
Table 1: Fusion methods recognition rates computed by 5-fold cross-validation on the CK+ database.
Methods                              Recognition rates   Hap    Ang    Fea    Dis    Sad    Surp   Neu
Geometric (distance-based)           90.7                96.0   89.5   87.0   85.3   83.0   93.7   100
Appearance (Gabor)                   90.4                97.7   88.0   83.7   96.0   93.3   98.0   75.7
Early fusion                         23.0                14.0   50.6   24.8   31.5   4.0    36.6   0
Statistical-rule fusion: Average     97.6                100    100    91.7   95.7   100    100    96.0
Statistical-rule fusion: Product     97.9                100    96.0   95.7   95.7   100    100    98.0
Statistical-rule fusion: Max         97.3                100    100    91.7   93.7   100    100    96.0
Classification-based fusion          93                  95.7   92     83.7   98     100    100    81.7
Table 2: Performance of two emotion recognition systems from the literature which use appearance and geometric features.
Methods                              Recognition rates   Hap    Ang    Fea    Dis    Sad    Surp   Neu
Chen et al (Chen et al., 2012)       95.0                97.5   92.5   90.0   96.0   93.5   96.5   -
Kotsia et al (Kotsia et al., 2008b)  92.3                97.5   93.6   84.3   89.5   94.3   95.6   91.3
Table 3: Fusion methods recognition rates computed by 5-fold cross-validation on the FEEDTUM database.
Methods                              Recognition rates   Hap    Ang    Fea    Dis    Sad    Surp   Neu
Geometric (distance-based)           46.8                75.1   54.4   21.5   10.6   16.6   74.4   76.0
Appearance (Gabor)                   84.2                96.0   89.7   69.7   79.3   73.1   91.5   89.7
Early fusion                         19.4                24.2   60.2   13.1   2.22   28.0   0      8.0
Statistical-rule fusion: Average     83.3                100    85.5   65.5   75.5   70.8   95.5   89.7
Statistical-rule fusion: Product     83.9                100    81.5   72.0   77.5   72.8   93.3   89.7
Statistical-rule fusion: Max         84                  98.0   89.5   65.3   75.5   71.1   97.7   89.7
Classification-based fusion          84                  94     89.7   67.7   79.5   75.3   91.5   89.7
rate of the distance-based method by about 37%. For spontaneous expressions, the facial changes are often not clearly visible, and the resulting weak changes are hardly discernible in terms of distances by the distance-based method. Besides, as mentioned above, head motion can also occur during the expressions. The pre-processing performed for the Gabor method, which scales and normalises the face images based on the location of the two eyes, compensates for this head motion. On the other hand, the head motion affects the performance of the distance-based method.
We notice that the mean recognition rates of the late fusion methods are very similar to that of the Gabor method. However, happiness and surprise are enhanced by the statistical-based fusion methods, owing to the high recognition rates of both the distance-based method and the Gabor method for these emotions. We conclude that the recognition of positive spontaneous emotions, which are more marked than the negative ones (fear, disgust...), is enhanced by the statistical-based fusion methods. We also remark a slight improvement in the recognition of disgust and sadness by the classification-based method.
The comparison between the different fusion
methods reveals that the decision level fusion meth-
ods are more reliable than the feature level fusion
ones.
5 CONCLUSION
In this paper, various fusion methods are presented and developed to recognize posed and spontaneous facial emotions. The distance-based method and the Gabor method extract geometric features and appearance features respectively. These features are combined at different levels (feature level and decision level); fusion in the decision space proceeds either by statistical rules or by a classification method. Our tests on the posed database (CK+) reveal that the statistical-based fusion methods are the most appropriate to recognize clearly apparent expressions. On the spontaneous database (FEEDTUM), the statistical-based methods enhance the recognition of positive emotions, while the classification-based method improves the recognition of sadness and disgust.
In future work, we intend to reduce the number of features used by the hybrid methods and to enhance the recognition of spontaneous emotions.
REFERENCES
Abdat, F., Maaoui, C., and Pruski, A. (2011). Human-
computer interaction using emotion recognition from
VISAPP2015-InternationalConferenceonComputerVisionTheoryandApplications
430
facial expression. 5th UKSim European Symposium
on Computer Modeling and Simulation (EMS), pages
196–201.
Anderson, K. and McOwan, P. W. (2006). A real-time au-
tomated system for the recognition of human facial
expressions. IEEE Transactions Systems, Man, and
Cybernetics, 36(1):96–105.
Atrey, K., Anwar Hossain, M., El-Saddik, A., and Kankan-
halli, S.-M. (2010). Multimodal fusion for multimedia
analysis: a survey. Multimedia System, pages 345–
379.
Bartlett, M., Littlewort, G., Frank, M., Lainscsek, C., Fasel,
I., and Movellan, J. (2006). Automatic recognition of
facial actions in spontaneous expressions. Journal of
Multimedia, pages 22–35.
Bartlett, M.-S., Gwen, L., Ian, F., and Javier, R.-M. (2003).
Real time face detection and facial expression recog-
nition: Development and applications to human com-
puter interaction. Computer Vision and Pattern Recog-
nition Workshop.
Bouguet, J. (2000). Pyramidal implementation of the lucas
kanade feature tracker. Intel Corporation, Micropro-
cessor Research Labs.
Bradski, G., Darrell, T., Essa, I., Malik, J., Perona, P., Sclaroff, S., and Tomasi, C. (2006). http://sourceforge.net/projects/opencvlibrary/.
Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C.-
M., Kazemzadeh, A., S., L., Neumann, U., and
Narayanan, S. (2004). Analysis of emotion recogni-
tion using facial expressions, speech and multimodal
information. 6th International Conference on Multi-
modal Interfaces, pages 205–211.
Chen, J., Chen, D., Gong, Y., Yu, M., Zhang, K., and Wang,
L. (2012). Facial expression recognition using geo-
metric and appearance features. Proceedings of the
4th International Conference on Internet Multimedia
Computing and Service, pages 29–33.
Fasel, I., Bartlett, M., and Movellan, J. (2002). A compari-
son of gabor filter methods for automatic detection of
facial landmarks. 5th International Conference on au-
tomatic face and gesture recognition, pages 345–350.
Gunes, H. and Piccardi, M. (2005). Affect recognition from
face and body: Early fusion vs. late fusion. IEEE In-
ternational Conference on Systems, Man and Cyber-
netics, 4:3437–3443.
Kotsia, I., Buciu, I., and Pitas, I. (2008a). An analysis of fa-
cial expression recognition under partial facial image
occlusion. Image and Vision Computing, 26(7):1052–
1067.
Kotsia, I. and Pitas, I. (2007). Facial expression recognition
in image sequences using geometric deformation fea-
tures and support vector machines. IEEE Transactions
on Image Processing, 16:172–187.
Kotsia, I., Zafeiriou, S., and Pitas, I. (2008b). Texture
and shape information fusion for facial expression and
facial action unit recognition. Pattern Recognition,
pages 833–851.
Kuncheva, L. I. (2002). A theoretical study on six classi-
fier fusion strategies. IEEE Transactions on Pattern
Analysis and Machine Intelligence, pages 281–286.
Lee, C.-J. and Wang, S.-D. (1999). Fingerprint feature ex-
traction using Gabor filters. Electronics Letters, pages
288–290.
Lucey, P., Cohn, J., Kanade, T., Saragih, J., Ambadar, Z., and Matthews, I. (2010). The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. IEEE Computer Vision and Pattern Recognition Workshops, pages 94–101.
Mironica, I., Ionescu, B., P., K., and Lambert, P. (2013). An
in-depth evaluation of multimodal video genre cate-
gorization. 11th International workshop on content-
based multimedia indexing, pages 11–16.
Movellan, J. (2005). Tutorial on gabor filters. MPLab Tu-
torials, UCSD MPLab, Tech.
Niaz, U. and Merialdo, B. (2013). Fusion methods for
multi-modal indexing of web data. 14th International
Workshop Image Analysis for Multimedia Interactive
Services, pages 1–4.
Shan, C., Gong, S., and Mcowan, P. W. (2009). Facial ex-
pression recognition based on Local Binary Patterns :
A comprehensive study. Image and Vision Comput-
ing, 27:803–816.
Shi, J. and Tomasi, C. (1994). Good features to track.
IEEE Computer Society Conference on Computer Vi-
sion and Pattern Recognition., pages 593–600.
Snelick, R., Uludag, U., Mink, A., Indovina, M., and Jain,
A. (2005). Large-scale evaluation of multimodal bio-
metric authentication using state-of-the-art systems.
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 27:450 –455.
Snoek, C. G. M. (2005). Early versus late fusion in semantic
video analysis. ACM Multimedia, pages 399–402.
Vinay, K. and Shreyas, B. (2006). Face recognition using
gabor wavelets. 4th Asilomar Conference on Signals,
Systems and Computers, pages 593–597.
Viola, P. and Jones, M. (2001). Robust real-time object de-
tection. In international journal of computer vision.
Vukadinovic, D. and Pantic, M. (2005). Fully automatic fa-
cial feature point detection using gabor feature based
boosted classifiers. IEEE Conference of Systems,
Man, and Cybernetics, pages 1692–1698.
Wallhoff, F. (2006). Facial ex-
pressions and emotion database,
http://www.mmk.ei.tum.de/ waf/fgnet/feedtum.html.
Wan, S. and Aggarwal, J. (2013). A scalable metric
learning-based voting method for expression recogni-
tion. 10th IEEE International Conference and Work-
shops on Automatic Face and Gesture Recognition
(FG), pages 1–8.
Zeng, Z., Pantic, M., Roisman, G.-I., and Huang, T.-S.
(2009). A survey of affect recognition methods: Au-
dio, visual, and spontaneous expressions. IEEE trans-
actions on pattern analysis and machine intelligence,
pages 39–58.
Zhang, L., Tjondronegoro, D., and Chandran, V. (2012).
Discovering the best feature extraction and selection
algorithms for spontaneous facial expression recogni-
tion. IEEE International Conference on Multimedia
and Expo, pages 1027–1032.
VariousFusionSchemestoRecognizeSimulatedandSpontaneousEmotions
431