the best feature set for our task, denoted in the remainder by F_selection. In some cases, many feature sets are good candidates for feature selection. More details are presented in Section 6.1.
5.3 Evaluation of Emotion Recognition
Performance
Once the number of features is chosen, FI is computed on S_train + S_evaluation. Then, only the F_selection most important features are selected. SVM is trained with the selected features on both S_train and S_evaluation. The performance is finally evaluated with F_selection on S_test.
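The selection and evaluation steps above can be sketched as follows. This is a minimal illustration assuming scikit-learn estimators; the arrays are synthetic stand-ins for S_train + S_evaluation and S_test, and the sizes, SVM kernel, and forest settings are placeholders rather than the paper's configuration.

```python
# Sketch of the pipeline: RF feature importance (FI), top-k selection, SVM.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_trainval = rng.normal(size=(200, 50))   # stand-in for S_train + S_evaluation
y_trainval = rng.integers(0, 3, size=200)
X_test = rng.normal(size=(40, 50))        # stand-in for S_test

F_selection = 10  # number of features to keep (1800 in the paper)

# 1) Compute feature importance (FI) with a Random Forest.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_trainval, y_trainval)

# 2) Keep only the F_selection most important features.
top = np.argsort(rf.feature_importances_)[::-1][:F_selection]

# 3) Train the SVM on the reduced feature set and predict on the test set.
svm = SVC(kernel="linear")
svm.fit(X_trainval[:, top], y_trainval)
predictions = svm.predict(X_test[:, top])
```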
6 EXPERIMENTAL RESULTS
Our method is trained and evaluated on the Cohn-
Kanade database. This database (Kanade et al., 2000)
is widely used in automatic facial expression recognition. It includes 97 posers between the ages of eighteen and thirty; 65% are female, 15% are African-American, and 3% are Asian or Latino. They present
the six basic emotions, namely: anger (Ang), disgust
(Dis), surprise (Sur), happiness (Hap), fear (Fea) and
sadness (Sad). The last frame of each sequence ex-
pressing the required emotion is coded using the Fa-
cial Action Coding System (FACS). The first image
always presents the neutral expression (Neut). In this
paper, we use these images to train our approach on
the neutral expression.
6.1 RF based Feature Selection
In this section, feature selection results on the Cohn-Kanade database are described. First, features are iteratively reduced, and the SVM recognition rates and RF error rates are stored. Figure 4 presents the behaviour of the SVM recognition rate and the RF error rate for different feature selection sets. We observe that when the number of selected features exceeds 1200, the recognition rate is very close to the one obtained with the full feature set. Meanwhile, the RF error rate is slightly reduced compared to the RF error rate before feature selection. Feature sets from 1200 to 3000 selected features therefore seem good candidates for constructing the final model.
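The iterative reduction can be sketched as follows, a hedged illustration assuming scikit-learn: the Random Forest out-of-bag error stands in for the RF error rate, a cross-validated SVM score for the recognition rate, and the least important half of the features is dropped at each step. The data and feature counts are synthetic placeholders, not the paper's features.

```python
# Iteratively reduce features, recording RF error and SVM rate per step.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 64))
y = rng.integers(0, 3, size=150)

kept = np.arange(X.shape[1])
history = []  # (n_features, rf_error, svm_rate) at each reduction step
while kept.size >= 8:
    rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
    rf.fit(X[:, kept], y)
    svm_rate = cross_val_score(SVC(kernel="linear"), X[:, kept], y, cv=3).mean()
    history.append((kept.size, 1.0 - rf.oob_score_, svm_rate))
    # Keep the better half of the features, ranked by RF importance.
    order = np.argsort(rf.feature_importances_)[::-1]
    kept = kept[order[: kept.size // 2]]
```

Plotting the stored `history` against the feature count reproduces the kind of curves the paper reads off Figure 4.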
We tested four feature sets, F_selection ∈ {1200, 1500, 1800, 2500}. As mentioned previously, we compute FI on both S_train and S_evaluation. First, all features are ranked and only the F_selection most important features are chosen.
Finally, SVM is trained with the selected features.
Once the model is created, the recognition rate is
computed on the test set. Table 1 presents the recognition rates (RR) of the four feature sets. We notice that the 1500, 1800 and 2500 feature sets enhance the emotion recognition rate by 1.2%, 2% and 2.4%, respectively. The 1200-feature set leads to a recognition rate about the same as that of the full feature set. While the 1800 and 2500 feature sets both improve performance, the difference between them remains slight. For the remainder of the paper, we therefore chose 1800 features (F_selection = 1800).
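The sweep over candidate sizes can be sketched as follows, again assuming scikit-learn; the sizes 12/15/18/25 are scaled-down, hypothetical stand-ins for the paper's 1200/1500/1800/2500, and the data is synthetic.

```python
# Evaluate several candidate F_selection sizes and keep the best.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_trainval = rng.normal(size=(200, 60))
y_trainval = rng.integers(0, 3, size=200)
X_test = rng.normal(size=(50, 60))
y_test = rng.integers(0, 3, size=50)

# Rank all features once by RF importance.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
ranking = np.argsort(rf.fit(X_trainval, y_trainval).feature_importances_)[::-1]

rates = {}
for k in (12, 15, 18, 25):  # stand-ins for 1200, 1500, 1800, 2500
    top = ranking[:k]
    svm = SVC(kernel="linear").fit(X_trainval[:, top], y_trainval)
    rates[k] = svm.score(X_test[:, top], y_test)

F_selection = max(rates, key=rates.get)  # candidate with the best rate
```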
Table 2 and Table 3 present the confusion matrices before and after feature selection, respectively. Labels are presented on rows and predicted emotions on columns. The comparison between the two matrices reveals that removing features mainly decreases the confusion between sadness and the neutral expression. Indeed, the sadness recognition rate is increased by about 22%, while the other emotions keep the same recognition rate, except disgust, which decreases by 8%.
6.2 Comparison with Principal
Component Analysis
Principal component analysis (PCA) is a statistical approach often used for feature reduction. It transforms the feature space into an uncorrelated one. The principal components, which form the new feature space, are linear combinations of the original features.
In this section, we use PCA to select a subspace of the original features by thresholding the feature weights computed by the method and stored in the transformation matrix. This approach is used in (Chuang and Wu, 2004) to select acoustic features for emotion recognition in speech.
PCA is first applied to create the principal components on S_train + S_evaluation. Various principal component sets are chosen to capture different amounts of the data variance: {85%, 90%, 95%, 97%}. A set of thresholds, from −0.01 to 0.04 in steps of 0.005, is also tested to obtain the best recognition rate with the smallest feature set. After choosing 95% of the variance and a threshold of 0.025 on the feature weights, SVM is finally applied on S_test. The recognition rate is about 95.2% for 3061 selected features.
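A hedged sketch of this PCA-based selection, assuming scikit-learn: keep enough principal components to explain about 95% of the variance, then retain the original features whose largest absolute weight in the transformation matrix exceeds the threshold. The 0.025 threshold mirrors the text; the feature matrix is a synthetic stand-in, and treating the per-feature maximum over components as "the feature weight" is one plausible reading of the thresholding step.

```python
# PCA feature reduction via thresholding of the transformation matrix.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))  # stand-in for S_train + S_evaluation

pca = PCA(n_components=0.95)    # keep components covering ~95% of variance
pca.fit(X)

threshold = 0.025
# components_ has shape (n_components, n_features); a feature is kept if it
# carries an absolute weight above the threshold in at least one component.
weights = np.abs(pca.components_).max(axis=0)
selected = np.flatnonzero(weights > threshold)
```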
The comparison between the emotion recognition rates before (see Table 2) and after PCA feature reduction (see Table 4) reveals that PCA feature reduction enhances sadness recognition by only 13.8% (compared with 22% for RF based feature selection) but keeps a better recognition rate for disgust.
The main advantage of RF based feature selection is that it selects fewer features (1800 features) than PCA