ent temporal and morphological characteristics than
posed ones.
The purpose of our work is to demonstrate that
sparse representation is an efficient model for
classifying spontaneous facial expressions from
spontaneous facial images and for increasing the
prediction accuracy. Sparse representation provides
higher or lower dimensional representations that
increase the likelihood that image classes will be
linearly separable. The sparse discriminative feature
set provides the main interface through which a
machine learning algorithm can reason about the data.
More precisely, the main issue with sparse
representation is dictionary learning: because the
original facial image has a very high dimension, the
straightforward application of sparse representation
for sparse feature extraction from raw images does not
lead to a meaningful sparse representation. We
therefore present an efficient initialization strategy
and dimensionality reduction technique by developing
an optimized random face feature descriptor
(RFFD) based on the random projection (RP) concept
(Vempala, 2005). RFFD aims at projecting the facial
images into a lower dimensional space and at selecting
the most discriminative feature sets, those that
minimize the correlation between different facial image
classes while maximizing the correlation within facial
image classes, in an attempt to ensure the uniqueness
of the atom selection from the dictionary during
the sparse coding process. Our pre-training step allows
us to avoid the high computational cost (memory usage
and training time) of dictionary training,
which is an important requirement for developing a
real-time automatic facial expression recognition
system. Experimental results on the JAFFE acted facial
expression database and on the DynEmo spontaneous
expression database demonstrate that our algorithm
outperforms many recently proposed approaches based
on sparse representation and dictionary learning. Our
algorithm can be trained on a small or a large dataset
and still provide a high accuracy rate, which can be
considered an advantage over deep learning approaches,
which currently perform well only when a large
dataset is available.
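To make the random projection idea concrete, the following is a minimal sketch of a plain Gaussian random projection applied to flattened face images. It is not the optimized RFFD described above: the function name `random_projection`, the image size, the target dimension and the synthetic data are all assumptions chosen for illustration, and no discriminative feature selection is performed.

```python
import numpy as np

def random_projection(X, target_dim, seed=0):
    """Project the rows of X (n_samples, n_features) onto target_dim
    dimensions with a Gaussian random matrix. This is a plain RP,
    not the optimized RFFD of the paper."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Entries ~ N(0, 1/target_dim), so squared norms are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(target_dim), size=(n_features, target_dim))
    return X @ R, R

# Hypothetical data: 20 synthetic "images" of 64x64 pixels, flattened.
data_rng = np.random.default_rng(1)
X = data_rng.random((20, 64 * 64))

Z, R = random_projection(X, target_dim=256)

# Pairwise distances are approximately preserved after projection.
d_orig = np.linalg.norm(X[0] - X[1])
d_proj = np.linalg.norm(Z[0] - Z[1])
print(Z.shape, d_proj / d_orig)
```

The projected vectors `Z` are 16 times smaller than the raw images, which is what reduces the memory and time needed by any subsequent dictionary training step.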
2 RELATED WORK
Numerous methods for extracting discriminative in-
formation about facial expressions from images have
been developed. For example, Eigenfaces, Fisher-
faces, and Laplacianfaces have been used on full face
images (Buciu and Pitas, 2004). Gabor filter banks
have also been successfully used as efficient facial
features (Candès and Romberg, 2005; Candès et al.,
2006) because these features are locally concentrated
and have been shown to be robust to block
occlusion (Donoho, 2006). Once the feature vector
is extracted from an image, it is fed to a classifier,
which outputs the recognized expression. A survey
of automatic facial expression recognition methods is
presented in (Hoyer, 2003).
Sparse representation of signals has made
noteworthy contributions in recent years. It
has been successfully applied to a variety of problems
in computer vision and image analysis, including
image denoising (Elad and Aharon, 2006), image
restoration (Mairal et al., 2008) and image
classification (Yang et al., 2009; Wright et al., 2009;
Bradley and Bagnell, 2008). Sparse representation
modeling of data assumes that signals can be described
as linear combinations of a few atoms from a
pre-specified dictionary. The success of the model relies
on the quality of the dictionary that sparsifies the
signals. A proper dictionary can be chosen in one of
two ways (Rubinstein et al., 2010): building a
sparsifying dictionary based on a mathematical model
of the data (wavelets, wavelet packets, contourlets,
and curvelets), or learning a dictionary to perform
best on a training set. Reference
(Wright et al., 2009) employs the entire set of train-
ing samples as the dictionary for discriminative sparse
coding, and achieves impressive performance for face
recognition. Many algorithms (Mairal et al., 2010;
Wang et al., 2010) have been proposed to efficiently
learn an over-complete dictionary (one in which the
number of prototype signals, referred to as atoms, is
much greater than the feature dimension) that enforces
some discriminative criteria. Recently, another sparse
representation for object representation and recognition
was proposed in the seminal work (Wright et al.,
2009). In (Jiang et al., 2013), the class labels of
training data are used to learn a discriminative dic-
tionary for sparse coding. In addition, label informa-
tion is associated with each dictionary item to enforce
discriminability in sparse codes during the dictionary
learning process. More specifically, a new label con-
sistency constraint called “discriminative sparse-code
error” is introduced and combined with the recon-
struction error and the classification error to form a
unified objective function.
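The idea of using training samples directly as dictionary atoms and classifying by class-wise reconstruction error can be sketched as follows. This is an illustrative toy, not the implementation of any cited work: orthogonal matching pursuit (OMP) is swapped in as the sparse solver for simplicity, the helper names `omp` and `classify_by_residual` are hypothetical, and the two-class dictionary is synthetic, with each class confined to its own block of coordinates.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit: approximate y with at most
    n_nonzero columns (atoms) of D. Columns of D are assumed unit-norm."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x[support] = coef
    return x

def classify_by_residual(D, labels, y, n_nonzero=5):
    """Sparse-code y over D, then return the class whose atoms alone
    reconstruct y with the smallest error."""
    x = omp(D, y, n_nonzero)
    classes = np.unique(labels)
    errors = [np.linalg.norm(y - D[:, labels == c] @ x[labels == c])
              for c in classes]
    return classes[int(np.argmin(errors))], x

# Synthetic dictionary (assumption): class 0 atoms occupy the first half of
# the coordinates, class 1 atoms the second half.
rng = np.random.default_rng(0)
d, per_class = 40, 10
D = np.zeros((d, 2 * per_class))
D[: d // 2, :per_class] = rng.normal(size=(d // 2, per_class))
D[d // 2 :, per_class:] = rng.normal(size=(d // 2, per_class))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
labels = np.repeat([0, 1], per_class)   # class label of each atom

# Test signal built from three class-1 atoms.
y = D[:, [12, 15, 17]] @ np.array([1.0, -0.5, 0.8])
pred, x = classify_by_residual(D, labels, y)
print(pred, np.count_nonzero(x))
```

In this toy setting the sparse code of `y` activates only atoms of its own class, so the class-wise residual identifies the label. Learned discriminative dictionaries aim to encourage the same behaviour on real data, where classes are not cleanly separated.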
Our work is motivated by the success of
sparse representation in both theoretical research and
practical applications (Yang et al., 2009; Wright
et al., 2009; Bradley and Bagnell, 2008; Mairal
et al., 2008). Moreover, our choice comes from the
fact that sparse representation can provide
sparse vectors that can share the same sparsity
Spontaneous Facial Expression Recognition using Sparse Representation