SMILE DETECTION USING LOCAL BINARY PATTERNS AND
SUPPORT VECTOR MACHINES
D. Freire, M. Castrillón
SIANI, Universidad de Las Palmas de Gran Canaria, Spain
O. Déniz
SIANI, Universidad de Las Palmas de Gran Canaria, Spain
Keywords:
Facial analysis, SVM, K-NN, PCA, LBP.
Abstract:
Facial expression recognition has been the subject of much research in recent years within the Computer Vision community. The detection of smiles, however, has received less attention. The distinctive configuration of the smile may pose fewer problems than other, at times subtle, expressions, while smiles remain very useful as a measure of happiness, enjoyment or even approval. Geometrical or local detection approaches, such as the use of lip edges, may not be robust enough, and thus researchers have focused on applying machine learning to appearance-based descriptors. This work presents an extensive experimental study of smile detection, testing Local Binary Patterns (LBP) as the main image descriptors along with the powerful Support Vector Machines classifier. The results show that error rates can be acceptable, although there is still room for improvement.
1 INTRODUCTION
It is now known that emotions play a significant role in human decision-making processes. Some facial expressions can be very difficult to recognize, even between humans. Moreover, in human-computer interaction the range of expressions displayed is typically reduced: in front of a computer, for example, subjects rarely display the accentuated surprise or anger expressions that they would display when interacting with another human.
The human smile is a distinct facial configuration that a computer could recognize with greater precision and robustness. It is also a significantly useful facial expression, since it allows sensing happiness or enjoyment and even approval (and also the lack of them) (Ekman et al., 1990). Compared with general facial expression recognition, smile detection has produced less literature. The lip zone is obviously the most important region, since human smiles involve mainly the zygomatic muscle pair, which raises the mouth corners. Edge features alone, however, may be insufficient.
This paper makes an extensive experimental study of the smile detection problem, and is organized as follows. Section 2 describes the codification algorithms. The different classification approaches are briefly presented in Section 3. The experimental results and conclusions are described in Sections 4 and 5 respectively.
2 REPRESENTATION
The Local Binary Pattern (LBP) is an image descriptor commonly used for classification and retrieval. Introduced by Ojala et al. (Ojala et al., 2002) for texture classification, it is characterized by invariance to monotonic changes in illumination and low processing cost.
Given a pixel, the LBP operator thresholds its circular neighborhood at a given radius against the gray value of the center pixel, and labels the center pixel with the resulting binary pattern.
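As an illustration, the following is a minimal sketch of the basic 3x3 LBP operator (radius 1, 8 neighbors); the NumPy implementation, the function name and the clockwise neighbor ordering are illustrative choices, not taken from this paper.

```python
import numpy as np

def lbp_basic(image):
    """Basic 3x3 LBP: threshold the 8 neighbors of each pixel against its
    gray value and pack the comparison results into an 8-bit code."""
    img = image.astype(np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    # Clockwise neighbor offsets starting at the top-left corner (illustrative order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbor >= center).astype(np.int32) << bit
    return codes.astype(np.uint8)
```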
Rotation invariance is achieved in the LBP-based representation by treating the local binary pattern as circular. The experiments of Ojala et al. (Ojala et al., 2002) suggested that only a particular subset of local binary patterns is typically present in most pixels of real images; they refer to these patterns as uniform.
Figure 1: The basic version of the Local Binary Pattern
computation (c) and the Simplified LBP codification (d).
Uniform patterns are characterized by the fact that they contain at most two bitwise transitions from 0 to 1 or vice versa. In the experiments carried out by Ojala et al. (Ojala et al., 2002) with texture images, uniform patterns account for slightly less than 90% of all patterns when using the 3x3 neighborhood.
Using LBP as a preprocessing method has the effect of emphasizing both edges and noise. To reduce the influence of noise, Qian Tao et al. (Tao and Veldhuis, 2007) recently proposed a modification of the LBP approach: instead of weighting the neighbors differently, all weights are made equal, obtaining the so-called Simplified LBP, see Figure 1-d. Their approach has shown benefits in facial verification, because with simplified weights the image becomes more robust to illumination changes and has a maximum of nine different values per pixel. The total number of local patterns is largely reduced, so the image has a more constrained value domain.
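For comparison, here is a minimal sketch of the Simplified LBP idea, assuming the straightforward reading that each pixel is labeled with the count of 3x3 neighbors passing the threshold (nine possible values, 0 to 8); the exact thresholding convention of Tao and Veldhuis may differ in details.

```python
import numpy as np

def simplified_lbp(image):
    """Simplified LBP: all neighbor weights are equal, so each pixel is
    labeled with the count of neighbors passing the threshold (0..8)."""
    img = image.astype(np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    labels = np.zeros((h - 2, w - 2), dtype=np.int32)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
            labels += (neighbor >= center).astype(np.int32)
    return labels.astype(np.uint8)
```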
In the experiments described in Section 4, both approaches are adopted, i.e. the histogram-based approach, but also Uniform LBP and Simplified LBP as preprocessing steps.
On the other hand, another problem to solve is the high dimensionality of raw face images. A classical technique applied to face representation to avoid the consequent processing overload is Principal Component Analysis (PCA) decomposition (Kirby and Sirovich, 1990). PCA reduces data dimensionality, without a significant loss of information, by performing a covariance analysis of the data. As such, it is suitable for high-dimensional data sets such as face images. A normalized image of the target object, i.e. a face, is projected into the PCA space. The appearance of the different individuals is then represented in a space of lower dimensionality by means of a number of the resulting coefficients, v_i.
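As a brief illustration of this projection step, the sketch below uses scikit-learn's PCA; the component count mirrors the values used later in the experiments, while the data and variable names are placeholders rather than anything prescribed here.

```python
import numpy as np
from sklearn.decomposition import PCA

# X_train: one row per normalized 59x65 face image, flattened to a vector
# (random placeholder data, for illustration only).
X_train = np.random.rand(200, 59 * 65)

pca = PCA(n_components=40)      # e.g. 40, 90 or 130 eigenvectors
pca.fit(X_train)

# v holds the coefficients v_i representing one face in the reduced space.
face = np.random.rand(1, 59 * 65)
v = pca.transform(face)
print(v.shape)                  # (1, 40)
```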
3 CLASSIFICATION
As mentioned above, LBP histograms have been used as features to characterize images. For facial analysis, the face image is typically divided into blocks, and the whole face is described by means of the collection of histograms extracted from those rectangular blocks. The L1 norm is often used as a simple and fast approach to compute the similarity between two histograms with n bins. In the experiments reported in Section 4, the block size has been adjusted to the size of the image part under study, depending on the kind of test.
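The following sketch illustrates the block-wise histogram description and the L1 comparison; the block size and bin count are illustrative choices, not values prescribed in this paper.

```python
import numpy as np

def block_histograms(coded_image, block_shape=(10, 10), bins=256):
    """Split an (LBP-coded or gray) image into rectangular blocks and
    concatenate the per-block normalized histograms into one feature vector."""
    h, w = coded_image.shape
    bh, bw = block_shape
    feats = []
    for y in range(0, h - bh + 1, bh):
        for x in range(0, w - bw + 1, bw):
            block = coded_image[y:y + bh, x:x + bw]
            hist, _ = np.histogram(block, bins=bins, range=(0, bins))
            feats.append(hist / max(hist.sum(), 1))
    return np.concatenate(feats)

def l1_distance(h1, h2):
    """L1 norm between two concatenated histograms with n bins."""
    return np.abs(h1 - h2).sum()
```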
Support vector machines (SVM) are a set of re-
lated supervised learning methods used for classifi-
cation and regression. They belong to a family of
generalized linear classifiers. A property of SVMs is
that they simultaneously minimize the empirical clas-
sification error and maximize the geometric margin;
hence they are also known as maximum margin clas-
sifiers. LIBSVM (Chang and Lin, 2001) is the library employed in the experiments described below.
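As an illustration of this classification step, the sketch below uses scikit-learn's SVC, which wraps LIBSVM; the kernel choice and the placeholder data are assumptions, not settings reported here.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder features and labels (1 = smiling, 0 = not smiling).
X = np.random.rand(400, 40)
y = np.random.randint(0, 2, size=400)

clf = SVC(kernel='rbf')              # kernel choice is an assumption
clf.fit(X[:200], y[:200])            # train on the first half
error_rate = 1.0 - clf.score(X[200:], y[200:])
print(f"error rate: {error_rate:.3f}")
```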
Another supervised learning method used in this paper is the k-nearest neighbor (k-NN) classifier. It classifies objects based on the closest training examples in the feature space. k-NN is a type of instance-based, or lazy, learning, where the function is only approximated locally and all computation is deferred until classification.
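A corresponding sketch for k-NN (k = 1 or 5 as in the experiments below), again with scikit-learn; the L1 (manhattan) metric is chosen here for illustration and is not stated in the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder histogram features and labels (1 = smiling, 0 = not smiling).
X = np.random.rand(400, 256)
y = np.random.randint(0, 2, size=400)

knn = KNeighborsClassifier(n_neighbors=5, metric='manhattan')  # 5-NN, L1 metric
knn.fit(X[:200], y[:200])
error_rate = 1.0 - knn.score(X[200:], y[200:])
print(f"error rate: {error_rate:.3f}")
```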
4 EXPERIMENTS
The dataset of images used for the experimental setup is separated into two classes: smiling and not smiling. This classification was performed by humans, who labeled each normalized image of 59 × 65 pixels. The first set contains 2421 images of different smiling faces, while the second set contains 3360 non-smiling faces.
As briefly mentioned above, the experimental setup considers two different possibilities as input: 1) the whole normalized face image, and 2) the image parts extracted from the face image corresponding to both eyes and the mouth, which according to psychology studies (Ekman et al., 1990) are the most important facial regions for detecting the human smile.
For this second possibility, there are two different kinds of test depending on how the ocular area is extracted: the eyes can be considered separately or together. This distinction is important because the amount of information available for each test differs depending on the eye extraction.
The input image will be a grayscale image, and for
Table 1: Error rate with k-NN based approaches considering image parts. There is an entry for each experiment. For each row the name is composed of the abbreviated name of the approach, the parts involved in the experiment and the k-NN classification method. For the preprocessed input data there are three possibilities: ULBP, SLBP and GRAY, which stand for Uniform LBP, Simplified LBP and Grayscale Image respectively. For the image parts there are also three possibilities: M, SE and ET, which stand for Mouth, Separated Eyes and Eyes Together respectively. For the classification methods there are two possibilities: 1NN and 5NN, which stand for 1-NN and 5-NN respectively.
KNN Test Image Values Histogram
GRAY M 1NN 20.3% 37.5%
ULBP M 1NN 41.0% 32.0%
SLBP M 1NN 35.0% 36.5%
GRAY M 5NN 18.7% 34.5%
ULBP M 5NN 43.0% 28.5%
SLBP M 5NN 38.5% 32.0%
GRAY ET+M 1NN 26.0% 31.0%
ULBP ET+M 1NN 44.0% 32.5%
SLBP ET+M 1NN 42.0% 34.5%
GRAY ET+M 5NN 23.5% 30.0%
ULBP ET+M 5NN 43.0% 27.5%
SLBP ET+M 5NN 38.0% 33.0%
GRAY SE+M 1NN 24.0% 33.0%
ULBP SE+M 1NN 46.0% 33.5%
SLBP SE+M 1NN 42.0% 34.0%
GRAY SE+M 5NN 21.0% 30.0%
ULBP SE+M 5NN 42.0% 30.0%
SLBP SE+M 5NN 39.0% 33.5%
representation purposes we have used the following
approaches for the whole image and the image parts
tests:
- A PCA space obtained from the original gray images or obtained after preprocessing the original images using LBP. Two different approaches, i.e. Simplified LBP (SLBP) and Uniform LBP (ULBP), have been used.
- A concatenation of histograms based on the gray image or the resulting LBP image (both the simplified and uniform approaches were used).
- A concatenation of the image values based on the gray images or the resulting LBP image (again both LBP approaches were used).
Similar experimental conditions have been used for every approach considered in this setup. The test sets are built randomly, with an identical number of images for both classes. The results presented correspond to the percentage of wrongly classified samples over all test samples.
The average results presented in this paper are obtained, for each configuration, after ten random selections with 50% of the samples used for training and 50% for testing. Therefore, each experiment uses 2000 images for training, 1000 of each class, and 2000 images for testing, 1000 of each class.
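The sketch below illustrates this evaluation protocol as described: ten random balanced selections, each with 1000 training and 1000 test images per class, averaging the error rate over the runs. The classifier here is a generic placeholder, and scikit-learn is used only for illustration.

```python
import numpy as np
from sklearn.svm import SVC

def average_error(X_smile, X_no_smile, n_runs=10, n_per_class=1000):
    """Ten random balanced splits: 1000 training and 1000 test images per
    class; the reported figure is the mean error rate over the runs."""
    rng = np.random.default_rng()
    errors = []
    for _ in range(n_runs):
        idx_s = rng.permutation(len(X_smile))[:2 * n_per_class]
        idx_n = rng.permutation(len(X_no_smile))[:2 * n_per_class]
        X_tr = np.vstack([X_smile[idx_s[:n_per_class]],
                          X_no_smile[idx_n[:n_per_class]]])
        X_te = np.vstack([X_smile[idx_s[n_per_class:]],
                          X_no_smile[idx_n[n_per_class:]]])
        y_tr = np.r_[np.ones(n_per_class), np.zeros(n_per_class)]
        y_te = y_tr.copy()
        clf = SVC()                   # placeholder classifier
        clf.fit(X_tr, y_tr)
        errors.append(1.0 - clf.score(X_te, y_te))
    return float(np.mean(errors))
```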
Using the different input sources mentioned
above, for the k-NN image parts experiments, each
image part (grayscale or LBP preprocessed) is trans-
Table 2: Error rate with the SVM based approach considering the image parts. There is an entry for each experiment. For each row the name is composed of the abbreviated name of the approach and of the parts involved in the experiment.
SVM Test Image Val. Hist. PCA 40 PCA 90 PCA 130
GRAY M 13.3% 37.8% 10.3% 11.4% 11.6%
GRAY ET+M 12.5% 30.8% 09.7% 10.2% 10.3%
GRAY SE+M 13.2% 30.7% 10.6% 10.9% 11.2%
ULBP M 23.5% 25.3% 27.2% 26.3% 25.1%
ULBP ET+M 32.1% 22.3% 29.9% 36.6% 36.0%
ULBP SE+M 31.2% 22.1% 29.0% 32.2% 30.1%
SLBP M 16.4% 27.7% 20.8% 18.0% 18.3%
SLBP ET+M 22.2% 25.3% 21.5% 19.9% 19.8%
SLBP SE+M 21.5% 25.0% 21.4% 21.5% 24.3%
Table 3: Error rate with k-NN considering the whole image. For each row the name is composed of the abbreviated name of the approach and the k-NN classification method.
k-NN Test Image Values Histogram
GRAY 1-NN 27.0% 43.0%
ULBP 1-NN 45.0% 43.5%
SLBP 1-NN 43.0% 43.0%
GRAY 5-NN 29.5% 41.5%
ULBP 5-NN 46.3% 41.0%
SLBP 5-NN 38.4% 46.0%
formed into a collection of normalized features. These features can be the bins of a normalized histogram or the image values, depending on the approach used.
Table 1 shows the error rates for each approach, varying the image parts considered to represent the face. As expected, in the image values test the grayscale based approach gives clearly better results. In this case the test is performed directly on the normalized gray-level vector of the image parts. The tests for both LBP preprocessed images report similar rates, far from the grayscale rates. This is due to the data compression that both LBP approaches introduce.
For the normalized histogram test with image parts, the results achieved for the grayscale images are worse than those achieved in the image values test, whereas both LBP preprocessed images achieve better rates, especially the Simplified LBP. Again, both LBP results are quite similar.
Another important aspect to consider in this section is the difference in behavior between the 1-NN and 5-NN tests. 5-NN reported better results than 1-NN because it takes more neighbors into account when making a decision, which lowers the error rate.
Table 3 shows the results obtained by analyzing the whole image. The rates achieved with the image values test are much better than those obtained with the normalized histogram test, which is explained by the larger amount of information present in the image values representation. Another point to consider is that these rates are worse than those achieved with the image parts. These results suggest that
Table 4: Error rate with SVM considering the Whole Image.
SVM Test Image Val. Hist. PCA 40 PCA 90 PCA 130
Grayscale image 10.6% 42.9% 11.3% 10.4% 10.1%
Uniform LBP 27.2% 35.7% 25.9% 27.6% 27.0%
Simplified LBP 26.2% 38.2% 27.5% 27.7% 28.8%
analyzing just the eyes and mouth increases the proportion of useful information available to make the most suitable decision.
For the next part of the experiments we used SVM as the classification method. For the SVM tests there are five possible representation approaches: image values, normalized histograms, PCA 40, PCA 90 and PCA 130. The number next to PCA refers to the dimension of the representation space, i.e. the number of eigenvectors used for projecting the face image.
As can be seen in Tables 2 and 4, the error rates achieved using the grayscale images are the best in almost every situation. None of the LBP based representations outperforms that approach. However, the Uniform LBP approach shows a larger improvement when normalized histograms are used. The Simplified LBP approach reported better results than Uniform LBP except for the normalized histograms.
Restricting the input data to the most expressive facial areas, the grayscale approach again presents the best performance, with a clear improvement over the k-NN rates. This effect suggests that focusing on those areas removes some noise from the representation, a situation not observed with k-NN, and allows powerful classifiers to be exploited. When the normalized image values vector representation is used, the grayscale image test achieves the lowest error rate.
For the PCA representations, the behaviour of the error rates is quite similar to that obtained previously with the image values test; again the grayscale image test achieved the lowest error rates. When the histogram based representation is used, the Uniform LBP error rate is the lowest. This approach seems to model the smile texture properly even though the histogram loses the relative location information. The behaviour is quite similar for the Simplified LBP approach: its histogram also loses that information, but the rates achieved are similar. The grayscale image test achieved its highest error rate in this case, higher than both the Uniform LBP and Simplified LBP approaches, which means that the grayscale approach is very sensitive to the relative location of the information.
In the SVM setting already described, restricting the input to the eye and mouth blocks translates into a reduction of the number of dimensions, which explains the improvement obtained with image parts. It should be mentioned that PCA reduces the dimensionality too, which is why the grayscale image test with PCA achieved better results than the image values test.
5 CONCLUSIONS
This paper described a smile detection system using different LBP approaches, as well as a grayscale image representation, combined with two different classification methods: k-NN and SVM. The potential of Simplified LBP as a preprocessing method for smile verification has been shown, especially in the SVM tests.
Observing in detail the differences between the Uniform LBP and the Simplified LBP, the distribution of Simplified LBP can be a good representation for images with more or less uniform textures, but for the whole face image it is not enough. Therefore, if the area is restricted to just the mouth, the performance increases, because other textures are removed, even though the connection between patterns and their relative position in the face is lost.
In this paper we have developed a static smile classifier achieving, in some cases, a 90% success rate. Smile detection in video streams, where temporal coherence is implicit, will be studied as a cue to recognize the dynamics of the smile expression.
REFERENCES
Chang, C.-C. and Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Ekman, P., Davidson, R., and Friesen, W. (1990). The Duchenne smile: Emotional expression and brain physiology II. Journal of Personality and Social Psychology.
Kirby, M. and Sirovich, L. (1990). Application of the Karhunen-Loève procedure for the characterization of human faces. IEEE Trans. on Pattern Analysis and Machine Intelligence, 12(1).
Ojala, T., Pietikäinen, M., and Mäenpää, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24(7):971–987.
Tao, Q. and Veldhuis, R. (2007). Illumination normaliza-
tion based on simplified local binary patterns for a face
verification system. In Proc. of the Biometrics Sympo-
sium, pages 1–6.