FACIAL ACTION UNIT RECOGNITION AND INFERENCE
FOR FACIAL EXPRESSION ANALYSIS
Yu Li Xue, Xia Mao and Qing Chang
School of Electronic and Information Engineering, Beihang University, Xueyuan Road 37, Haidian District, Beijing, China
Keywords:
Facial Action Unit Recognition, Gabor Wavelet, Optical Flow, Support Vector Machine, Dynamic Bayesian
Network.
Abstract:
Human facial expression is extremely rich and can be described by numerous facial action units. Recognizing facial action units helps capture the inner emotions or intentions of humans. In this paper, we propose a novel method for facial action unit recognition and inference. We use Gabor wavelets and optical flow for feature extraction, and support vector machines and dynamic Bayesian networks for classification and inference respectively. We combine the advantages of both global and local feature extraction, recognize the most discriminant AUs with multiple classifiers to achieve a high recognition rate, and then infer the related AUs. Experiments were conducted on the Cohn-Kanade AU-Coded database. The results demonstrate that, compared to earlier research on facial action unit recognition, our method is capable of recognizing more action units and achieves good performance.
1 INTRODUCTION
Human facial expression plays an important role in daily human communication. Analyzing facial expression is therefore of great value in psychology and affective computing.
Many researchers have proposed methods for facial action unit recognition. Pantic et al. built an expert system for facial expression recognition that uses various methods to recognize 16 action units and 6 basic facial expressions in both frontal and profile views (Pantic and Rothkrantz, 2000). Tian et al. extracted both permanent and transient facial features and recognized the neutral expression, 6 upper facial action units and 10 lower facial action units (Tian et al., 2001). Kapoor et al. used an infrared camera to detect the pupils, extracted parameters through principal component analysis, and finally used a support vector machine to recognize upper facial action units and combined action units (Kapoor et al., 2003). Tong et al. used dynamic Bayesian networks to represent rigid and non-rigid facial movements and their temporal-spatial relationships (Tong et al., 2007; Tong et al., 2010). They obtained facial action recognition results through facial movement measurement and probabilistic inference, and achieved good recognition results on spontaneous facial expressions.
However, most of the above methods were implemented under controlled conditions, and only a limited set of AUs was recognized. Recognizing subtle facial action units in real life is still a challenge. This paper proposes a method to recognize and infer more facial expression action units.
2 FACIAL REGION LOCATION
We use Haar-like wavelet features and AdaBoost classification (Viola and Jones, 2001) to detect the face in image sequences. The eyes are then detected within the face using the same method. If the two eyes are not horizontally level, the face is aligned using an affine transformation.

Based on the eye locations, we obtain local facial regions such as the nose region, the above-eyes region, the below-eyes region, and the below-nose region. These facial regions are illustrated in figure 1.
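A minimal sketch of this localization step is given below, using OpenCV's Haar-cascade implementation of the Viola-Jones detector. The cascade files and the eye-selection heuristic are our assumptions, not the paper's exact configuration.

```python
# Sketch of face/eye detection and affine alignment. Cascade files and
# the eye-selection heuristic are assumed; the paper's setup may differ.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def locate_and_align(gray):
    """Detect the face and eyes in a grayscale frame, then rotate the
    face so that the two eyes lie on a horizontal line."""
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    face = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(face)
    if len(eyes) < 2:
        return face
    # Keep the two largest detections, ordered left to right.
    eyes = sorted(sorted(eyes, key=lambda e: -e[2] * e[3])[:2],
                  key=lambda e: e[0])
    (x1, y1, w1, h1), (x2, y2, w2, h2) = eyes
    c1 = (x1 + w1 / 2.0, y1 + h1 / 2.0)
    c2 = (x2 + w2 / 2.0, y2 + h2 / 2.0)
    # Rotate about the eye midpoint so the inter-ocular line is level.
    angle = np.degrees(np.arctan2(c2[1] - c1[1], c2[0] - c1[0]))
    mid = ((c1[0] + c2[0]) / 2.0, (c1[1] + c2[1]) / 2.0)
    M = cv2.getRotationMatrix2D(mid, angle, 1.0)
    return cv2.warpAffine(face, M, (int(w), int(h)))
```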
3 GABOR WAVELET AND
OPTICAL FLOW FEATURE
EXTRACTION
To extract Gabor features from image sequences for action unit recognition, the difference image is first obtained by subtracting the neutral image from the apex image. Then, Gabor wavelet features are obtained by convolving the difference image with a set of multiscale, multiorientation Gabor filters.

Figure 1: Illustration of the facial regions: (a) eyes detected; (b) nose inferred; (c) above-eyes region; (d) below-eyes region; (e) below-nose region (copyright @Jeffrey Cohn).
The whole normalized face region is convolved with a set of Gabor filters at two spatial frequencies and four orientations. The Gabor wavelet coefficient is given in equation 1:

$J = \|J\| e^{j\phi}$   (1)

where $\|J\|$ is the magnitude and $\phi$ is the phase; the four orientations used are $\pi/4$, $\pi/2$, $3\pi/4$ and $\pi$.
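A minimal sketch of this feature extraction step is given below, assuming OpenCV. The kernel size, sigma, and the two wavelengths standing in for the two spatial frequencies are illustrative choices, not values from the paper; the downsampling factor of 2 per axis reproduces the 4608-dimensional vector reported in section 6.1.

```python
# Sketch of Gabor feature extraction on the neutral-apex difference
# image (2 frequencies x 4 orientations = 8 filters). Kernel size,
# sigma and the wavelengths are assumed values.
import cv2
import numpy as np

def gabor_features(neutral, apex, wavelengths=(4.0, 8.0),
                   thetas=(np.pi / 4, np.pi / 2, 3 * np.pi / 4, np.pi)):
    """neutral, apex: 48x48 grayscale images; returns a 4608-dim vector."""
    diff = apex.astype(np.float32) - neutral.astype(np.float32)
    feats = []
    for lam in wavelengths:
        for theta in thetas:
            # Real and imaginary parts of the complex Gabor response.
            k_re = cv2.getGaborKernel((21, 21), 4.0, theta, lam, 0.5, psi=0)
            k_im = cv2.getGaborKernel((21, 21), 4.0, theta, lam, 0.5,
                                      psi=np.pi / 2)
            re = cv2.filter2D(diff, cv2.CV_32F, k_re)
            im = cv2.filter2D(diff, cv2.CV_32F, k_im)
            mag = np.sqrt(re ** 2 + im ** 2)     # ||J|| in equation (1)
            feats.append(mag[::2, ::2].ravel())  # downsample to 24x24
    return np.concatenate(feats)                 # 8 x 576 = 4608 dims
```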
Optical flow is also used to track the motion of facial features in image sequences. The optical flow method assumes that the gray values in any image feature region do not change between two consecutive frames, but only shift from one position to another. The optical flow constraint is given in equation 2:

$I_x V_x + I_y V_y = -I_t$   (2)

where $V_x$ and $V_y$ are the x and y components of the velocity (optical flow) of $I(x, y, t)$, and $I_x$, $I_y$ and $I_t$ are the derivatives of the image at $(x, y, t)$ in the corresponding directions.
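The sketch below computes dense optical flow between the neutral and apex frames; Farneback's method and its parameters are our assumptions, as the paper does not name a particular solver for equation 2.

```python
# Sketch of the optical flow feature extraction. The Farneback solver
# and its parameters are assumptions; any dense solver of equation (2)
# could be substituted.
import cv2
import numpy as np

def flow_features(neutral, apex):
    """neutral, apex: 48x48 uint8 grayscale images; returns the
    per-pixel (Vx, Vy) flow flattened into a 4608-dim vector."""
    flow = cv2.calcOpticalFlowFarneback(
        neutral, apex, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    vx, vy = flow[..., 0], flow[..., 1]  # x and y velocity components
    return np.concatenate([vx.ravel(), vy.ravel()])  # 2 x 48 x 48 = 4608
```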
4 SUPPORT VECTOR MACHINE
FOR FACIAL ACTION UNIT
RECOGNITION
We choose the support vector machine (SVM) as the classifier for facial action unit recognition. Given $l$ observations, each consisting of a vector $x_i \in \mathbb{R}^n$, $i = 1, \dots, l$, and an associated label $y_i$, the task of the SVM is to learn the mapping $x_i \mapsto y_i$. The machine is defined by a set of possible mappings $x \mapsto f(x, \phi)$, where the function $f(x, \phi)$ is determined by the parameter $\phi$. Given $\phi$, we obtain a trained machine whose expected test error, or expected risk, is $R(\phi) = \int \frac{1}{2} |y - f(x, \phi)| \, dP(x, y)$, where $P(x, y)$ is the unknown probability distribution. The empirical risk is the average error rate on the training set, $R_{emp}(\phi) = \frac{1}{2l} \sum_{i=1}^{l} |y_i - f(x_i, \phi)|$. The expected risk then satisfies equation 3:

$R(\phi) \leq R_{emp}(\phi) + \sqrt{\dfrac{h(\log(2l/h) + 1) - \log(\eta/4)}{l}}$   (3)

where $h$ is a non-negative integer called the VC (Vapnik-Chervonenkis) dimension, which quantifies the capacity of the learning machine, and $0 \leq \eta \leq 1$ is the confidence parameter: the bound holds with probability $1 - \eta$. The right side of equation 3 is the risk bound, and the lowest upper bound can be obtained by choosing the learning machine $f(x, \phi)$ appropriately.
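As a concrete illustration, the sketch below trains such a classifier on the extracted feature vectors with scikit-learn. The RBF kernel and its parameters are our assumptions, since the paper does not state the kernel used; the 5-fold cross validation matches the experimental setup of section 6.1.

```python
# Minimal sketch of the SVM stage using scikit-learn. The RBF kernel
# and C/gamma values are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def train_au_classifier(X, y):
    """X: (n_samples, 4608) Gabor or optical flow feature vectors;
    y: AU-combination labels such as '1+2+5' or '4+7'."""
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    # 5-fold cross validation, as in the experiments of section 6.1.
    scores = cross_val_score(clf, X, y, cv=5)
    print("mean recognition rate: %.2f%%" % (100 * scores.mean()))
    return clf.fit(X, y)  # refit on all data for later use
```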
5 DYNAMIC BAYESIAN
NETWORKS FOR FACIAL
ACTION UNIT INFERENCE
Bayesian networks (BNs) are graphical models for reasoning under uncertainty, in which nodes represent discrete or continuous variables and arcs represent direct dependencies between them. The dependency is characterized by a conditional probability table (CPT) for each node. Dynamic Bayesian networks (DBNs) are an extension of BNs for handling temporal models (Korb and Nicholson, 2004).
First, we use a BN to model and learn the relationships among AUs. Then, a DBN is built from interconnected time slices of static BNs to model the dynamics of AU development and to represent the probabilistic relationships among AUs. Let $\theta_{ijk}$ denote a probability parameter of a DBN with structure $B_S$, as defined in equation 4:

$\theta_{ijk} = p\big(x_i^k \mid pa^j(X_i), B_S\big)$   (4)

where $i$ ranges over all the variables (nodes) in the DBN, $j$ ranges over all possible parent instantiations of variable $X_i$, $k$ ranges over all instantiations of $X_i$, $x_i^k$ denotes the $k$th state of $X_i$, and $pa^j(X_i)$ is the $j$th configuration of the parent nodes of $X_i$.
We learn the parameters of the DBN in order to infer each AU. The learning process maximizes the posterior distribution $p(\theta \mid D, B_S)$ given a database $D$ and the structure $B_S$. A detailed description of DBN modelling for facial action unit inference can be found in (Tong et al., 2007).
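As an illustration, the sketch below learns CPT parameters and runs evidence-based inference with the pgmpy library on a toy three-node structure. The edges, the data, and the library choice are all our assumptions (the paper learns a 22-node structure from 452 coded samples), and a single static BN slice stands in for the full DBN.

```python
# Toy sketch of BN parameter learning and AU inference with pgmpy.
# Structure and data are illustrative, not the learned 22-AU network
# of figure 2; a static slice stands in for the DBN.
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import BayesianEstimator
from pgmpy.inference import VariableElimination

# Assumed toy structure: AU4 (brow lowerer) influences AU7 and AU9.
model = BayesianNetwork([("AU4", "AU7"), ("AU4", "AU9")])

# One 0/1 column per AU, one row per coded expression sample.
data = pd.DataFrame({"AU4": [1, 1, 0, 0],
                     "AU7": [1, 1, 0, 1],
                     "AU9": [1, 0, 0, 0]})
model.fit(data, estimator=BayesianEstimator)  # learns the theta_ijk CPTs

# Use recognized AUs as evidence to infer an unobserved AU.
infer = VariableElimination(model)
print(infer.query(["AU9"], evidence={"AU4": 1, "AU7": 1}))
```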
6 EXPERIMENTAL RESULTS
Experiments were conducted on the Cohn-Kanade AU-Coded facial expression database (Kanade et al., 2000), which provides image sequences of the facial expressions of 97 subjects. The facial expression images were coded into upper and lower AUs separately.
6.1 Facial Action Unit Recognition
The faces in the neutral and apex facial expression images were detected and resized to 48x48. For the Gabor feature extraction, we obtained 8 different Gabor response images and downsampled them into a vector of 4608 dimensions. For the optical flow feature extraction, we obtained the optical flow in the x and y directions of each pixel and combined them into a vector of 4608 dimensions. Support vector machine classification was conducted using 5-fold cross validation.
The upper and lower action unit recognition results using Gabor feature extraction are shown in table 1 and table 2 respectively. We can see that the classifiers reached high recognition rates, with only a slight decrease as the number of classes increases. Note that although AU9 (Nose Wrinkler) is listed among the lower face action units in FACS, we code it with the upper face action units because its main feature, around the root of the nose, is located in the upper face. Furthermore, we recognized some AUs not recognized in earlier research, such as AU43.
Table 1: Upper action unit recognition results using Gabor wavelet feature extraction.

N.  Action unit categories                          RR
2   1+2+5; 6                                        94.41%
3   1+2+5; 6; 4+7                                   92.76%
5   1+2+5; 6; 4+7; 4+6+7+9; 4                       90.56%
7   1+2+5; 6; 4+7; 4+6+7+9; 4; 4+6+7+9+43; 4+7+9    89.79%

Note: N. = number of classes; RR = recognition rate.
Table 2: Lower action unit recognition results using Gabor wavelet feature extraction.

N.   Action unit categories                                                          RR
2    11+12+25; 25+27                                                                 96.62%
4    11+12+25; 15; 25; 25+27                                                         94.49%
6    11+12+25; 15; 25; 25+27; 23; 20+25                                              92.23%
8    11+12+25; 15; 25; 25+27; 23; 20+25; 17; 11+20+25                                91.38%
10   11+12+25; 15; 25; 25+27; 23; 20+25; 17; 11+20+25; 11+12; 15+17                  89.94%
12   11+12+25; 15; 25; 25+27; 23; 20+25; 17; 11+20+25; 11+12; 15+17; 10+11; 11+15    89.13%

Note: N. = number of classes; RR = recognition rate.
We also used local features for action unit recognition, and compared global and local feature extraction using both Gabor wavelet and optical flow features. The above-eyes upper action unit recognition results using Gabor wavelet and optical flow feature extraction are shown in table 3 and table 4 respectively. We can see that for AU1+2+5 and AU4+7 classification, local optical flow feature extraction in the above-eyes region achieves the best result; for AU1+2+5, AU4+7 and AU4 classification, global Gabor wavelet feature extraction achieves the best result.
Table 3: Global and local upper action unit recognition results using Gabor wavelet feature extraction.

N.  Action unit categories   RR (global)   RR (local)
2   1+2+5; 4+7               91.18%        86.76%
3   1+2+5; 4+7; 4            91.95%        80.54%

Note: N. = number of classes; RR = recognition rate.
Table 4: Global and local upper action unit recognition results using optical flow feature extraction.

N.  Action unit categories   RR (global)   RR (local)
2   1+2+5; 4+7               88.24%        97.79%
3   1+2+5; 4+7; 4            81.88%        89.26%

Note: N. = number of classes; RR = recognition rate.
However, the lower action unit recognition results using optical flow are not better than those using Gabor wavelets in our experiments.

We also trained and recognized the upper and lower action units on hemifaces using optical flow. The results are shown in table 5 and table 6 respectively. We can see that the recognition rate of upper action units on hemifaces is higher than on the full face, so we can recognize upper action units on hemifaces to improve the recognition rate.

For the lower action unit recognition, the recognition rate on hemifaces is higher than on the full face when the number of classes is small; when the number of classes is large, the recognition rate on hemifaces is lower than on the full face.
Table 5: Upper action unit recognition results on hemifaces using optical flow.

N.  RR       RR (LH)   RR (RH)
2   88.24%   94.12%    94.85%
3   85.03%   90.42%    91.62%
4   77.78%   83.33%    85.56%
5   74.21%   79.47%    80.53%
6   70.85%   74.37%    77.39%

Note: N. = number of classes; RR = recognition rate; LH = left hemiface; RH = right hemiface.
Table 6: Lower action unit recognition results on hemifaces using optical flow.

N.   RR       RR (LH)   RR (RH)
4    86.81%   89.83%    88.14%
6    78.72%   82.33%    79.51%
8    70.06%   72.0%     68.62%
10   67.44%   67.24%    64.66%
12   66.21%   65.76%    63.32%

Note: N. = number of classes; RR = recognition rate; LH = left hemiface; RH = right hemiface.
Consequently, for global feature extraction, Gabor wavelet features reach a better recognition rate than optical flow features; for local feature extraction, optical flow reaches a better recognition rate in the above-eyes region. We combined both feature extraction methods to achieve the best recognition result.

To obtain an accurate recognition result, we first recognize the action units using the classifier with the most classes; if the recognized action unit combination is also covered by other classifiers, those classifiers are used to verify the recognition result, as sketched below.
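The sketch below implements one plausible reading of this verification scheme; the all-classifiers-agree acceptance rule is our assumption, since the paper does not specify how disagreements are resolved.

```python
# Sketch of the coarse-to-fine verification scheme. `classifiers` is a
# list of (label_set, fitted_svm) pairs sorted by number of classes,
# largest first; the agreement rule is an assumption.
def recognize_with_verification(x, classifiers):
    labels_all, clf_all = classifiers[0]
    pred = clf_all.predict([x])[0]  # classifier with the most classes
    for labels, clf in classifiers[1:]:
        if pred in labels:  # combination covered by a narrower classifier
            if clf.predict([x])[0] != pred:
                return None  # verification failed; reject the result
    return pred
```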
6.2 Facial Action Unit Inference
Using the AU codes of 452 samples of facial expressions as training data, we learned the BN of 22 action units, shown in figure 2. Compared to the BN of 14 action units learned by (Tong et al., 2007), more nodes and links are learned in our research, indicating more complex relationships among the AUs. Using the recognized AUs as evidence, the DBN can infer related AUs according to the corresponding predicted probabilities.
Figure 2: The learned BN of 22 action units.
In particular, AUs that we do not recognize directly for lack of samples, such as AU13 (sharp lip puller), AU16 (lower lip depressor), AU24 (lip presser), AU26 (jaw drop) and AU31 (jaw clencher), can be inferred together with their probabilities. For example, according to the CPTs of the learned DBN, P(AU31 = 1 | AU26 = 1) = 0.7308, meaning that when jaw drop occurs, jaw clencher also occurs with probability 0.7308.
7 CONCLUSIONS
Aiming to recognize facial action units efficiently, we analyzed Gabor wavelet and optical flow feature extraction in global and local facial regions, and used support vector machines and dynamic Bayesian networks for classification and inference respectively. The proposed method is capable of recognizing and inferring most action units in FACS and achieves good performance.
ACKNOWLEDGEMENTS
This work is supported by the National Natural Science Foundation of China (No. 60873269, No. 61103097), the International Cooperation Program between China and Japan (No. 2010DFA11990) and the Fundamental Research Funds for the Central Universities.
REFERENCES
Kanade, T., Cohn, J., and Tian, Y. L. (2000). Comprehensive database for facial expression analysis. In Proc. of 4th IEEE Int. Conf. on Automatic Face and Gesture Recognition, pages 46-53, Washington, DC, USA. IEEE Computer Society.

Kapoor, A., Qi, Y., and Picard, R. W. (2003). Fully automatic upper facial action recognition. In Proc. of IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures, pages 195-202.

Korb, K. B. and Nicholson, A. E. (2004). Bayesian Artificial Intelligence. Chapman & Hall/CRC.

Pantic, M. and Rothkrantz, L. J. M. (2000). Expert system for automatic analysis of facial expressions. Image and Vision Computing, 18(11):881-905.

Tian, Y. L., Kanade, T., and Cohn, J. F. (2001). Recognizing action units for facial expression analysis. IEEE Trans. on PAMI, 23(2):97-115.

Tong, Y., Chen, J. X., and Ji, Q. (2010). A unified probabilistic framework for spontaneous facial action modeling and understanding. IEEE Trans. on PAMI, 32(2):258-273.

Tong, Y., Liao, W. H., and Ji, Q. (2007). Facial action unit recognition by exploiting their dynamic and semantic relationships. IEEE Trans. on PAMI, 29(10):1-17.

Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proc. of IEEE Conf. on CVPR, pages 511-518, Washington, DC, USA. IEEE Computer Society.