FACIAL ACTION UNIT RECOGNITION AND INFERENCE
FOR FACIAL EXPRESSION ANALYSIS
Yu Li Xue, Xia Mao and Qing Chang
School of Electronic and Information Engineering, Beihang University, Xueyuan Road 37, Haidian District, Beijing, China
Keywords:
Facial Action Unit Recognition, Gabor Wavelet, Optical Flow, Support Vector Machine, Dynamic Bayesian
Network.
Abstract:
Human facial expression is extremely rich and can be described by numerous facial action units. Recognizing facial action units helps capture the inner emotions or intentions of humans. In this paper, we propose a novel method for facial action unit recognition and inference. We use Gabor wavelets and optical flow for feature extraction, and support vector machines and dynamic Bayesian networks for classification and inference respectively. We combine the advantages of both global and local feature extraction, recognize the most discriminant AUs with multiple classifiers to achieve a high recognition rate, and then infer the related AUs. Experiments were conducted on the Cohn-Kanade AU-Coded database. The results demonstrate that, compared to earlier research on facial action unit recognition, our method is capable of recognizing more action units and achieves good performance.
1 INTRODUCTION
Human facial expression plays an important role in daily human communication. Analyzing facial expression is therefore of great value in psychology and affective computing.
Many researchers have proposed methods for facial action unit recognition. Pantic et al. built an expert system for facial expression recognition that uses various methods to recognize 16 action units and 6 basic facial expressions in both frontal and profile views (Pantic and Rothkrantz, 2000). Tian et al. extracted both permanent and transient facial features and recognized the neutral expression, 6 upper facial action units and 10 lower facial action units (Tian et al., 2001). Kapoor et al. used an infrared camera to detect the pupils, extracted parameters through principal component analysis, and finally used a support vector machine to recognize upper facial action units and combined action units (Kapoor et al., 2003). Tong et al. used dynamic Bayesian networks to represent rigid and non-rigid facial movements and their temporal-spatial relationships (Tong et al., 2007; Tong et al., 2010). They obtained facial action recognition results through facial movement measurement and probabilistic inference, and achieved good recognition results on spontaneous facial expressions.
However, most of the above methods were implemented under controlled conditions, and only a limited set of AUs was recognized. Recognizing subtle facial action units in real life is still a challenge. This paper proposes a method to recognize and infer more facial expression action units.
2 FACIAL REGION LOCATION
We use Haar-like wavelet features and AdaBoost classification (Viola and Jones, 2001) to detect the face in image sequences. The eyes are then detected within the face using the same method. If the two eyes are not horizontally level, the face is aligned using an affine transformation.

Based on the eye locations, we obtain local facial regions such as the nose region, the above-eyes region, the below-eyes region, and the below-nose region. These facial regions are illustrated in figure 1.
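A minimal sketch of this localization step is given below, using OpenCV's Haar-cascade implementation of the Viola-Jones detector. The cascade files and the eye-selection heuristic are our assumptions, not the paper's exact configuration.

```python
# Sketch of face/eye detection and affine alignment. Cascade files and
# the eye-selection heuristic are assumed; the paper's setup may differ.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def locate_and_align(gray):
    """Detect the face and eyes in a grayscale frame, then rotate the
    face so that the two eyes lie on a horizontal line."""
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    face = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(face)
    if len(eyes) < 2:
        return face
    # Keep the two largest detections, ordered left to right.
    eyes = sorted(sorted(eyes, key=lambda e: -e[2] * e[3])[:2],
                  key=lambda e: e[0])
    (x1, y1, w1, h1), (x2, y2, w2, h2) = eyes
    c1 = (x1 + w1 / 2.0, y1 + h1 / 2.0)
    c2 = (x2 + w2 / 2.0, y2 + h2 / 2.0)
    # Rotate about the eye midpoint so the inter-ocular line is level.
    angle = np.degrees(np.arctan2(c2[1] - c1[1], c2[0] - c1[0]))
    mid = ((c1[0] + c2[0]) / 2.0, (c1[1] + c2[1]) / 2.0)
    M = cv2.getRotationMatrix2D(mid, angle, 1.0)
    return cv2.warpAffine(face, M, (int(w), int(h)))
```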
3 GABOR WAVELET AND
OPTICAL FLOW FEATURE
EXTRACTION
To extract Gabor features from image sequences for action unit recognition, the difference image is first obtained by subtracting the neutral image from the apex image. Then, Gabor wavelet features are obtained by convolving the difference image with a set of multiscale, multiorientation Gabor filters.

Figure 1: Illustration of the facial regions: (a) eyes detected; (b) nose inferred; (c) above-eyes region; (d) below-eyes region; (e) below-nose region (copyright @Jeffrey Cohn).
The whole normalized face region is convolved with a set of Gabor filters at two spatial frequencies and four orientations. The Gabor wavelet coefficient is given in equation 1:

$J = \|J\| e^{j\phi}$   (1)

where $\|J\|$ is the magnitude and $\phi$ is the phase; the four orientations used are $\pi/4$, $\pi/2$, $3\pi/4$ and $\pi$.
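A minimal sketch of this feature extraction step is given below, assuming OpenCV. The kernel size, sigma, and the two wavelengths standing in for the two spatial frequencies are illustrative choices, not values from the paper; the downsampling factor of 2 per axis reproduces the 4608-dimensional vector reported in section 6.1.

```python
# Sketch of Gabor feature extraction on the neutral-apex difference
# image (2 frequencies x 4 orientations = 8 filters). Kernel size,
# sigma and the wavelengths are assumed values.
import cv2
import numpy as np

def gabor_features(neutral, apex, wavelengths=(4.0, 8.0),
                   thetas=(np.pi / 4, np.pi / 2, 3 * np.pi / 4, np.pi)):
    """neutral, apex: 48x48 grayscale images; returns a 4608-dim vector."""
    diff = apex.astype(np.float32) - neutral.astype(np.float32)
    feats = []
    for lam in wavelengths:
        for theta in thetas:
            # Real and imaginary parts of the complex Gabor response.
            k_re = cv2.getGaborKernel((21, 21), 4.0, theta, lam, 0.5, psi=0)
            k_im = cv2.getGaborKernel((21, 21), 4.0, theta, lam, 0.5,
                                      psi=np.pi / 2)
            re = cv2.filter2D(diff, cv2.CV_32F, k_re)
            im = cv2.filter2D(diff, cv2.CV_32F, k_im)
            mag = np.sqrt(re ** 2 + im ** 2)     # ||J|| in equation (1)
            feats.append(mag[::2, ::2].ravel())  # downsample to 24x24
    return np.concatenate(feats)                 # 8 x 576 = 4608 dims
```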
Optical flow is also used to track the motion of facial features in image sequences. The optical flow method assumes that the gray values in any image feature region do not change between two consecutive frames, but only shift from one position to another. The optical flow constraint is given in equation 2:

$I_x V_x + I_y V_y = -I_t$   (2)

where $V_x$ and $V_y$ are the x and y components of the velocity (optical flow) of $I(x, y, t)$, and $I_x$, $I_y$ and $I_t$ are the derivatives of the image at $(x, y, t)$ in the corresponding directions.
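The sketch below computes dense optical flow between the neutral and apex frames; Farneback's method and its parameters are our assumptions, as the paper does not name a particular solver for equation 2.

```python
# Sketch of the optical flow feature extraction. The Farneback solver
# and its parameters are assumptions; any dense solver of equation (2)
# could be substituted.
import cv2
import numpy as np

def flow_features(neutral, apex):
    """neutral, apex: 48x48 uint8 grayscale images; returns the
    per-pixel (Vx, Vy) flow flattened into a 4608-dim vector."""
    flow = cv2.calcOpticalFlowFarneback(
        neutral, apex, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    vx, vy = flow[..., 0], flow[..., 1]  # x and y velocity components
    return np.concatenate([vx.ravel(), vy.ravel()])  # 2 x 48 x 48 = 4608
```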
4 SUPPORT VECTOR MACHINE
FOR FACIAL ACTION UNIT
RECOGNITION
We choose the support vector machine (SVM) as the classifier for facial action unit recognition. Given $l$ observations, each consisting of a vector $x_i \in \mathbb{R}^n$, $i = 1, \dots, l$, and an associated label $y_i$, the task of the SVM is to learn the mapping $x_i \mapsto y_i$. The machine is defined by a set of possible mappings $x \mapsto f(x, \phi)$, where the function $f(x, \phi)$ is determined by the parameter $\phi$. Given $\phi$, we obtain a trained machine whose expected test error, or expected risk, is $R(\phi) = \int \frac{1}{2} |y - f(x, \phi)| \, dP(x, y)$, where $P(x, y)$ is the unknown probability distribution. The empirical risk is the average error rate on the training set, $R_{emp}(\phi) = \frac{1}{2l} \sum_{i=1}^{l} |y_i - f(x_i, \phi)|$. The expected risk then satisfies equation 3:

$R(\phi) \leq R_{emp}(\phi) + \sqrt{\dfrac{h(\log(2l/h) + 1) - \log(\eta/4)}{l}}$   (3)

where $h$ is a non-negative integer called the VC (Vapnik-Chervonenkis) dimension, which quantifies the capacity of the learning machine, and $0 \leq \eta \leq 1$ is the confidence parameter: the bound holds with probability $1 - \eta$. The right side of equation 3 is the risk bound, and the lowest upper bound can be obtained by choosing the learning machine $f(x, \phi)$ appropriately.
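As a concrete illustration, the sketch below trains such a classifier on the extracted feature vectors with scikit-learn. The RBF kernel and its parameters are our assumptions, since the paper does not state the kernel used; the 5-fold cross validation matches the experimental setup of section 6.1.

```python
# Minimal sketch of the SVM stage using scikit-learn. The RBF kernel
# and C/gamma values are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def train_au_classifier(X, y):
    """X: (n_samples, 4608) Gabor or optical flow feature vectors;
    y: AU-combination labels such as '1+2+5' or '4+7'."""
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    # 5-fold cross validation, as in the experiments of section 6.1.
    scores = cross_val_score(clf, X, y, cv=5)
    print("mean recognition rate: %.2f%%" % (100 * scores.mean()))
    return clf.fit(X, y)  # refit on all data for later use
```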
5 DYNAMIC BAYESIAN
NETWORKS FOR FACIAL
ACTION UNIT INFERENCE
Bayesian networks (BNs) are graphical models for reasoning under uncertainty, in which nodes represent discrete or continuous variables and arcs represent direct dependencies between them. The dependency is characterized by a conditional probability table (CPT) for each node. Dynamic Bayesian networks (DBNs) are an extension of BNs for handling temporal models (Korb and Nicholson, 2004).
First, we use a BN to model and learn the relationships among AUs. Then, a DBN is built from interconnected time slices of static BNs to model the dynamics of AU development and to represent the probabilistic relationships among AUs. Let $\theta_{ijk}$ denote a probability parameter of a DBN with structure $B_S$, as defined in equation 4:

$\theta_{ijk} = p\big(x_i^k \mid pa^j(X_i), B_S\big)$   (4)

where $i$ ranges over all the variables (nodes) in the DBN, $j$ ranges over all possible parent instantiations of variable $X_i$, $k$ ranges over all instantiations of $X_i$, $x_i^k$ denotes the $k$th state of $X_i$, and $pa^j(X_i)$ is the $j$th configuration of the parent nodes of $X_i$.
We learn the parameters of the DBN in order to infer each AU. The learning process maximizes the posterior distribution $p(\theta \mid D, B_S)$ given a database $D$ and the structure $B_S$. A detailed description of DBN modelling for facial action unit inference can be found in (Tong et al., 2007).
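As an illustration, the sketch below learns CPT parameters and runs evidence-based inference with the pgmpy library on a toy three-node structure. The edges, the data, and the library choice are all our assumptions (the paper learns a 22-node structure from 452 coded samples), and a single static BN slice stands in for the full DBN.

```python
# Toy sketch of BN parameter learning and AU inference with pgmpy.
# Structure and data are illustrative, not the learned 22-AU network
# of figure 2; a static slice stands in for the DBN.
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import BayesianEstimator
from pgmpy.inference import VariableElimination

# Assumed toy structure: AU4 (brow lowerer) influences AU7 and AU9.
model = BayesianNetwork([("AU4", "AU7"), ("AU4", "AU9")])

# One 0/1 column per AU, one row per coded expression sample.
data = pd.DataFrame({"AU4": [1, 1, 0, 0],
                     "AU7": [1, 1, 0, 1],
                     "AU9": [1, 0, 0, 0]})
model.fit(data, estimator=BayesianEstimator)  # learns the theta_ijk CPTs

# Use recognized AUs as evidence to infer an unobserved AU.
infer = VariableElimination(model)
print(infer.query(["AU9"], evidence={"AU4": 1, "AU7": 1}))
```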
6 EXPERIMENTAL RESULTS
Experiments were conducted on the Cohn-Kanade AU-Coded facial expression database (Kanade et al., 2000), which provides image sequences of the facial expressions of 97 subjects. The facial expression images were coded into upper and lower AUs separately.
6.1 Facial Action Unit Recognition
The faces in the neutral and apex facial expression images were detected and resized to 48x48. For the Gabor feature extraction, we obtained 8 different Gabor response images and downsampled them into a vector of 4608 dimensions. For the optical flow feature extraction, we obtained the optical flow in the x and y directions of each pixel and combined them into a vector of 4608 dimensions. Support vector machine classification was conducted using 5-fold cross validation.
The upper and lower action unit recognition results using Gabor feature extraction are shown in table 1 and table 2 respectively. We can see that the classifiers reached high recognition rates, with only a slight decrease as the number of classes increases. Note that although AU9 (Nose Wrinkler) is listed among the lower face action units in FACS, we code it with the upper face action units because its main feature, around the root of the nose, is located in the upper face. Furthermore, we recognized some AUs not recognized in earlier research, such as AU43.
Table 1: Upper action unit recognition results using Gabor wavelet feature extraction.

N.  Action unit categories                          RR
2   1+2+5; 6                                        94.41%
3   1+2+5; 6; 4+7                                   92.76%
5   1+2+5; 6; 4+7; 4+6+7+9; 4                       90.56%
7   1+2+5; 6; 4+7; 4+6+7+9; 4; 4+6+7+9+43; 4+7+9    89.79%

Note: N. = number of classes; RR = recognition rate.
Table 2: Lower action unit recognition results using Gabor wavelet feature extraction.

N.   Action unit categories                                                          RR
2    11+12+25; 25+27                                                                 96.62%
4    11+12+25; 15; 25; 25+27                                                         94.49%
6    11+12+25; 15; 25; 25+27; 23; 20+25                                              92.23%
8    11+12+25; 15; 25; 25+27; 23; 20+25; 17; 11+20+25                                91.38%
10   11+12+25; 15; 25; 25+27; 23; 20+25; 17; 11+20+25; 11+12; 15+17                  89.94%
12   11+12+25; 15; 25; 25+27; 23; 20+25; 17; 11+20+25; 11+12; 15+17; 10+11; 11+15    89.13%

Note: N. = number of classes; RR = recognition rate.
We also used local features for action unit recognition, and compared global and local feature extraction using both Gabor wavelet and optical flow features. The above-eyes upper action unit recognition results using Gabor wavelet and optical flow feature extraction are shown in table 3 and table 4 respectively. We can see that for AU1+2+5 and AU4+7 classification, local optical flow feature extraction in the above-eyes region achieves the best result; for AU1+2+5, AU4+7 and AU4 classification, global Gabor wavelet feature extraction achieves the best result.
Table 3: Global and local upper action unit recognition results using Gabor wavelet feature extraction.

N.  Action unit categories   RR (global)   RR (local)
2   1+2+5; 4+7               91.18%        86.76%
3   1+2+5; 4+7; 4            91.95%        80.54%

Note: N. = number of classes; RR = recognition rate.
Table 4: Global and local upper action unit recognition results using optical flow feature extraction.

N.  Action unit categories   RR (global)   RR (local)
2   1+2+5; 4+7               88.24%        97.79%
3   1+2+5; 4+7; 4            81.88%        89.26%

Note: N. = number of classes; RR = recognition rate.
However, the lower action unit recognition results using optical flow are not better than those using Gabor wavelets in our experiments.

We also trained and recognized the upper and lower action units on hemifaces using optical flow. The results are shown in table 5 and table 6 respectively. We can see that the recognition rate of upper action units on hemifaces is higher than on the full face, so we can recognize upper action units on hemifaces to improve the recognition rate.

For the lower action unit recognition, the recognition rate on hemifaces is higher than on the full face when the number of classes is small; when the number of classes is large, the recognition rate on hemifaces is lower than on the full face.
Table 5: Upper action unit recognition results on hemifaces using optical flow.

N.  RR       RR (LH)   RR (RH)
2   88.24%   94.12%    94.85%
3   85.03%   90.42%    91.62%
4   77.78%   83.33%    85.56%
5   74.21%   79.47%    80.53%
6   70.85%   74.37%    77.39%

Note: N. = number of classes; RR = recognition rate; LH = left hemiface; RH = right hemiface.
Table 6: Lower action unit recognition results on hemifaces using optical flow.

N.   RR       RR (LH)   RR (RH)
4    86.81%   89.83%    88.14%
6    78.72%   82.33%    79.51%
8    70.06%   72.0%     68.62%
10   67.44%   67.24%    64.66%
12   66.21%   65.76%    63.32%

Note: N. = number of classes; RR = recognition rate; LH = left hemiface; RH = right hemiface.
Consequently, for global feature extraction, Gabor wavelet features reach a better recognition rate than optical flow features; for local feature extraction, optical flow reaches a better recognition rate in the above-eyes region. We combined both feature extraction methods to achieve the best recognition result.

To obtain an accurate recognition result, we first recognize the action units using the classifier with the most classes; if the recognized action unit combination is also covered by other classifiers, those classifiers are used to verify the recognition result, as sketched below.
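The sketch below implements one plausible reading of this verification scheme; the all-classifiers-agree acceptance rule is our assumption, since the paper does not specify how disagreements are resolved.

```python
# Sketch of the coarse-to-fine verification scheme. `classifiers` is a
# list of (label_set, fitted_svm) pairs sorted by number of classes,
# largest first; the agreement rule is an assumption.
def recognize_with_verification(x, classifiers):
    labels_all, clf_all = classifiers[0]
    pred = clf_all.predict([x])[0]  # classifier with the most classes
    for labels, clf in classifiers[1:]:
        if pred in labels:  # combination covered by a narrower classifier
            if clf.predict([x])[0] != pred:
                return None  # verification failed; reject the result
    return pred
```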
6.2 Facial Action Unit Inference
Using the AU codes of 452 samples of facial expressions as training data, we learned the BN of 22 action units, shown in figure 2. Compared to the BN of 14 action units learned by (Tong et al., 2007), more nodes and links are learned in our research, indicating more complex relationships among the AUs. Using the recognized AUs as evidence, the DBN can infer related AUs according to the corresponding predicted probabilities.
Figure 2: The learned BN of 22 action units.
In particular, AUs that we do not recognize directly for lack of samples, such as AU13 (sharp lip puller), AU16 (lower lip depressor), AU24 (lip presser), AU26 (jaw drop) and AU31 (jaw clencher), can be inferred together with their probabilities. For example, according to the CPTs of the learned DBN, P(AU31 = 1 | AU26 = 1) = 0.7308, meaning that when jaw drop occurs, jaw clencher also occurs with probability 0.7308.
7 CONCLUSIONS
Aiming to recognize facial action units efficiently, we analyzed Gabor wavelet and optical flow feature extraction in global and local facial regions, and used support vector machines and dynamic Bayesian networks for classification and inference respectively. The proposed method is capable of recognizing and inferring most action units in FACS and achieves good performance.
ACKNOWLEDGEMENTS
This work is supported by the National Natural Science Foundation of China (No. 60873269, No. 61103097), the International Cooperation Program between China and Japan (No. 2010DFA11990) and the Fundamental Research Funds for the Central Universities.
REFERENCES
Kanade, T., Cohn, J., and Tian, Y. L. (2000). Comprehensive database for facial expression analysis. In Proc. of 4th IEEE Int. Conf. on Automatic Face and Gesture Recognition, pages 46-53, Washington, DC, USA. IEEE Computer Society.

Kapoor, A., Qi, Y., and Picard, R. W. (2003). Fully automatic upper facial action recognition. In Proc. of IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures, pages 195-202.

Korb, K. B. and Nicholson, A. E. (2004). Bayesian Artificial Intelligence. Chapman & Hall/CRC.

Pantic, M. and Rothkrantz, L. J. M. (2000). Expert system for automatic analysis of facial expressions. Image and Vision Computing, 18(11):881-905.

Tian, Y. L., Kanade, T., and Cohn, J. F. (2001). Recognizing action units for facial expression analysis. IEEE Trans. on PAMI, 23(2):97-115.

Tong, Y., Chen, J. X., and Ji, Q. (2010). A unified probabilistic framework for spontaneous facial action modeling and understanding. IEEE Trans. on PAMI, 32(2):258-273.

Tong, Y., Liao, W. H., and Ji, Q. (2007). Facial action unit recognition by exploiting their dynamic and semantic relationships. IEEE Trans. on PAMI, 29(10):1-17.

Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proc. of IEEE Conf. on CVPR, pages 511-518, Washington, DC, USA. IEEE Computer Society.