approach. The experimental setup and results are
discussed in Section 4. Finally, in Section 5 we
summarize this work highlighting its main
contributions.
2 RELATED WORK
Because of the diversity of spoofing attacks, existing
traditional face anti-spoofing approaches fall
mainly into four categories: (i) motion
based methods, (ii) texture based methods, (iii)
methods based on image quality analysis, and (iv)
methods based on other cues. The motion based
methods were designed primarily to counter printed
photo attacks, using cues such as eye blinking (Pan
et al., 2007) or lip movement (Kollreider et al.,
2007). Given that motion is a relative feature
across video frames, these methods are expected to
have better generalization ability than the texture
based methods. However, motion based methods
need a relatively long time to accumulate stable
liveness features for face spoof detection. The texture
based methods include LBP (Maatta et al., 2011),
HOG (Komulainen et al., 2013), etc. Pereira et al.
(2012) used LBP-TOP to extract spatial and temporal
features from three orthogonal planes.
Unlike motion based methods, texture based
methods need only a single image to detect a spoof.
However, the generalization ability of many texture
based methods has been found to be poor. A recent
work (Galbally et al., 2014) proposed a biometric
liveness detection method for iris, fingerprint and
face images using 25 image quality measures,
including 21 full-reference measures and 4
no-reference measures.
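To make the notion of a full-reference measure concrete, the sketch below computes PSNR between a grayscale image and a smoothed copy of itself. Using a low-pass-filtered copy as the reference is an assumption made here for illustration (only the test image is available at detection time); the 3x3 kernel, the helper names `blur3x3` and `psnr`, and the toy images are hypothetical and are not taken from the cited work.

```python
import math

# 3x3 binomial kernel (a small Gaussian approximation); weights sum to 16.
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]

def blur3x3(img):
    """Smooth a grayscale image (list of rows) using edge replication."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to image border
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out

def psnr(img, ref, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    h, w = len(img), len(img[0])
    mse = sum((img[y][x] - ref[y][x]) ** 2
              for y in range(h) for x in range(w)) / (h * w)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# Toy 4x4 images: a flat patch and a checkerboard with sharp detail.
flat = [[128] * 4 for _ in range(4)]
checker = [[255 * ((x + y) % 2) for x in range(4)] for y in range(4)]

psnr_flat = psnr(flat, blur3x3(flat))           # smoothing changes nothing
psnr_checker = psnr(checker, blur3x3(checker))  # smoothing removes detail
```

In this toy setting the flat patch is unaffected by smoothing while the checkerboard loses its high-frequency detail, so the two images receive very different scores. A quality-based detector would feed a vector of such measures to a classifier.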
Different from traditional methods, CNNs can
extract discriminative features end-to-end directly
from raw data, and have proved effective in
many other vision fields. Yang et al. (2014) extract
features with a CNN and then feed them to an SVM
classifier. Xu et al. (2016) proposed an LSTM-CNN
architecture to learn the temporal structure of
videos.
These works treat face anti-spoofing as a
binary classification problem: all real faces form one
class, and all kinds of fake faces form the other. Because
fake faces vary widely, photo attacks and video
attacks differ in texture, reflected
illumination, resolution, etc., and the resulting large
intra-class variance makes classification harder.
If separate models are trained for different attack
types, the models are heterogeneous and each has strong
classification ability. An integrated model
built with the stacked generalization method can
therefore exploit the complementary advantages of
the different models and is expected to achieve better
predictive accuracy.
3 PROPOSED METHOD
To reduce intra-class variation, we train
different models according to the type of fake face.
Each model can learn deep, discriminative
features for classifying a face as fake or real. The stacked
generalization method takes full advantage of each
model's prediction and adjusts the weight of each
prediction, so that a better-informed decision can be
made to maximize generalization accuracy. With the
stacked generalization method, the anti-spoofing
problem is easier to train than with a single
general model. In addition, the model converges
more easily.
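The per-attack training described above can be sketched as a data-partitioning step. This is a minimal illustration under the assumption that each level-0 model sees all real samples plus the samples of one attack type; the function name `split_by_attack` and the labels "real", "photo" and "video" are hypothetical placeholders, not the paper's protocol.

```python
from collections import defaultdict

def split_by_attack(samples):
    """Build one real-vs-fake training set per attack type.

    `samples` is a list of (sample_id, kind) pairs, where kind is "real"
    or an attack name such as "photo" or "video" (hypothetical labels).
    """
    real = [sid for sid, kind in samples if kind == "real"]
    fake = defaultdict(list)
    for sid, kind in samples:
        if kind != "real":
            fake[kind].append(sid)
    # Each subset pairs all real samples (label 1) with one attack type (label 0).
    return {attack: [(sid, 1) for sid in real] + [(sid, 0) for sid in ids]
            for attack, ids in fake.items()}

samples = [("a", "real"), ("b", "photo"), ("c", "video"),
           ("d", "real"), ("e", "photo")]
subsets = split_by_attack(samples)
```

Each entry of `subsets` would then be used to train one model, so every model faces a narrower fake class than a single general model would.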
3.1 Stacked Generalization
Stacked generalization (Wolpert, 1992; Ting and
Witten, 1997) is a general method of using a high-
level model to combine lower-level models to
achieve greater predictive accuracy. It is a scheme for
minimizing the generalization error rate of one or
more classifiers, and works by reducing the biases of
the classifiers with respect to a provided learning set.
When fusing multiple classifiers, stacked
generalization uses a more sophisticated
strategy for combining the individual
classifiers. Stacked generalization tries to learn
which classifiers are reliable, and uses a higher-level
learning algorithm, the so-called "meta-classifier",
to discover the best way to
combine the outputs of the base classifiers. As
shown in Figure 1, there are two kinds of classifiers:
several base classifiers (level-0 classifiers) and one
meta-classifier (level-1 classifier). The output class
probabilities generated by the level-0 models are used to
form the level-1 data. A multivariable linear
regression model (MLR), adapted for classification
tasks, is then used as the level-1 classifier.
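The two-level scheme can be illustrated with a small sketch using only the Python standard library. It assumes that the level-0 class probabilities were produced on held-out data, as stacking requires, and it fits one least-squares linear regression per class on 0/1 targets, predicting the class with the largest output. The helper names (`solve`, `fit_linear`, `to_row`, `fit_meta`, `predict_meta`), the small ridge term and the toy probabilities are all hypothetical; the paper's actual level-0 models are CNNs, which are replaced here by fixed numbers.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w

def fit_linear(X, y, ridge=1e-3):
    """Least-squares weights for y ~ X w; the ridge term keeps X^T X invertible."""
    d = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    return solve(XtX, Xty)

def to_row(p_real_per_model):
    """Level-1 feature row: [P(fake), P(real)] from every level-0 model."""
    row = []
    for p in p_real_per_model:
        row += [1.0 - p, p]
    return row

def fit_meta(level1, labels, n_classes=2):
    """Level-1 classifier: one linear regression per class on 0/1 targets."""
    return [fit_linear(level1, [1.0 if l == c else 0.0 for l in labels])
            for c in range(n_classes)]

def predict_meta(weights, row):
    """Predict the class whose regression output is largest."""
    scores = [sum(w * v for w, v in zip(ws, row)) for ws in weights]
    return scores.index(max(scores))

# Hypothetical held-out P(real) outputs of two level-0 models; label 1 = real.
held_out = [((0.9, 0.6), 1), ((0.8, 0.4), 1), ((0.7, 0.7), 1),
            ((0.2, 0.5), 0), ((0.1, 0.6), 0), ((0.3, 0.3), 0)]
level1 = [to_row(p) for p, _ in held_out]
labels = [label for _, label in held_out]

weights = fit_meta(level1, labels)
train_pred = [predict_meta(weights, row) for row in level1]
```

In this toy data the first model separates the classes well and the second does not, so the regression leans mostly on the first model's probabilities. For a new sample, `predict_meta(weights, to_row((0.85, 0.3)))` combines both level-0 outputs into a single decision.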
3.1.1 Level-0 Generalizers
As shown in previous work (Krizhevsky et al., 2012;
Simonyan and Zisserman, 2014; He et al., 2016), deeper and
better pre-trained networks lead to better
performance. ResNet (He et al., 2016) won first
place in the ILSVRC 2015 classification task. The
depth of representations is of central importance for
many visual recognition tasks, especially in face
ICPRAM 2018 - 7th International Conference on Pattern Recognition Applications and Methods