convolutional neural network that also learns the
features relevant to the classification problem. In
order to incorporate motion information, the input
volume of the network is formed by two image-difference
frames computed between three equally
spaced frames. The first image difference is
computed between the (potential) onset frame and
the (potential) apex frame, while the second image
difference is between the (potential) apex frame and
the (potential) offset frame. Another original
contribution of this paper is the use of a
sliding time window which iterates through the
video sequence and feeds the corresponding frames
to the neural network. The response of the neural
network is further processed in order to eliminate
micro-expression false positives and to merge
together responses that belong to the same micro-
expression, as described in Section 3.3.
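The input construction described above can be sketched as follows (a minimal illustration, assuming grayscale frames stored as NumPy arrays; the window length and all function names are hypothetical, not taken from the paper):

```python
import numpy as np

def difference_volume(onset, apex, offset):
    """Stack two image differences into a 2-channel input volume:
    (apex - onset) and (offset - apex)."""
    d1 = apex.astype(np.float32) - onset.astype(np.float32)
    d2 = offset.astype(np.float32) - apex.astype(np.float32)
    return np.stack([d1, d2], axis=-1)  # shape: (H, W, 2)

def sliding_windows(frames, window_len):
    """Slide a time window over the sequence; each position yields three
    equally spaced frames (potential onset, apex, offset) as a volume."""
    half = window_len // 2
    for t in range(len(frames) - window_len + 1):
        onset = frames[t]
        apex = frames[t + half]
        offset = frames[t + window_len - 1]
        yield t, difference_volume(onset, apex, offset)

# toy usage: 10 random 64x64 grayscale frames, window of 5 frames
frames = np.random.randint(0, 256, size=(10, 64, 64), dtype=np.uint8)
volumes = [v for _, v in sliding_windows(frames, window_len=5)]
print(len(volumes), volumes[0].shape)  # 6 windows, each (64, 64, 2)
```

Each yielded volume would then be fed to the network; responses across overlapping windows are merged in a post-processing step (Section 3.3).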
The proposed method is scalable and modular,
and it has the main advantage that by simply using
more training data it can “learn” to distinguish
between more micro-expression classes, or between
micro-expressions and other facial movements, such
as blinks and macro-expressions.
The remainder of this work is organized as
follows: in Section 2 we present recent advances
in the field of micro-expression detection and
recognition. In Section 3 we discuss the general
outline and we detail the proposed solution. The
experimental results are discussed in Section 4, and
the conclusions and future directions of
improvement are detailed in Section 5.
2 STATE OF THE ART
Automatic micro-expression recognition involves
three main steps: (1) selecting the facial regions of
interest, (2) defining and extracting the classification
features, and (3) the actual recognition of the micro-expression
using the selected features and state-of-the-art
machine learning algorithms.
The first step involves establishing the facial area
which will be analyzed; for this step, the face is
either divided into several rectangular segments
around the most prominent facial features (eyes, lips,
nose) (Polikovsky et al., 2009), (Polikovsky et al.,
2013), (Godavarthy et al., 2011), or a more complex
deformable model is used to split the face into more
accurate regions (Liu et al., 2015), (Pfister et al.,
2011). Other methods (Liong et al., 2016), (Li et al.,
2017) divide the face geometrically into n equal
cells and analyze the movement within these cells.
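The geometric division into equal cells can be illustrated as follows (a sketch, assuming a cropped face image as a NumPy array; the grid size and function name are illustrative, not from the cited works):

```python
import numpy as np

def grid_cells(face, n):
    """Split a cropped face image into an n x n grid of equal cells,
    returned as a list, row by row. Any remainder pixels at the right
    and bottom borders are discarded."""
    h, w = face.shape[:2]
    ch, cw = h // n, w // n
    cells = []
    for i in range(n):
        for j in range(n):
            cells.append(face[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw])
    return cells

face = np.arange(36).reshape(6, 6)   # toy 6x6 "face"
cells = grid_cells(face, n=3)        # 9 cells of 2x2 pixels each
print(len(cells), cells[0].shape)    # 9 (2, 2)
```

Movement descriptors are then computed per cell, so that the spatial location of the motion is preserved in the final feature vector.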
In the second step, several spatiotemporal
descriptors are extracted from the defined regions of
interest in order to describe the facial
transformations that occur over time. Several types
of descriptors can be used: dense optical flow (Liu et
al., 2015), optical strain (Liong et al., 2016), or
texture-based descriptors such as Local Binary
Patterns from Three Orthogonal Planes (LBP-TOP)
(Pfister et al., 2011) and the 3D histogram of oriented
gradients (Polikovsky et al., 2009), (Polikovsky et
al., 2013). Finally, in the last step, a machine
learning algorithm (which can be supervised (Pfister
et al., 2011) or unsupervised (Polikovsky et al.,
2009), (Polikovsky et al., 2013)) is
used for the actual recognition of micro expressions.
Recently, several works have tackled both the
problem of micro-expression detection and the
problem of micro-expression recognition. The method
presented in (Liong et al., 2016) uses optical strain features in
two different ways: first the classification is
performed based solely on optical strain information,
and second, the optical strain information is used for
weighting the LBP-TOP features. The best results
are obtained by the second method. In (Li et al.,
2017), the authors propose a general micro-expression
analysis framework that performs both
micro-expression detection and recognition. The
detection phase does not require any training and
exploits frame difference contrast to determine the
frames where movement occurred. For the
recognition phase, several descriptors are extracted
(LBP-TOP, Histogram of Oriented Gradients (HOG)
and Histogram of Image Gradient Orientation
(HIGO)) and a support vector machine (SVM)
classifier is used to recognize the micro-expression.
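The final recognition stage shared by these pipelines, a per-sample descriptor vector fed to an SVM, can be sketched with scikit-learn. The random features below are synthetic stand-ins for LBP-TOP/HOG/HIGO histograms, which are not reproduced here, and the class count is hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# synthetic stand-ins for per-sample descriptor histograms
# (e.g. concatenated LBP-TOP histograms): 40 samples, 59-dim each
X = rng.random((40, 59))
y = rng.integers(0, 3, size=40)  # 3 hypothetical micro-expression classes

clf = SVC(kernel="linear").fit(X, y)
pred = clf.predict(X[:5])
print(pred.shape)  # (5,)
```

In the cited works the choice of kernel and the descriptor type are tuned per dataset; the linear kernel here is only a placeholder.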
The majority of the works reported in the
literature follow the “classical” stages of machine
learning: region-of-interest selection, extraction
of motion descriptors, and a classifier (usually an
SVM) to recognize the exact class of the micro-expression.
In this paper, we tackle the problem of
micro-expression detection and recognition using a
deep convolutional neural network, which is able to
automatically learn the motion features and classify
micro-expressions from high-speed video sequences.
As opposed to other methods, the proposed solution
does not require complex face alignment or
normalization (Li et al., 2017) or the extraction of
motion features (as they are automatically learned by
the network). The network takes as input two frame
differences in order to capture the motion variation
across video frames.
VISAPP 2018 - International Conference on Computer Vision Theory and Applications