4 METHODS
The computing approaches and algorithms used in the proposed work are discussed in this section, organised according to the modules of the architecture shown in Figure 1.
4.1 Feature Extraction
The features of the regions around the eyes, the mouth, and of the entire face were extracted separately. The term “features” here refers to the Action Units (AUs), derived from FACS and provided alongside the labelled images in the CK+ dataset. The dataset lists more than thirty AUs found in the human face. Each AU produces a muscle movement in a particular region of the face (such as the eyes, mouth, cheeks, nose, or jaw), and some are involved in more than one region. To determine which AUs correspond to which area of the face, and whether they lie within the regions of interest (RoI) of this work, the description of each AU was checked against the taxonomy of the RoIs, and the matching AUs were listed. These listed AUs were then extracted by reading the text files corresponding to each image (using scripts written in Python 3) and stored in separate CSV files for each RoI.
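A minimal sketch of this extraction step is given below, assuming each FACS text file lists one AU number and its intensity per line; the RoI-to-AU mapping ROI_AUS and the directory layout are hypothetical and only illustrate the procedure, not the exact CK+ conventions or the AU lists used in this work.

import csv
from pathlib import Path

# Hypothetical RoI-to-AU mapping obtained by checking the AU descriptions;
# the AU numbers below are illustrative, not the exact lists used here.
ROI_AUS = {
    "eyes":  [1, 2, 4, 5, 6, 7, 43],
    "mouth": [10, 12, 15, 17, 20, 23, 25, 26, 27],
}

def read_facs_file(path):
    """Parse one FACS text file, assuming each line holds an AU number
    and its intensity written as two floats."""
    aus = {}
    for line in Path(path).read_text().splitlines():
        parts = line.split()
        if len(parts) >= 2:
            aus[int(float(parts[0]))] = float(parts[1])
    return aus

def extract_roi_features(facs_dir, out_dir):
    """Write one CSV per RoI, with one row per image and one column per AU."""
    for roi, roi_aus in ROI_AUS.items():
        with open(Path(out_dir) / f"{roi}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["image"] + [f"AU{a}" for a in roi_aus])
            for txt in sorted(Path(facs_dir).glob("**/*.txt")):
                aus = read_facs_file(txt)
                # Absent AUs are left blank here and imputed later (Section 4.1).
                writer.writerow([txt.stem] + [aus.get(a, "") for a in roi_aus])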
The prototype definitions of the emotions, also described in the dataset, are built from combinations of these AUs. One task of this work is therefore to have classifiers learn such combinations within each RoI and apply the learned patterns to test images in order to classify them by the emotion shown.
In some images certain AUs are absent, and their intensity cannot simply be recorded as 0, because an intensity of 0 signifies that the AU is present but its intensity is not coded. The intensities of absent AUs are therefore filled with the mean intensity of that AU over all the images in which it is present. The dataset also stores the emotion label of each image in a separate file.
Each emotion label file contains a single number written in exponential format, corresponding to one of the eight emotions. The numbers 0 to 7 are assigned as follows: Neutral (0), Anger (1), Contempt (2), Disgust (3), Fear (4), Happiness (5), Sadness (6), and Surprise (7). Owing to the high dimensionality, it is necessary to select a subset of these AUs for each region so that the subsequent experiments can be carried out more easily. After feature extraction, serialisation of the dataset is completed by imputing the absent AUs in each image, for each RoI, with the mean value of that AU over the images in which it is present, as described above.
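The label parsing and mean imputation described above can be sketched with pandas as follows; the file paths and the AU-prefixed column names carry over from the illustrative extraction sketch and are assumptions rather than the exact conventions used.

import pandas as pd

def read_emotion_label(path):
    """Each emotion file holds one number in exponential format
    (e.g. '3.0000000e+00' for Disgust); return it as an integer 0-7."""
    with open(path) as f:
        return int(float(f.read().strip()))

def impute_missing_aus(csv_path):
    """Fill absent AU intensities with the mean of that AU over the
    images in which it is present. Blank cells written during
    extraction load as NaN and are replaced column-wise."""
    df = pd.read_csv(csv_path)
    au_cols = [c for c in df.columns if c.startswith("AU")]
    df[au_cols] = df[au_cols].fillna(df[au_cols].mean())
    return df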
4.2 Feature Subset Selection
Feature subset selection serves two purposes. First, it reduces the dimensionality of the dataset, which lowers the risk of overfitting as well as the computational power and time required. Second, it yields more abstract data and brings out the patterns in the relationships between the features and how they affect the class to which a sample belongs. For feature subset selection in the eye region, all features that do not contribute to the decision of the emotion labels were removed first. About five such AUs were removed, as none of the available images displayed them, leaving 7 features. This is a sufficiently small number of features, so it is retained as is and reduced further only if dimensionality-related issues arise.
Feature selection in the mouth region was carried out in the same manner as in the periocular region. Of the 16 AUs in the mouth region, some were present in over a hundred images and most others appeared in well over 40 images. Three features appeared in fewer than 10 images and were discarded, as they contribute negligibly to the decision of the emotion labels. This left 13 features.
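The presence-count filtering applied to the eye and mouth regions can be sketched as below; the threshold of 10 images follows the mouth-region criterion, and the AU column naming is carried over from the earlier sketches.

def drop_rare_aus(df, min_images=10):
    """Discard AU columns observed in fewer than `min_images` images.
    This should run before imputation, while absent AUs are still NaN."""
    au_cols = [c for c in df.columns if c.startswith("AU")]
    rare = [c for c in au_cols if df[c].notna().sum() < min_images]
    return df.drop(columns=rare), rare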
Similarly, of the roughly thirty AUs mentioned earlier, only about 24 were available in the labelled images of the dataset for the whole-face features. Since 24 is high compared with the 7 features chosen for the eye region and the 13 for the mouth region, the data of these 24 AUs were passed through Principal Component Analysis (PCA) (Abdi and Williams, 2010) and reduced to 10 components. Reducing to seven, as was done for the eye region, does not seem ideal, as it could lead to a loss of information; however, it cannot be guaranteed that these 10 components avoid such loss either. Using data of a larger dimension is nevertheless a cause for concern, as it tends to lead to overfitting and gives highly unsatisfactory results when tested; moreover, given the small size of the dataset (327 samples), it is important to maintain an optimal dimensionality.
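A sketch of the PCA reduction for the whole-face features with scikit-learn is shown below; standardising the AU intensities before PCA is an assumption of this sketch rather than a step stated above.

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def reduce_face_features(X, n_components=10):
    """Project the whole-face AU intensity matrix X (n_samples x 24)
    onto its first `n_components` principal components."""
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components)
    X_reduced = pca.fit_transform(X_std)
    # The variance ratios indicate how much information the components retain.
    return X_reduced, pca.explained_variance_ratio_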
All of the above processes were carried out in Python 3, using the Scikit-Learn package (Pedregosa et al., 2011), which provides implementations of methods such as PCA and SVD (Golub and