4 EXPERIMENTS
Since no data set was available for our
proposed approach, we constructed the data set
ourselves. We designed and developed software to
collect the data set; its Graphical User Interface
(GUI) is shown in Figure 6. The data set collector
software was developed in C# .NET with the Kinect
API (both packaged with the Kinect for Windows
SDK).
The software was able to collect eight emotions:
happiness, sadness, surprise, fear, anger, disgust,
contempt and neutral. The actors had 15 seconds
(325 frames) to express each emotion. Each frame
consisted of 1,347 coordinates, which was later
reduced by the FACS selection process.
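The per-frame reduction can be sketched as a simple index selection. The point indices below are hypothetical placeholders; the paper does not list which of the 1,347 tracked coordinates survive the FACS selection.

```python
# Sketch of the per-frame FACS reduction step.
# FACS_INDICES is a hypothetical example, NOT the paper's actual selection.

RAW_POINTS = 1347  # coordinates delivered per frame by the face tracker

FACS_INDICES = [10, 48, 52, 200, 205, 331, 400, 612, 890, 1001]  # illustrative

def reduce_frame(frame):
    """Keep only the FACS-selected coordinates of one captured frame."""
    assert len(frame) == RAW_POINTS
    return [frame[i] for i in FACS_INDICES]

frame = [(float(i), float(i), float(i)) for i in range(RAW_POINTS)]  # dummy frame
reduced = reduce_frame(frame)
print(len(frame), "->", len(reduced))  # 1347 -> 10
```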
For this experiment, we collected a data set from
fifteen actors (eight emotions each). For each
emotion, the actors were asked to act freely three
times according to the emotion label shown on the
screen, because we wanted data that intuitively
represented their emotions. This gave us 325 × 8 ×
15 = 39,000 frames, which could be separated into
3 × 8 × 15 = 360 gesture instances for the template
dictionary generation phase. The system environment
for our experiment was an Intel® Core™ i5-4570
3.20 GHz, 4 GB DDR2, Windows 8.1 x64, and an
NVIDIA GeForce GTX 750 Ti GPU. We decided to
use native C++ in order to exploit the CUDA
architecture (Yang et al., 2008) provided by the
NVIDIA GeForce GPU, which enables multi-threaded
processing on the GPU. Our feature extraction was
therefore executed in parallel.
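The corpus bookkeeping above follows directly from the collection protocol and can be checked with a few lines of arithmetic:

```python
# Corpus size: 15 actors, 8 emotions each, 15 s (325 frames) per recording,
# and 3 free repetitions per emotion label.
ACTORS, N_EMOTIONS = 15, 8
FRAMES_PER_RECORDING = 325
REPETITIONS = 3

total_frames = FRAMES_PER_RECORDING * N_EMOTIONS * ACTORS
gesture_instances = REPETITIONS * N_EMOTIONS * ACTORS
print(total_frames, gesture_instances)  # 39000 360
```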
Due to time and system environment constraints, we
fixed the number of clusters in the template
dictionary to five (G = 5).
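The paper does not specify the clustering algorithm behind the template dictionary, so the sketch below uses plain Lloyd's k-means on toy 1-D features purely to illustrate fixing the cluster count at G = 5; it is not the authors' implementation.

```python
# Minimal k-means (Lloyd's algorithm) on toy 1-D data, with G = 5 clusters
# as fixed in the experiment. Illustrative only.
import random

def kmeans_1d(points, g, iters=20, seed=0):
    random.seed(seed)
    centers = random.sample(points, g)
    for _ in range(iters):
        groups = [[] for _ in range(g)]
        for p in points:
            groups[min(range(g), key=lambda i: abs(p - centers[i]))].append(p)
        # Recompute each center as its group mean; keep old center if empty.
        centers = [sum(grp) / len(grp) if grp else centers[i]
                   for i, grp in enumerate(groups)]
    return sorted(centers)

toy = [0.1, 0.2, 1.0, 1.1, 2.0, 2.1, 3.0, 3.1, 4.0, 4.1]
G = 5  # cluster count fixed in the experiment
centers = kmeans_1d(toy, G)
print(len(centers))  # 5
```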
As mentioned in the introduction, the goal of our
current work was to find the best-fitting parameters
for the classifiers. In our previous work we used
K-NN and SVM, so we continued with these
classifiers, varying the parameter values and kernel
functions. 10-fold cross validation was used for the
performance and accuracy evaluation. The open-source
software RapidMiner, a popular, convenient and
reliable tool described by Jović et al. (2014), was
used in the classification phase.
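The 10-fold protocol partitions the samples into ten disjoint test folds, training on the remaining nine each time. RapidMiner performed this in the actual experiment; the fold construction below is a generic illustration, not the tool's implementation.

```python
# Generic k-fold index splitter illustrating the 10-fold protocol.

def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross validation."""
    fold_size = n_samples // k
    indices = list(range(n_samples))
    for f in range(k):
        test = indices[f * fold_size:(f + 1) * fold_size]
        train = indices[:f * fold_size] + indices[(f + 1) * fold_size:]
        yield train, test

# 1,200 sampled frames (as in the K-NN timing experiment) -> 120 per test fold.
folds = list(k_fold_indices(1200, k=10))
print(len(folds), len(folds[0][1]))  # 10 120
```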
Figure 7(a) shows the result of the K-NN classifier
with various values of the parameter k. The accuracy
decreases approximately linearly as k is raised.
Consequently, k = 1 is the best-fitting value,
yielding the best accuracy of 90.33%.
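The k-NN decision rule itself is a nearest-neighbour vote. The real classifier ran in RapidMiner on the FACS feature vectors; the from-scratch sketch below, on dummy 2-D points with invented labels, only illustrates the rule at the best-performing setting k = 1.

```python
# Toy k-NN vote (k = 1 was the best-performing setting in the experiment).
from collections import Counter

def knn_predict(train, query, k=1):
    """train: list of ((x, y), label); returns majority label of k nearest."""
    nearest = sorted(train, key=lambda s: (s[0][0] - query[0]) ** 2 +
                                          (s[0][1] - query[1]) ** 2)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Dummy 2-D samples standing in for high-dimensional FACS features.
train = [((0.0, 0.0), "neutral"), ((0.1, 0.2), "neutral"),
         ((5.0, 5.0), "happiness"), ((5.2, 4.9), "happiness")]
print(knn_predict(train, (0.2, 0.1), k=1))  # neutral
```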
The execution time of K-NN is nearly constant.
1,200 frames were sampled from the 39,000 frames
for training and testing. As shown in Figure 7(b),
the execution time stays between 1.7 and 1.8
seconds regardless of whether k increases or
decreases.
Figure 7(c) shows the results of classifying with
SVM. We experimented with five kernels: Dot, Radial,
Linear, Polynomial of degree 2, and Polynomial of
degree 3. The results are worse than K-NN,
especially with the polynomial degree-3 kernel,
which shows the worst accuracy at just 18.08%.
Switching to polynomial degree 2, however, yields
the outstanding result among the SVM kernel
functions. The execution time of SVM is also
severely higher than that of K-NN, as shown in
Figure 7(d).
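For reference, the kernel functions compared in Figure 7(c) can be written out as plain formulas. The gamma and coef0 values below are illustrative defaults, not the parameters RapidMiner used.

```python
# The compared SVM kernels as plain formulas (illustrative parameters).
import math

def dot(x, y):                       # "Dot" / linear kernel
    return sum(a * b for a, b in zip(x, y))

def rbf(x, y, gamma=0.5):            # "Radial" kernel
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def poly(x, y, degree, coef0=1.0):   # Polynomial degree 2 / 3
    return (dot(x, y) + coef0) ** degree

x, y = [1.0, 2.0], [2.0, 1.0]
print(dot(x, y), round(rbf(x, y), 4), poly(x, y, 2), poly(x, y, 3))
# 4.0 0.3679 25.0 125.0
```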
The reason K-NN distinctly beats SVM is that our
feature vectors have a very high dimension (number
of attributes). Even though SVM offers many kernel
functions to transform the input space, none of them
seems to produce a feature space in which the
classes are linearly separable by a hyperplane.
Figure 6: Data set collector software.
Tables 4 and 5 show the confusion matrices of K-NN
with k = 1 and SVM with the polynomial degree-2
kernel, respectively. The accuracy of detecting fear
is the worst, whereas the detection accuracy of
neutral surpasses all other emotions. Fear causes
more confusion because its expression is ambiguous
and rather similar to surprise, so there is a high
probability of it being mistaken for surprise. The
motion of neutral, in contrast, is clearer, which is
why it achieves the best accuracy.
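This kind of pairwise confusion is exactly what the confusion matrices in Tables 4 and 5 capture. In the sketch below, the class labels mirror the paper's eight emotions, but the counts are invented purely for illustration.

```python
# How fear/surprise confusion shows up in a confusion matrix.
# Labels match the paper's classes; the counts are invented.

EMOTIONS = ["happiness", "sadness", "surprise", "fear",
            "anger", "disgust", "contempt", "neutral"]

def confusion_matrix(true_labels, predicted_labels, classes):
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        m[idx[t]][idx[p]] += 1  # row = true class, column = prediction
    return m

# Invented example: fear is sometimes predicted as surprise.
true = ["fear", "fear", "fear", "neutral", "neutral"]
pred = ["fear", "surprise", "surprise", "neutral", "neutral"]
m = confusion_matrix(true, pred, EMOTIONS)
print(m[EMOTIONS.index("fear")][EMOTIONS.index("surprise")])  # 2
```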
Table 6 shows the accuracy comparison between the
proposed approach and the state-of-the-art approach
(Mao et al., 2015). Since there are several
differing factors, such as the data set, system
environment, number of classes, and classifier, as
mentioned in the related works section, the
information in this table is
SIGMAP 2016 - International Conference on Signal Processing and Multimedia Applications