offering a diverse toolkit for effective classification
tasks.
In the realm of emotional expression recognition,
Murugappan's study employed a variety of distance
measures to classify facial emotions, with the average
accuracy of each measure carefully documented
(Murugappan, 2020).
Furthermore, experimental investigations examined
the impact of varying the k value, revealing that the
choice of k significantly influences the accuracy of
emotion classification. The study underscored the
pivotal role of k in the KNN classifier, indicating that
a lower k value correlates with a higher emotion
recognition rate.
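A minimal sketch of such a comparison, assuming scikit-learn and pre-extracted facial feature vectors; the placeholder data, the metric list, and the k grid below are illustrative assumptions, not the settings of the cited study:

# Sketch: comparing distance metrics and k values for KNN-based
# emotion classification on placeholder feature vectors.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))      # placeholder expression feature vectors
y = rng.integers(0, 6, size=300)    # placeholder labels for 6 emotions

for metric in ("euclidean", "manhattan", "chebyshev", "cosine"):
    for k in (1, 3, 5, 7, 9):
        knn = KNeighborsClassifier(n_neighbors=k, metric=metric)
        acc = cross_val_score(knn, X, y, cv=5).mean()
        print(f"metric={metric:10s} k={k}  mean accuracy={acc:.3f}")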
2.1.3 RF and PCA
In the realm of image processing, the traditional
Principal Component Analysis (PCA) serves as a
widely employed technique for extracting expression
features. Initially, this method converts the image
matrix into a one-dimensional image vector. However,
after this vectorization the K-L transformation must
operate in a greatly enlarged image vector space,
which makes it difficult to compute the covariance
matrix accurately. To address this issue, 2DPCA
offers a direct approach that works on the two-
dimensional image matrices themselves, so the
covariance matrix can be computed with much greater
ease. Given its far lower dimensionality, this
covariance matrix also has eigenvalues and
eigenvectors that are simpler to calculate.
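To make the contrast concrete, the following is a minimal sketch, assuming NumPy and a stack of equally sized grayscale images, of how 2DPCA forms its small n x n image covariance (scatter) matrix directly from the 2D image matrices; all names are placeholders:

# Sketch: 2DPCA image covariance matrix computed directly from
# 2D image matrices, without flattening them into long vectors.
import numpy as np

def image_covariance(images):
    # images: array of shape (M, m, n) holding M grayscale images
    mean_image = images.mean(axis=0)                    # m x n mean image
    G = np.zeros((images.shape[2], images.shape[2]))    # n x n scatter matrix
    for A in images:
        D = A - mean_image
        G += D.T @ D
    return G / len(images)

# The eigenvectors of this small n x n matrix give the projection axes:
# G = image_covariance(images)
# eigvals, eigvecs = np.linalg.eigh(G)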
The fundamental concept behind 2DPCA is to treat
each image as an m x n matrix A that is linearly
transformed into an m-dimensional column vector
Y = Ax through matrix multiplication. Here, x denotes
the n-dimensional projection column vector (an
eigenvector of the image covariance matrix), and Y
represents the projection of the image matrix along
the direction of x. Following the completion of expression
feature extraction, the selection of a suitable
classification method becomes paramount. Ju Jia
proposed adopting the random forest classifier for its
rapid classification speed, robustness (Jia, 2016), and
high recognition rates in high-dimensional settings.
Random forest, constructed from multiple decision
trees, proves effective in addressing multi-class
classification challenges.
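Before turning to the classifier, a brief sketch of this projection step, building on the covariance sketch above; the number of retained eigenvectors d is an illustrative choice:

# Sketch: projecting each image matrix A onto the top-d eigenvectors
# of the 2DPCA image covariance matrix to obtain an m x d feature matrix.
import numpy as np

def project_images(images, G, d=5):
    # images: (M, m, n); G: n x n image covariance matrix from 2DPCA
    eigvals, eigvecs = np.linalg.eigh(G)
    X = eigvecs[:, -d:]          # n x d matrix of leading projection axes
    # Y_i = A_i @ X is the m x d feature matrix of image i;
    # flatten each Y_i so it can be fed to a conventional classifier
    return np.stack([A @ X for A in images]).reshape(len(images), -1)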
The random forest classifier comprises N decision
trees (T1, T2, ..., TN), each of which functions as a
voting classifier. The final outcome of random forest
classification is the average of the voting results from
all the decision trees. In the experiment conducted,
two testing schemes are employed: one that tests
trained individuals and another that tests untrained
individuals. After image preprocessing, the PCA and
2DPCA methods are used to extract expression
features, which are then used to train and evaluate
random forest and Support Vector Machine (SVM)
classifiers.
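A minimal sketch of such a comparison, assuming scikit-learn and feature matrices already produced by PCA or 2DPCA; the placeholder data, split, and hyperparameters are illustrative, not those of the cited experiment:

# Sketch: training and comparing random forest and SVM classifiers
# on expression features extracted by PCA/2DPCA (placeholder data here).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 60))      # placeholder PCA/2DPCA feature vectors
y = rng.integers(0, 7, size=400)    # placeholder labels for 7 expressions

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
svm = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)

print("random forest accuracy:", rf.score(X_te, y_te))
print("SVM accuracy:          ", svm.score(X_te, y_te))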
2.2 Deep Learning
Traditional methods place fewer demands on
hardware and data than deep learning approaches;
however, they require feature extraction and
classification to be applied manually as independent
steps. In contrast, deep learning methods can perform
these two processes jointly. Within the realm of deep learning
techniques, facial expression recognition entails three
primary stages: image preprocessing, deep feature
learning, and deep feature classification. The image
preprocessing phase is a critical step that typically
encompasses the utilization of the Viola-Jones
algorithm for face detection, face alignment,
normalization, and enhancement to prepare the data.
In the realms of deep feature learning and
classification, numerous methodologies such as CNN,
DBN, DAE, RNN and LSTM have been extensively
researched and implemented.
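A hedged sketch of such a preprocessing stage using OpenCV's Haar-cascade (Viola-Jones) face detector; the cascade file, target size, and enhancement steps below are illustrative choices rather than a prescribed pipeline:

# Sketch: Viola-Jones face detection plus simple normalization and
# enhancement as a preprocessing stage for deep expression recognition.
import cv2

def preprocess_face(image_path, size=(48, 48)):
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    img = cv2.imread(image_path)
    if img is None:
        return None                          # unreadable image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                          # no face detected
    x, y, w, h = faces[0]                    # take the first detected face
    face = cv2.resize(gray[y:y + h, x:x + w], size)
    face = cv2.equalizeHist(face)            # contrast enhancement
    return face.astype("float32") / 255.0    # intensity normalization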
2.2.1 CNN
For deep feature learning and classification, CNN
configurations continue to stand out as the most
prevalent and cutting-edge architectures, particularly
in emotion recognition. Some of the popular CNN
configurations include region-based CNN (R-CNN),
faster R-CNN, and 3D CNN. These configurations
demonstrate varying levels of accuracy across
different datasets.
S. Begaj and A. O. Topal presented the
implementation details and results analysis of a CNN-
based facial expression recognition model (Begaj,
2020). This CNN architecture comprises four
convolutional layers, four max-pooling layers, one
dropout layer, and two fully connected layers, totaling
899,718 parameters. The model's processing pipeline
involves filtering images through Conv2D filters,
applying ReLU activation functions, downsizing
images with MaxPooling2D layers, and ultimately
flattening and applying dropout layers.
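A sketch of a CNN of this kind in Keras, with four Conv2D/MaxPooling2D blocks, a dropout layer, and two fully connected layers; the filter counts, kernel sizes, input shape, and class count are illustrative guesses, so the parameter total will not exactly match the reported 899,718:

# Sketch: CNN with 4 conv layers, 4 max-pooling layers, 1 dropout layer,
# and 2 fully connected layers, as described above (hyperparameters assumed).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),            # assumed grayscale 48x48 input
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(256, activation="relu"),
    layers.Dense(7, activation="softmax"),      # assumed 7 emotion classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()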
Upon evaluating the model, it was observed that
performance on the training data exceeded that on the
testing data, indicating signs of overfitting. Beyond the 25th epoch,