annotation and insufficient model generalisation
ability.
The main objective of this effort is to use
Convolutional Neural Networks (CNNs) to recognise
human emotions more reliably and efficiently.
Initially, a lightweight CNN model consisting of three
convolutional layers and two fully connected layers
was used, making it possible to extract facial
expression features with minimal computational
burden. To enhance the model's adaptability, data
augmentation techniques were applied to diversify
the training samples through random transformations
such as rotation, translation, and scaling of the
training images. Comparative experiments were
conducted on various model architectures and
hyperparameter configurations, and their
performance was evaluated via cross-validation.
Additionally, the experiments include a
visualization of the model's decision-making process
and an analysis of how much different facial regions
contribute to emotion recognition.
Results demonstrate that the proposed CNN model
achieves high accuracy in recognizing emotions
across publicly available datasets, exhibiting strong
generalization and robustness. This study introduces
innovative approaches for deep learning-based
human emotion recognition, with potential
applications in areas like intelligent customer service
and emotion computing.
2 METHODOLOGY
2.1 Dataset Description and
Preprocessing
The study employed the CK+48 human emotional
expression dataset from Kaggle (Ashadullah, 2018).
The dataset consists of 980 colour images of seven
basic human emotional expressions. In the
experiments, anger, surprise, and disgust exhibited
the highest recognition rates, contempt and
happiness performed moderately, and sadness and
fear remained the hardest to recognize. The model
was trained on the CK+48 dataset, incorporating data
augmentation, 5-fold cross-validation, and Early
Stopping strategies. On the test set, the performance
of the optimal model was evaluated, and the
prediction results were visualized through confusion
matrices and Receiver Operating Characteristic
(ROC) curves. Finally, the model weights and
architecture were saved.
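As a minimal sketch of the augmentation step described above (not the authors' code), mirroring and small translations of a 48×48 face image can be done with NumPy alone; the ±3-pixel shift range is an assumption, and rotation, which the study also applies, would typically use a routine such as scipy.ndimage.rotate:

```python
import numpy as np

def augment(image, rng):
    """Randomly mirror and translate one 48x48 grayscale face image.
    Minimal sketch of the augmentation step; the +/-3-pixel shift
    range is an assumption, and rotation is omitted for brevity."""
    out = image.copy()
    if rng.random() < 0.5:                     # random horizontal mirroring
        out = out[:, ::-1]
    dy, dx = rng.integers(-3, 4, size=2)       # shift up to +/-3 pixels
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
face = rng.random((48, 48))                    # stand-in for a CK+48 image
batch = np.stack([augment(face, rng) for _ in range(5)])
print(batch.shape)                             # (5, 48, 48)
```

Because flips and `np.roll` only permute pixels, each augmented image keeps the original intensity distribution, which is why such transformations diversify the training set without distorting expression content.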
2.2 Proposed Approach
This study aimed to classify images of human facial
expressions into seven basic emotion categories
(anger, contempt, disgust, fear, happiness, sadness,
and surprise) using a CNN model. Preprocessing
steps, such as scaling and normalization, were
applied to the original images from the CK+48 facial
expression dataset. Data augmentation methods,
such as rotation, translation, and mirroring, were
used to supplement the training set and increase the
model's ability to generalize. Furthermore, the data was
partitioned into 5 subsets for training and validation
using a 5-fold cross-validation approach, which
ensured a thorough evaluation of model performance.
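The 5-fold partitioning can be sketched as follows (a hedged illustration, not the authors' code; the shuffle seed is an assumption). With the 980 CK+48 images, each fold holds exactly 980 / 5 = 196 validation samples:

```python
import numpy as np

def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and split them into n_folds folds; each
    fold serves once as the validation set (sketch, assumed seed)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    return [
        (np.concatenate(folds[:k] + folds[k + 1:]), folds[k])
        for k in range(n_folds)
    ]

splits = five_fold_splits(980)
print(len(splits), len(splits[0][1]))          # 5 folds, 196 val samples each
```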
The CNN-based architecture was constructed with
multiple convolutional, pooling, and fully connected
layers. To mitigate overfitting and preserve optimal
model weights, Early Stopping and Model
Checkpoint callback functions were incorporated
during training. The loss function employed was
categorical cross-entropy, and RMSprop served as
the optimizer. Evaluation of the model on a test set
revealed that the best CNN model achieved a test loss
value of 0.5008 and a test accuracy of 83.25%.
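The Early Stopping and Model Checkpoint behaviour described above can be sketched as a plain loop over per-epoch validation losses; this is a simplified stand-in for the Keras callbacks, and the patience value and loss sequence are illustrative assumptions:

```python
def train_with_early_stopping(val_losses, patience=5):
    """Track the best validation loss so far ("checkpointing" that
    epoch) and stop once it has not improved for `patience`
    consecutive epochs. Simplified stand-in for Keras
    EarlyStopping/ModelCheckpoint; patience value is an assumption."""
    best_loss, best_epoch, wait = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0  # "checkpoint"
        else:
            wait += 1
            if wait >= patience:
                break                                     # early stop
    return best_epoch, best_loss

losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.9, 0.95, 0.8]     # toy loss curve
print(train_with_early_stopping(losses, patience=3))      # (2, 0.7)
```

Restoring the weights saved at the best epoch, rather than the final one, is what lets the reported test metrics reflect the optimal model rather than an overfitted later state.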
Additionally, the study visualized the confusion
matrix and ROC curves to further scrutinize the
model's performance across different emotion
categories. The test image prediction results indicate
that the model can accurately predict the probability
distribution of emotion categories for a given facial
expression image. The weights and full architecture
of the best-performing model were saved for future
applications and deployment. The main flowchart of this study is
depicted in Figure 1.
Figure 1: Main Process
(Photo/Picture credit: Original).
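The confusion matrix used in this section to summarise per-class performance can be sketched as follows (class indices would be 0-6 for the seven emotions; the toy labels below are illustrative only, not CK+48 results):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=7):
    """Count (true, predicted) class pairs; rows are true labels,
    columns are predictions. Sketch of the evaluation step."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Illustrative toy labels over 3 classes.
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm.trace() / cm.sum())       # diagonal fraction = accuracy = 2/3
```

The diagonal holds correct predictions per class, so row-normalising the matrix directly yields the per-emotion recognition rates discussed in Section 2.1.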