just one epoch, the model's ability to extract salient features from facial images and to classify different expressions is analyzed. The model maintains stable accuracy and training loss as training progresses, indicating that it handles the given configuration robustly. Despite the limited number of training iterations, the strong underlying performance of EfficientNetB7 allows the model to achieve respectable recognition results. Nevertheless, there is room for improvement: in particular, the model tends to confuse neutral and sad expressions, so further optimization is needed before it can reliably capture subtle differences between facial expressions.
2 METHODOLOGIES
2.1 Dataset Description and
Preprocessing
The dataset was obtained from the Kaggle platform (Kero, 2024). It comprises seven categories of facial expressions, each containing a specific number of image files: the "angry" category holds 958 files, "disgusted" 111, "fearful" 1024, "happy" 1774, "neutral" 1233, "sad" 1247, and "surprised" 831. Each category contains face images showing the corresponding expression. Taken together, the dataset covers a diverse range of facial expressions and supports detailed analysis of human emotions from facial cues.
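As a minimal sketch of how these per-category counts can be inspected (assuming the Kaggle archive is unpacked into one subfolder per expression category; the directory name data/train and the variable names below are illustrative assumptions, not taken from this work), the files in each subfolder can simply be counted:

import os

# Illustrative path to the unpacked training set; adjust to the local layout (assumption).
DATASET_DIR = "data/train"

# Count the image files inside each expression subfolder.
counts = {
    category: len(os.listdir(os.path.join(DATASET_DIR, category)))
    for category in sorted(os.listdir(DATASET_DIR))
    if os.path.isdir(os.path.join(DATASET_DIR, category))
}

for category, n_files in counts.items():
    print(f"{category}: {n_files} files")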
Pre-processing iterates systematically through every file and folder in the dataset directory, printing their paths to track progress. The dataset supplies the features and target variables used for the machine learning task, namely the facial images and their corresponding expression labels, which are essential for training the model to recognize different expressions. Data augmentation is applied during processing to enlarge the dataset and improve the model's generalization, while the focus remains on extracting the relevant information and building suitable models for training. Preparation of the test data begins by specifying the directory that houses the test set, which is organized into subfolders, one for each expression category. Each subfolder is traversed in turn to gather the list of files it contains. For every file, the complete path is formed by joining the subfolder path with the filename, and these paths are collected in a list. At the same time, the name of each subfolder, which serves as the expression label, is stored in a separate list so that paths and labels remain aligned. After all subfolders and files have been processed, two lists are obtained: one containing the paths of all test files and the other containing the corresponding expression labels. These lists make it straightforward to load the test data and to evaluate the facial expression recognition model on unseen samples. For hyperparameter tuning, hyperparameters such as the learning rate and batch size are selected and adjusted according to the project requirements and the characteristics of the data in order to reach higher recognition accuracy.
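As a minimal sketch of this test-data preparation (the directory name data/test, the list names, and the constants below are illustrative assumptions rather than the actual code used in this work), the file paths and their expression labels can be collected as follows:

import os

# Illustrative test directory with one subfolder per expression category (assumption).
TEST_DIR = "data/test"

test_paths = []   # full path of every test image
test_labels = []  # expression label for each path, aligned by index

# Traverse each expression subfolder and record file paths together with their labels.
for label in sorted(os.listdir(TEST_DIR)):
    subfolder = os.path.join(TEST_DIR, label)
    if not os.path.isdir(subfolder):
        continue
    for filename in os.listdir(subfolder):
        test_paths.append(os.path.join(subfolder, filename))
        test_labels.append(label)

print(f"Collected {len(test_paths)} test images in {len(set(test_labels))} categories.")

In the same spirit, hyperparameters such as the learning rate and batch size would typically be exposed as constants (for example, LEARNING_RATE and BATCH_SIZE) so that they can be tuned against validation accuracy; the specific values used in this work are not reproduced here.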
2.2 Proposed Approach
The expression recognition system is based on a Convolutional Neural Network (CNN) and aims to recognize facial expressions automatically using deep learning techniques. The research goal is to construct an efficient and accurate model that can process image data and classify different facial expressions effectively. To achieve this goal, the following methodology is used. First, the key libraries supporting data processing and model construction are imported. The image data are then pre-processed, including normalization, scaling, and related steps, to match the input requirements of the model. Next, the dataset is divided into training, validation, and test sets so that model performance can be evaluated. An image data generator is used to increase the diversity of the training data, and sample training images are displayed to verify data quality. The CNN model structure is then constructed, including convolutional layers, pooling layers, and fully connected layers. Finally, the model is fitted and its parameters are optimized iteratively to minimize the