Second, FER2013 is collected in real-world
conditions, featuring images captured in uncontrolled
environments with varying lighting, angles, and
backgrounds. This makes it more representative of
real-world scenarios compared to more controlled,
posed datasets like CK+ and JAFFE, which may not
generalise as well to everyday applications.
Additionally, FER2013 is a widely used benchmark
in the field, allowing researchers to compare their
results with existing studies, ensuring consistency
and reproducibility. The larger size and diversity also
reduce the risk of overfitting, making models more
robust in practical applications. Furthermore, using a
single dataset simplifies preprocessing and reduces
computational demands, which is particularly
important when training complex models. In this study, we also performed a preliminary cleaning of the FER2013 dataset to improve its quality; a detailed description of this process is provided in Section 2.1. Finally, FER2013 covers the seven basic emotion categories, providing a more comprehensive test bed for emotion recognition models than datasets with fewer categories. Tang (2015) reports that humans correctly identify the emotions in the FER2013 images between 65% and 68% of the time, demonstrating the inherent difficulty of emotion recognition on this dataset.
Recent advances in deep learning, particularly
with Convolutional Neural Networks (CNNs), have
led to significant improvements in FER. CNNs are
especially well-suited for image-based tasks due to
their ability to automatically learn spatial hierarchies
of features, removing the need for manual feature
extraction. The literature on FER demonstrates the
effectiveness of CNNs in handling the complexities
stemming from the use of the FER2013 dataset.
Khaireddin and Chen (2021) achieved 73.28% accuracy on the FER2013 dataset using a fine-tuned VGGNet architecture, illustrating the potential of deep CNN models in emotion recognition tasks. Liliana (2019) explored the detection of facial action units using CNNs and reached 92.81% accuracy on the CK+ dataset, emphasising CNNs' ability to capture the subtle facial movements indicative of emotions. Hassouneh et al. (2020) extended FER by integrating electroencephalograph (EEG) signals with CNNs and Long Short-Term Memory (LSTM) networks, highlighting the potential of combining CNNs with other modalities for enhanced emotion recognition. Similarly, Akhand et al. (2021) applied transfer learning to pre-trained CNNs, fine-tuning the models for emotion-specific tasks, and reported accuracies of 96.51% on the KDEF dataset and 99.52% on the JAFFE dataset. These studies demonstrate the adaptability and effectiveness of CNNs in FER across various datasets and configurations.
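The spatial feature extraction underlying all of these models rests on the convolution operation. As a minimal, self-contained illustration (the kernel and toy image below are purely illustrative and are not taken from any model in this study), a hand-crafted vertical-edge kernel responds strongly exactly where pixel intensity changes horizontally; a CNN learns such kernel values automatically from data rather than having them specified by hand:

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a grayscale image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Weighted sum of the kernel-sized window at (i, j)
            row.append(sum(image[i + m][j + n] * kernel[m][n]
                           for m in range(kh) for n in range(kw)))
        out.append(row)
    return out

# Sobel-like vertical-edge kernel: large response where intensity
# changes from left to right, zero response in flat regions.
edge_kernel = [[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]]

# Toy 4x4 "image": dark left half, bright right half.
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]

response = conv2d(image, edge_kernel)  # uniformly high: an edge everywhere in view
```

Stacking many such learned filters, interleaved with pooling, is what lets a CNN build the spatial hierarchies of features mentioned above, from edges to facial parts to whole expressions.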
In this work, custom CNN architectures are developed and optimised to improve the accuracy of FER on the FER2013 dataset. We experiment with different hyperparameters and training techniques to assess how various configurations influence model performance, and we explore a range of architectural components. While our architectures do not match the highest accuracies reported in the literature, we believe our work offers significant contributions. Specifically, we conducted extensive experiments across a wide range of hyperparameter configurations, providing insights into how these variations affect model accuracy. This detailed exploration of hyperparameter tuning, covering architectural depth, augmentation techniques, learning rate, batch size, dropout rate, and other regularisation methods, offers a perspective that is often overlooked in studies focused solely on peak performance. By systematically
analysing how different hyperparameter values
influence model behaviour, our study provides a
deeper understanding of the intricacies involved in
optimising CNNs for FER tasks. Such insights are
critical for researchers looking to refine existing
models or develop new architectures. Additionally,
the findings from our hyperparameter analysis serve
as a practical guide for future research, offering
actionable recommendations for tuning CNNs in this
domain. We believe that this contribution fills an
important gap in the literature and deserves attention
for advancing both practical applications and
theoretical understanding of FER model optimisation.
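To make the shape of such a systematic sweep concrete, the sketch below enumerates a small grid over three of the hyperparameters discussed above; the specific values shown are illustrative assumptions, not the grids actually used in this study:

```python
from itertools import product

# Hypothetical search space over three of the hyperparameters studied.
# The values are placeholders for illustration only.
search_space = {
    "learning_rate": [1e-3, 1e-4],
    "batch_size": [32, 64, 128],
    "dropout_rate": [0.25, 0.5],
}

# Every combination of the values above: 2 * 3 * 2 = 12 configurations,
# each of which would be trained and evaluated separately.
configs = [dict(zip(search_space, values))
           for values in product(*search_space.values())]
```

Even a modest grid like this grows multiplicatively with each added hyperparameter, which is why the per-parameter effect analysis reported here is costly to produce and rarely published alongside peak-accuracy results.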
2 EXPERIMENTAL RESULTS
This section details the dataset used, as well as the
experimental setup, including data pre-processing,
model architecture, and hyperparameter tuning.
Various experiments were conducted to compare the
effects of different approaches, such as data
augmentation techniques, batch sizes, learning rate
scheduling, and regularisation methods. Additionally,
comparisons between basic and deeper architectures
were made, and Keras Tuner (O’Malley et al., 2019) was employed for fine-tuning
hyperparameters. These steps were taken to optimise
model performance and evaluate how each
modification impacted the results.
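One common form of learning rate scheduling is step decay, in which the rate is cut by a fixed factor at regular epoch intervals. As an illustrative sketch only (the initial rate, drop factor, and interval below are assumptions, not the settings used in these experiments):

```python
def step_decay_lr(initial_lr, drop_factor, epochs_per_drop, epoch):
    """Learning rate at a given epoch under a step-decay schedule:
    multiplied by drop_factor once every epochs_per_drop epochs."""
    return initial_lr * (drop_factor ** (epoch // epochs_per_drop))

# Example: start at 1e-3 and halve the rate every 10 epochs.
schedule = [step_decay_lr(1e-3, 0.5, 10, e) for e in (0, 9, 10, 25)]
# epochs 0 and 9 keep the initial rate; epoch 10 is halved; epoch 25 is quartered
```

Schedules like this are typically passed to the training loop as a per-epoch callback, so they compose naturally with the other tuning axes (batch size, regularisation) compared in the experiments that follow.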