The detailed statistics presented in Figure 8 evaluate the accuracy of the prediction results. Notably, the training efficacy of the network model is perceptibly affected by the amount of training data. The disgust category has the smallest dataset, which explains why it shows the largest disparity in Precision, Recall, and F1-Score. On the test set, the fearful and happy classes show the most significant discrepancies in prediction performance. From this outcome it can be deduced that some expressions, owing to their intricacy, exhibit notable variation in prediction quality. Hence, investigating network structures designed explicitly to recognize a particular kind of expression may be regarded as a prospective area of study.
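For reference, the per-class figures in Figure 8 follow the standard definitions of Precision, Recall, and F1-Score. The short Python sketch below illustrates how such per-class metrics are computed with scikit-learn; the label arrays are toy placeholders and the class ordering is an assumption, not the paper's actual test data.

```python
# Per-class Precision, Recall, and F1-Score over a test set.
# The labels below are toy placeholders; the class list mirrors the seven
# basic expressions used in facial expression datasets such as FER2013.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

CLASSES = ["angry", "disgust", "fearful", "happy", "sad", "surprised", "neutral"]

y_true = np.array([0, 1, 2, 3, 3, 4, 5, 6, 2, 1])  # ground truth (toy)
y_pred = np.array([0, 1, 3, 3, 2, 4, 5, 6, 2, 1])  # predictions (toy)

# zero_division=0 guards against classes that are never predicted.
prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=list(range(len(CLASSES))), zero_division=0
)

for name, p, r, f, n in zip(CLASSES, prec, rec, f1, support):
    print(f"{name:>9}: precision={p:.2f}  recall={r:.2f}  f1={f:.2f}  (n={n})")
```

Classes with little support, such as disgust here, receive unstable scores, which is consistent with the disparity observed in Figure 8.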
To validate the improved classification performance of the proposed network, this research also conducts a comparative analysis against several popular classification networks, including InceptionNet (Szegedy, 2017) and MobileNet (Howard, 2017). Table 3 shows that, under the same training settings and environment, RARN converges faster and yields a final model with higher accuracy than the other models. RARN improves the recognition rate while requiring only a small number of parameters, demonstrating that it preserves network performance while retaining the benefits of operational efficiency. RARN achieved an Accuracy
of 57.51%, with gains of 1.04% and 21.17% compared to InceptionNet and MobileNet, respectively.
Table 3: Comparison of Accuracy.

Network        Accuracy
RARN           57.51%
InceptionNet   56.47%
MobileNet      36.34%
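Parameter count is the usual proxy for the operational-efficiency claim above. The following PyTorch sketch shows the comparison procedure in general form; TinyNet is a hypothetical stand-in, since the RARN, InceptionNet, and MobileNet implementations used here are not reproduced in this paper.

```python
# Count trainable parameters, as done when comparing models trained under
# identical settings. TinyNet is a hypothetical placeholder model.
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Return the number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

class TinyNet(nn.Module):
    """Toy classifier over 48x48 grayscale faces, 7 expression classes."""
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Report model size next to test accuracy when filling in a table like Table 3.
print(f"TinyNet trainable parameters: {count_parameters(TinyNet()):,}")
```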
5 CONCLUSIONS
This study presents a comprehensive facial expression categorization technique that harnesses attention mechanisms and deep learning. The approach integrates a multi-scale fusion module and an angle-sensitive spatial attention module to drive classification. The multi-scale fusion module captures both the global and the local characteristics of the input image, while the angle-sensitive spatial attention module enhances the feature maps by incorporating angle information.
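As a rough illustration of the second component, the sketch below implements a generic spatial attention block that mixes an angle map into the pooled features before producing the spatial weighting. This is an assumption about the general shape of such a module, made for illustration; it is not RARN's exact design.

```python
# Illustrative only: a generic spatial attention block extended with an
# angle map. The real angle-sensitive module in RARN may differ.
import torch
import torch.nn as nn

class AngleSensitiveSpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # Three input planes: channel-wise average, channel-wise max,
        # and a single-channel angle map (assumed, for illustration).
        self.conv = nn.Conv2d(3, 1, kernel_size, padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor, angle_map: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); angle_map: (B, 1, H, W) encoding local orientation.
        avg_pool = x.mean(dim=1, keepdim=True)
        max_pool = x.amax(dim=1, keepdim=True)
        attn = self.sigmoid(self.conv(torch.cat([avg_pool, max_pool, angle_map], dim=1)))
        return x * attn  # spatially reweighted feature map

# Toy usage: input and output shapes match.
feats = torch.randn(2, 64, 12, 12)
angles = torch.randn(2, 1, 12, 12)
out = AngleSensitiveSpatialAttention()(feats, angles)
```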
Experimental results demonstrate the method's superior recognition rate and its substantial improvement in facial expression categorization. Future research will refine the network structure, explore parameters such as the convolution kernel size and stride, and further tune the network depth. In addition, evaluation on more extensive datasets will give a fuller assessment of the network's performance.
REFERENCES
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual
learning for image recognition. In Proceedings of the
IEEE conference on computer vision and pattern
recognition (pp. 770-778).
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. (2017,
February). Inception-v4, inception-resnet and the
impact of residual connections on learning. In
Proceedings of the AAAI conference on artificial
intelligence (Vol. 31, No. 1).
Zhou, J., Xiong, Y., Chiu, C., Liu, F., & Gong, X. (2023).
Sat: Size-aware transformer for 3d point cloud semantic
segmentation. arXiv preprint arXiv:2301.06869.
Ekman, P., Sorenson, E. R., & Friesen, W. V. (1969). Pan-
cultural elements in facial displays of emotion. Science,
164(3875), 86-88.
Arora, S., Bhaskara, A., Ge, R., & Ma, T. (2014, January).
Provable bounds for learning some deep
representations. In International conference on machine
learning (pp. 584-592). PMLR.
Wang, F., Jiang, M., Qian, C., Yang, S., Li, C., Zhang, H., ...
& Tang, X. (2017). Residual attention network for
image classification. In Proceedings of the IEEE
conference on computer vision and pattern recognition
(pp. 3156-3164).
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., ... & Rabinovich, A. (2015). Going
deeper with convolutions. In Proceedings of the IEEE
conference on computer vision and pattern recognition (pp.
1-9).
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna,
Z. (2016). Rethinking the inception architecture for
computer vision. In Proceedings of the IEEE conference on
computer vision and pattern recognition (pp. 2818-
2826).
Zhang, X., Liu, C., Yang, D., Song, T., Ye, Y., Li, K., &
Song, Y. (2023). RFAConv: Innovating spatial attention
and standard convolutional operation. arXiv preprint
arXiv:2304.03198.
Goodfellow, I. J., Erhan, D., Carrier, P. L., Courville, A.,
Mirza, M., Hamner, B., ... & Bengio, Y. (2013).
Challenges in representation learning: A report on three
machine learning contests. In Neural Information
Processing: 20th International Conference, ICONIP