
Figure 8: Multi-face emotion detection deployed on the robot
this, we developed and evaluated several deep neural network models under consistent conditions, carefully considering factors such as model size and accuracy to ensure compatibility with both personal computers and mobile robots such as the Tiago++.

While our system demonstrates strong performance, it is important to note the limitations of relying solely on facial expressions for emotion detection, particularly in contexts where communication may be impaired. Emotions are complex and multifaceted, and often require the integration of multiple modalities for accurate recognition. Future work will therefore focus on incorporating additional modalities, such as voice, text, gestures, and biosignals, to improve the performance and reliability of emotion recognition systems. We will also optimize the large models used in FER tasks for efficient deployment on the Tiago++ robot, balancing model size against accuracy.
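One simple way to combine such modalities is late fusion: each modality-specific model outputs a probability distribution over emotion classes, and the distributions are merged by weighted averaging. The following is a minimal illustrative sketch, not our deployed system; the class list, weights, and probability values are hypothetical.

```python
import numpy as np

# Hypothetical emotion classes (FER2013-style labels)
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def late_fusion(modality_probs, weights=None):
    """Fuse per-modality emotion probability vectors by weighted averaging.

    modality_probs: iterable of probability vectors, one per modality,
                    each of length len(EMOTIONS).
    weights:        optional per-modality weights (defaults to uniform).
    Returns the fused class label and the fused distribution.
    """
    probs = np.asarray(modality_probs, dtype=float)   # shape: (n_modalities, n_classes)
    if weights is None:
        weights = np.ones(len(probs)) / len(probs)
    weights = np.asarray(weights, dtype=float)
    fused = weights @ probs                           # weighted average per class
    fused /= fused.sum()                              # renormalize to a distribution
    return EMOTIONS[int(fused.argmax())], fused

# Example: the face model is confident in "happy", the voice model leans "neutral"
face  = [0.02, 0.01, 0.02, 0.80, 0.05, 0.05, 0.05]
voice = [0.05, 0.02, 0.03, 0.30, 0.10, 0.10, 0.40]
label, fused = late_fusion([face, voice], weights=[0.6, 0.4])
```

With these illustrative weights the fused prediction remains "happy", but a low-confidence or occluded face signal would let the voice modality dominate, which is the robustness motivation for multimodal fusion.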
REFERENCES
Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1251–1258.

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255.

Dwijayanti, S., Iqbal, M., and Suprapto, B. Y. (2022). Real-time implementation of face recognition and emotion recognition in a humanoid robot using a convolutional neural network. IEEE Access, 10:89876–89886.

El Boudouri, Y. and Bohi, A. (2023). EmoNeXt: An adapted ConvNeXt for facial emotion recognition. In 2023 IEEE 25th International Workshop on Multimedia Signal Processing (MMSP), pages 1–6. IEEE.

Fard, A. P. and Mahoor, M. H. (2022). Ad-Corre: Adaptive correlation-based loss for facial expression recognition in the wild. IEEE Access, 10:26756–26768.

Farhat, N., Bohi, A., Letaifa, L. B., and Slama, R. CG-MER: A card game-based multimodal dataset for emotion recognition. In Sixteenth International Conference on Machine Vision (ICMV 2023).

Farzaneh, A. H. and Qi, X. (2021). Facial expression recognition in the wild via deep attentive center loss. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 2402–2411.

Goodfellow, I. J., Erhan, D., Carrier, P. L., Courville, A., Mirza, M., Hamner, B., Cukierski, W., Tang, Y., Thaler, D., Lee, D.-H., et al. (2013). Challenges in representation learning: A report on three machine learning contests. In Neural Information Processing: 20th International Conference, ICONIP. Springer.

Gouaillier, D., Hugel, V., Blazevic, P., and Kilner, C. (2009). Mechatronic design of NAO humanoid. In IEEE International Conference on Robotics and Automation (ICRA).

Han, B., Hu, M., Wang, X., and Ren, F. (2022). A triple-structure network model based upon MobileNet V1 and multi-loss function for facial expression recognition. Symmetry, 14(10):2055.

He, K., Zhang, X., Ren, S., and Sun, J. (2016a). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778.

He, K., Zhang, X., Ren, S., and Sun, J. (2016b). Identity mappings in deep residual networks. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV, pages 630–645. Springer.

Hirose, M. and Ogawa, K. (2007). Honda humanoid robots development. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 365(1850):11–19.

Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition.

Justo, R., Letaifa, L. B., Olaso, J. M., López-Zorrilla, A., Develasco, M., Vázquez, A., and Torres, M. I. (2021). A Spanish corpus for talking to the elderly. Conversational Dialogue Systems for the Next Decade.

Justo, R., Letaifa, L. B., Palmero, C., Fraile, E. G., Johansen, A., Vazquez, A., Cordasco, G., Schlogl, S., Ruanova, B. F., Silva, M., Escalera, S., Velasco, M. D., Laranga, J. T., Esposito, A., Kornes, M., and Torres, M. I. (2020). Analysis of the interaction between elderly people and a simulated virtual coach. Journal of Ambient Intelligence and Humanized Computing, 11:6125–6140.

Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

Kim, M., Lee, D., and Kim, K.-Y. (2015). System architecture for real-time face detection on analog video
ICAART 2025 - 17th International Conference on Agents and Artificial Intelligence