
Dvoynikova, A. and Karpov, A. (2023). Bimodal sentiment and emotion classification with multi-head attention fusion of acoustic and linguistic information. In Proceedings of the International Conference "Dialogue", volume 2023.
Firdaus, M., Chauhan, H., Ekbal, A., and Bhattacharyya, P. (2020). MEISD: A multimodal multi-label emotion, intensity and sentiment dialogue dataset for emotion recognition and sentiment analysis in conversations. In Proceedings of the 28th International Conference on Computational Linguistics, pages 4441–4453.
Jim, J. R., Talukder, M. A. R., Malakar, P., Kabir, M. M., Nur, K., and Mridha, M. (2024). Recent advancements and challenges of NLP-based sentiment analysis: A state-of-the-art review. Natural Language Processing Journal, page 100059.
Khediri, N., Ammar, M. B., and Kherallah, M. (2017). Towards an online emotional recognition system for intelligent tutoring environment. In ACIT'2017: The International Arab Conference on Information Technology, Yassmine Hammamet, pages 22–24.
Khediri, N., Ben Ammar, M., and Kherallah, M. (2022). A new deep learning fusion approach for emotion recognition based on face and text. In International Conference on Computational Collective Intelligence, pages 75–81. Springer.
Khediri, N., Ben Ammar, M., and Kherallah, M. (2024). A real-time multimodal intelligent tutoring emotion recognition system (MITERS). Multimedia Tools and Applications, 83(19):57759–57783.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Poria, S., Cambria, E., Howard, N., Huang, G.-B., and Hussain, A. (2016). Fusing audio, visual and textual clues for sentiment analysis from multimodal content. Neurocomputing, 174:50–59.
Poria, S., Hazarika, D., Majumder, N., Naik, G., Cambria, E., and Mihalcea, R. (2018). MELD: A multimodal multi-party dataset for emotion recognition in conversations. arXiv preprint arXiv:1810.02508.
Sarker, I. H. (2021). Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions. SN Computer Science, 2(6):420.
Sebastian, J., Pierucci, P., et al. (2019). Fusion techniques for utterance-level emotion recognition combining speech and transcripts. In Interspeech, pages 51–55.
Shon, S., Pasad, A., Wu, F., Brusco, P., Artzi, Y., Livescu, K., and Han, K. J. (2022). SLUE: New benchmark tasks for spoken language understanding evaluation on natural speech. In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7927–7931. IEEE.
Voloshina, T. and Makhnytkina, O. (2023). Multimodal emotion recognition and sentiment analysis using masked attention and multimodal interaction. In 2023 33rd Conference of Open Innovations Association (FRUCT), pages 309–317. IEEE.
Wu, Z., Gong, Z., Koo, J., and Hirschberg, J. (2024). Multimodal multi-loss fusion network for sentiment analysis. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3588–3602.
Ye, W. and Fan, X. (2014). Bimodal emotion recognition from speech and text. International Journal of Advanced Computer Science and Applications, 5(2).
Yoon, S., Byun, S., and Jung, K. (2018). Multimodal speech emotion recognition using audio and text. In 2018 IEEE Spoken Language Technology Workshop (SLT), pages 112–118. IEEE.