ATFSC: Audio-Text Fusion for Sentiment Classification

Aicha Nouisser, Nouha Khediri, Monji Kherallah, Faiza Charfi

2025

Abstract

The diversity of human expression and the complexity of emotions pose specific challenges for sentiment analysis from text and speech data. Models must consider not only the text but also the nuances of intonation and the emotions expressed by the voice. To address these challenges, we created a bimodal sentiment analysis model named ATFSC that classifies emotions based on textual and audio information. It fuses textual and audio information from conversations, providing a more robust analysis of sentiment as negative, neutral, or positive. Key features include transfer learning with a pre-trained BERT model for text processing, a CNN-based feature extractor for audio processing, and flexible preprocessing that supports different dataset formats. An attention mechanism performs the bimodal fusion of audio and text features, which led to a notable performance improvement. As a result, we observed improved accuracy, reaching 64.61%, 69%, 72%, and 81.36% on the IEMOCAP, SLUE, MELD, and CMU-MOSI datasets, respectively.
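
The abstract does not state which form of attention is used for the fusion step, so the following is only a minimal sketch of one common variant, not the authors' implementation. It assumes each modality has already been reduced to a fixed-size embedding (standing in for the BERT and CNN outputs) and that a single query vector scores the two modalities. The function names, the vector values, and the query are all hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(text_vec, audio_vec, query):
    """Fuse two same-sized modality embeddings with one attention query.

    Each modality is scored by its scaled dot product with `query`; the
    scores are normalised with softmax and used to weight the embeddings.
    This is an assumed, simplified form of attention-based fusion.
    """
    if not (len(text_vec) == len(audio_vec) == len(query)):
        raise ValueError("all vectors must have the same dimension")
    scale = math.sqrt(len(query))
    scores = [
        sum(q * x for q, x in zip(query, vec)) / scale
        for vec in (text_vec, audio_vec)
    ]
    weights = softmax(scores)
    fused = [
        weights[0] * t + weights[1] * a
        for t, a in zip(text_vec, audio_vec)
    ]
    return fused, weights


# Hypothetical 4-dimensional embeddings standing in for BERT and CNN outputs.
text_features = [0.2, -0.1, 0.7, 0.4]
audio_features = [0.5, 0.3, -0.2, 0.1]
query_vector = [1.0, 0.0, 1.0, 0.0]  # assumed; would be learned in practice

fused_features, modality_weights = attention_fuse(
    text_features, audio_features, query_vector
)
```

In a trained model the query (or a full set of attention parameters) would be learned jointly with the classifier, and the fused vector would feed a three-way negative/neutral/positive output layer.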


Paper Citation


in Harvard Style

Nouisser A., Khediri N., Kherallah M. and Charfi F. (2025). ATFSC: Audio-Text Fusion for Sentiment Classification. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-737-5, SciTePress, pages 750-757. DOI: 10.5220/0013178300003890


in Bibtex Style

@conference{icaart25,
author={Aicha Nouisser and Nouha Khediri and Monji Kherallah and Faiza Charfi},
title={ATFSC: Audio-Text Fusion for Sentiment Classification},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2025},
pages={750-757},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013178300003890},
isbn={978-989-758-737-5},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI - ATFSC: Audio-Text Fusion for Sentiment Classification
SN - 978-989-758-737-5
AU - Nouisser A.
AU - Khediri N.
AU - Kherallah M.
AU - Charfi F.
PY - 2025
SP - 750
EP - 757
DO - 10.5220/0013178300003890
PB - SciTePress