Tunisian Dialect Speech Corpus: Construction and Emotion Annotation

Latifa Iben Nasr, Abir Masmoudi, Lamia Hadrich Belguith

2025

Abstract

Speech Emotion Recognition (SER) using Natural Language Processing (NLP) for underrepresented dialects faces significant challenges due to the lack of annotated corpora. This research addresses this gap by constructing and annotating SERTUS (Speech Emotion Recognition in TUnisian Spontaneous speech), a novel corpus of spontaneous speech in the Tunisian Dialect (TD) collected from various domains such as sports, politics, and culture. SERTUS covers both registers of TD, the popular (familiar) register and the intellectual register, and captures a diverse range of emotions in spontaneous settings and natural interactions across different regions of Tunisia. Our methodology uses a categorical approach to emotion annotation and employs inter-annotator agreement measures to ensure the reliability and consistency of the annotations. The results show a high level of agreement among annotators, indicating that the annotation process is robust. The study’s core contribution is a comprehensive and rigorous methodology for building a dataset of spontaneous emotional speech in this dialect. The corpus has significant potential applications in fields such as human-computer interaction, mental health monitoring, call center analytics, and social robotics, and it supports the development of more accurate and culturally nuanced SER systems. This work contributes to existing research by providing a high-quality annotated corpus and by emphasizing the importance of including underrepresented dialects in NLP research.
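The abstract does not say which inter-annotator agreement measure was used. As an illustration only, the sketch below computes Fleiss' kappa, one common measure for categorical labels assigned by several annotators; the function name `fleiss_kappa` and the emotion labels in the usage example are hypothetical and are not taken from SERTUS. Kappa compares the observed agreement per item with the agreement expected by chance from the overall label proportions.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for items that each received the same number of labels.

    Illustrative sketch (not the authors' code): ratings[i] holds the
    categorical labels (e.g. emotion names) that the annotators gave to item i.
    """
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # Per-item label counts; a Counter returns 0 for unseen labels.
    counts = [Counter(r) for r in ratings]
    categories = sorted({label for c in counts for label in c})

    # Observed agreement per item: fraction of annotator pairs that agree.
    pair_count = n_raters * (n_raters - 1)
    p_items = [
        (sum(v * v for v in c.values()) - n_raters) / pair_count for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each label.
    total = n_items * n_raters
    p_e = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)

    if p_e == 1.0:
        # Only one label was ever used, so kappa is undefined.
        return float("nan")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels: 4 utterances, 3 annotators each.
    ratings = [
        ["anger", "anger", "anger"],
        ["joy", "joy", "neutral"],
        ["neutral", "neutral", "neutral"],
        ["sadness", "anger", "sadness"],
    ]
    print(round(fleiss_kappa(ratings), 4))
```

For this toy input the observed agreement is 2/3 and the chance agreement is 5/18, so the function returns 7/13 (about 0.54). Fleiss' kappa is used here because it handles more than two annotators; with exactly two, Cohen's kappa is a frequent alternative.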

Paper Citation


in Harvard Style

Nasr L., Masmoudi A. and Belguith L. (2025). Tunisian Dialect Speech Corpus: Construction and Emotion Annotation. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART; ISBN 978-989-758-737-5, SciTePress, pages 360-367. DOI: 10.5220/0013134000003890


in Bibtex Style

@conference{icaart25,
author={Latifa Nasr and Abir Masmoudi and Lamia Belguith},
title={Tunisian Dialect Speech Corpus: Construction and Emotion Annotation},
booktitle={Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART},
year={2025},
pages={360-367},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013134000003890},
isbn={978-989-758-737-5},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 17th International Conference on Agents and Artificial Intelligence - Volume 3: ICAART
TI  - Tunisian Dialect Speech Corpus: Construction and Emotion Annotation
SN  - 978-989-758-737-5
AU  - Nasr L.
AU  - Masmoudi A.
AU  - Belguith L.
PY  - 2025
SP  - 360
EP  - 367
DO  - 10.5220/0013134000003890
PB  - SciTePress
ER  - 