Cross-Lingual Low-Resources Speech Emotion Recognition with Domain Adaptive Transfer Learning

Imen Baklouti, Olfa Ben Ahmed, Christine Fernandez-Maloigne

2024

Abstract

Speech Emotion Recognition (SER) plays an important role in several applications based on human-computer interaction. During the last decade, single-language SER systems have made great progress through Deep Learning (DL) approaches. However, SER remains a challenge in real-world applications, especially for low-resource languages. Indeed, the labeled training data available in speech corpora is often too limited to train an efficient prediction model from scratch. Yet, because of the domain shift between source and target data distributions, traditional transfer learning methods often fail to transfer emotional knowledge from one language (source) to another (target). In this paper, we propose a simple yet effective approach for cross-lingual speech emotion recognition using supervised domain adaptation. The proposed method uses 2D Mel-Spectrogram images as features to train a model on source data. A transfer learning method with domain adaptation is then applied to reduce the domain shift between source and target data in the latent space during model fine-tuning. We evaluate the proposed method on different transfer learning tasks, in particular low-resource scenarios, using three SER datasets: IEMOCAP, RAVDESS and EmoDB. The results show that the proposed method achieves competitive classification performance compared with the classical transfer learning method and with recent state-of-the-art domain adaptation work for SER.
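
As an illustration of the 2D Mel-Spectrogram features mentioned in the abstract, the following is a minimal NumPy sketch of how a log-Mel spectrogram can be computed from a raw waveform. It is not the authors' pipeline: the function names, the HTK-style mel formula, and the parameter values (16 kHz sampling rate, 512-point FFT, hop of 160 samples, 64 mel bands) are all assumptions chosen for the example.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centres are equally spaced on the mel scale.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    # Illustrative defaults only; the paper's settings are not given here.
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        signal = np.pad(signal, (0, n_fft - signal.size))
    # Slice the waveform into overlapping frames and apply a Hann window.
    n_frames = 1 + (signal.size - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum per frame, projected onto the mel filterbank.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    # Convert to decibels; the small constant avoids log(0).
    return 10.0 * np.log10(mel + 1e-10).T  # shape: (n_mels, n_frames)
```

With these assumed defaults, one second of audio sampled at 16 kHz yields an array of 64 mel bands by 97 frames, which can then be treated as a single-channel image by an image-based model.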

Paper Citation


in Harvard Style

Baklouti I., Ben Ahmed O. and Fernandez-Maloigne C. (2024). Cross-Lingual Low-Resources Speech Emotion Recognition with Domain Adaptive Transfer Learning. In Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-707-8, SciTePress, pages 118-128. DOI: 10.5220/0012788100003756


in Bibtex Style

@conference{data24,
author={Imen Baklouti and Olfa Ben Ahmed and Christine Fernandez-Maloigne},
title={Cross-Lingual Low-Resources Speech Emotion Recognition with Domain Adaptive Transfer Learning},
booktitle={Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2024},
pages={118-128},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012788100003756},
isbn={978-989-758-707-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Cross-Lingual Low-Resources Speech Emotion Recognition with Domain Adaptive Transfer Learning
SN - 978-989-758-707-8
AU - Baklouti I.
AU - Ben Ahmed O.
AU - Fernandez-Maloigne C.
PY - 2024
SP - 118
EP - 128
DO - 10.5220/0012788100003756
PB - SciTePress