Cross-Lingual Low-Resources Speech Emotion Recognition with Domain Adaptive Transfer Learning
Imen Baklouti, Olfa Ben Ahmed, Christine Fernandez-Maloigne
2024
Abstract
Speech Emotion Recognition (SER) plays an important role in many applications based on human-computer interaction. Over the last decade, single-language SER systems have made great progress through Deep Learning (DL) approaches. However, SER remains a challenge in real-world applications, especially for low-resource languages, where speech corpora contain too little labeled data to train an efficient prediction model from scratch. Moreover, because of the domain shift between the source and target data distributions, traditional transfer learning methods often fail to transfer emotional knowledge from one language (source) to another (target). In this paper, we propose a simple yet effective approach for cross-lingual speech emotion recognition using supervised domain adaptation. The proposed method first trains a model on the source data using 2D Mel-spectrogram images as features. A transfer learning method with domain adaptation is then applied to reduce the domain shift between source and target data in the latent space during model fine-tuning. We evaluate the proposed method on several transfer learning tasks, in particular low-resource scenarios, using three SER datasets: IEMOCAP, RAVDESS and EmoDB. The results show that the proposed method achieves competitive classification performance compared with the classical transfer learning method and with recent state-of-the-art domain adaptation work for SER.
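The idea of reducing domain shift in the latent space during fine-tuning can be illustrated with a small sketch. The abstract does not name the alignment criterion, so the code below assumes a linear-kernel maximum mean discrepancy (the squared distance between source and target batch means) as a stand-in, added to a cross-entropy term on the labeled target samples. The function names and the `weight` parameter are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(batch: Sequence[Vector]) -> List[float]:
    """Feature-wise mean of a batch of equal-length latent vectors."""
    n = len(batch)
    dim = len(batch[0])
    return [sum(v[d] for v in batch) / n for d in range(dim)]


def linear_mmd(source: Sequence[Vector], target: Sequence[Vector]) -> float:
    """Squared Euclidean distance between source and target batch means.

    ASSUMPTION: used here as an illustrative domain-shift measure; the
    paper's actual alignment loss is not stated in the abstract.
    """
    mu_s = mean_vector(source)
    mu_t = mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))


def cross_entropy(probs: Sequence[Vector], labels: Sequence[int]) -> float:
    """Mean negative log-likelihood of the true emotion labels."""
    eps = 1e-12  # guards against log(0)
    total = sum(math.log(p[y] + eps) for p, y in zip(probs, labels))
    return -total / len(labels)


def adaptation_loss(
    target_probs: Sequence[Vector],
    target_labels: Sequence[int],
    source_latent: Sequence[Vector],
    target_latent: Sequence[Vector],
    weight: float = 0.5,  # hypothetical trade-off between the two terms
) -> float:
    """Supervised target loss plus a weighted latent-space alignment penalty."""
    classification = cross_entropy(target_probs, target_labels)
    alignment = linear_mmd(source_latent, target_latent)
    return classification + weight * alignment
```

In a real fine-tuning loop, the latent vectors would come from an intermediate layer of the network applied to Mel-spectrogram inputs from both languages, and the combined loss would be minimized by gradient descent. The sketch only shows how the two terms combine.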
Paper Citation
in Harvard Style
Baklouti I., Ben Ahmed O. and Fernandez-Maloigne C. (2024). Cross-Lingual Low-Resources Speech Emotion Recognition with Domain Adaptive Transfer Learning. In Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA; ISBN 978-989-758-707-8, SciTePress, pages 118-128. DOI: 10.5220/0012788100003756
in Bibtex Style
@conference{data24,
author={Imen Baklouti and Olfa Ben Ahmed and Christine Fernandez-Maloigne},
title={Cross-Lingual Low-Resources Speech Emotion Recognition with Domain Adaptive Transfer Learning},
booktitle={Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA},
year={2024},
pages={118-128},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012788100003756},
isbn={978-989-758-707-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 13th International Conference on Data Science, Technology and Applications - Volume 1: DATA
TI - Cross-Lingual Low-Resources Speech Emotion Recognition with Domain Adaptive Transfer Learning
SN - 978-989-758-707-8
AU - Baklouti I.
AU - Ben Ahmed O.
AU - Fernandez-Maloigne C.
PY - 2024
SP - 118
EP - 128
DO - 10.5220/0012788100003756
PB - SciTePress