Authors:
Anwer Slimi 1,2; Henri Nicolas 1 and Mounir Zrigui 2
Affiliations:
1 Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, F-33400 Talence, France; 2 University of Monastir, RLANTIS Laboratory LR 18ES15, Monastir, Tunisia
Keyword(s):
Connectionist Temporal Classification, Emotion Recognition, Neural Networks, Spectrograms.
Abstract:
In the past few years, much research has been conducted on predicting emotions from speech. Most studies aim to recognize emotions from pre-segmented data carrying a single global label (category). Although emotional states change and evolve continuously over time, emotion change has received less attention. Existing studies mainly focus either on predicting arousal-valence values or on detecting the instant at which the emotion changes. To the best of the authors' knowledge, this is the first paper that addresses emotion category change (i.e., predicting the classes present in a signal, such as angry, happy, sad, etc.). To this end, we propose a model based on the Connectionist Temporal Classification (CTC) loss, along with new evaluation metrics.