Authors:
Asma Bougrine 1; Philippe Ravier 1; Abdenour Hacine-Gharbi 2 and Hanane Ouachour 1
Affiliations:
1 PRISME Laboratory, University of Orleans, 12 Rue de Blois, 45067 Orleans, France; 2 LMSE Laboratory, University of Bordj Bou Arréridj, Elanasser, 34030 Bordj Bou Arréridj, Algeria
Keyword(s):
Speech Injunction Classification, Massive Wild Oral Corpus, Prosodic Features, Static and Dynamic Features, SVM, K-NN, Long Short Term Memory (LSTM).
Abstract:
Classifying injunctions in spoken French is a difficult task because no standard linguistic structure for them is known in the French language. Prosodic features of the speech, especially the logarithmic energy, could therefore be useful indicators for this task. Our first aim is to validate the predominance of the log energy prosodic feature using conventional classifiers such as SVM and K-NN. Second, we intend to improve the classification rates by using a deep LSTM recurrent network. When applied to the RAVIOLI database, the log energy feature did indeed give the best classification rates (CR) for all classifiers, with CR = 82% for SVM and CR = 71.42% for K-NN. When the LSTM network was applied to our data, the CR reached 79.49% with the log energy feature alone, which is no better than the SVM result. More surprisingly, the CR increased significantly to 96.15% when all 6 prosodic features were used. We conclude that deep learning methods need as much input data as possible to reach high performance, including the less informative features, especially when the dataset is small. The drawback of deep learning methods remains the difficulty of tuning their parameters optimally.
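To illustrate the log energy feature named in the abstract, here is a minimal sketch of a frame-wise log energy computation. The function name frame_log_energy, the frame length, the hop size, and the small constant eps are assumptions chosen for illustration; the abstract does not state how the authors extract this feature.

```python
import numpy as np

def frame_log_energy(signal, frame_len=400, hop=160, eps=1e-10):
    """Return the natural-log energy of each frame of a 1-D signal.

    frame_len and hop are in samples (assumed values, e.g. 25 ms and
    10 ms at 16 kHz); eps avoids log(0) on silent frames.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Not enough samples for a single full frame.
        return np.empty(0)
    n_frames = 1 + (signal.size - frame_len) // hop
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        # Sum of squared samples is the short-time energy of the frame.
        energies[i] = np.log(np.sum(frame ** 2) + eps)
    return energies
```

The resulting sequence could be fed to a sequence model such as an LSTM (a dynamic representation), or summarized by statistics such as its mean and standard deviation before being passed to an SVM or K-NN classifier (a static representation). Which of these the authors use is not specified here.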