Emotion Recognition from Speech: A Survey

Georgios Drakopoulos, George Pikramenos, Evaggelos Spyrou, Stavros Perantonis


Emotion recognition from speech signals is an important field in its own right as well as a mainstay of many multimodal sentiment analysis systems. The latter may also include a broad spectrum of modalities strongly associated with consciously or subconsciously communicating human emotional state, such as visual cues, gestures, body postures, gait, and facial expressions. Typically, emotion discovery from speech signals has considerably lower computational complexity than emotion discovery from other modalities, and in the overwhelming majority of studies the inclusion of the speech modality increases the accuracy of the overall emotion estimation process. The principal algorithmic cornerstones of emotion estimation from speech signals are Hidden Markov Models, time series modeling, cepstrum processing, and deep learning methodologies, the latter two being prime examples of higher-order data processing. Additionally, the best-known datasets that serve as emotion recognition benchmarks are described.

