Authors:
Georgia Paraskevopoulou (1); Evaggelos Spyrou (2, 3) and Stavros Perantonis (3)
Affiliations:
(1) Department of History & Philosophy of Science, National and Kapodistrian University of Athens, Athens, Greece
(2) Department of Computer Science and Telecommunications, University of Thessaly, Lamia, Greece
(3) Institute of Informatics and Telecommunications, National Center for Scientific Research - “Demokritos,” Athens, Greece
Keyword(s):
Emotion Recognition, Convolutional Neural Network, Spectrograms, Data Augmentation.
Abstract:
The recognition of human emotions is crucial for various applications related to human-computer interaction and for understanding the users’ mood in several tasks. Typical machine learning approaches used towards this goal first extract a set of linguistic features from raw data, which are then used to train supervised learning models. Recently, Convolutional Neural Networks (CNNs), which, unlike traditional approaches, learn to extract the appropriate features from their inputs, have also been applied as emotion recognition classifiers. In this work, we adopt a CNN architecture that takes spectrograms extracted from audio signals as inputs, and we propose data augmentation techniques to boost the classification performance. The proposed data augmentation approach includes noise addition, shifting of the audio signal, and changing its pitch or its speed. Experimental results indicate that the approach presented herein outperforms previous work that does not use augmented data.
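The augmentation steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the noise factor, the frame length and the hop size are all assumptions chosen for illustration. Speed is changed here by simple linear-interpolation resampling, which also shifts pitch; changing pitch while keeping duration fixed needs a phase-vocoder style method and is not shown.

```python
import numpy as np


def add_noise(signal, noise_factor=0.005, rng=None):
    """Add white Gaussian noise scaled by noise_factor (assumed value)."""
    rng = np.random.default_rng() if rng is None else rng
    return signal + noise_factor * rng.standard_normal(signal.shape[0])


def shift(signal, shift_samples):
    """Shift the signal in time, padding the gap with silence (zeros)."""
    out = np.zeros_like(signal)
    if shift_samples > 0:
        out[shift_samples:] = signal[:-shift_samples]
    elif shift_samples < 0:
        out[:shift_samples] = signal[-shift_samples:]
    else:
        out[:] = signal
    return out


def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 plays faster (shorter).

    Note: plain resampling changes speed and pitch together.
    """
    n_out = int(round(len(signal) / rate))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time FFT.

    Returns an array of shape (frame_len // 2 + 1, n_frames).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Usage: build augmented spectrograms from one synthetic one-second tone.
sr = 16000                                   # assumed sampling rate
clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
augmented = [
    add_noise(clean),
    shift(clean, sr // 10),
    change_speed(clean, 1.2),
]
cnn_inputs = [spectrogram(x) for x in [clean] + augmented]
```

Each augmented waveform is converted to a spectrogram in the same way as the original, so the training set grows without changing the network input format. Note that the speed-changed variant has a different number of frames and would need cropping or padding before batching.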