Speech Emotion Recognition using MFCC and Hybrid Neural Networks
Youakim Badr, Partha Mukherjee, Sindhu Thumati
2021
Abstract
Speech emotion recognition is a challenging task, and feature extraction plays an important role in classifying speech into different emotions effectively. In this paper, we extract features from audio files using a traditional method, Mel-frequency cepstral coefficients (MFCC). Instead of classifying audio files with traditional machine learning approaches such as SVM, we investigate different neural network architectures. Our baseline model, implemented as a convolutional neural network, achieves 60% classification accuracy. We propose a hybrid neural network architecture based on Convolutional and Long Short-Term Memory (ConvLSTM) networks to capture the spatial and sequential information of audio files. Our experimental results show that the ConvLSTM model achieves an accuracy of 59%. We then improved the model with data augmentation techniques and retrained it on the augmented dataset. The classification accuracy reaches 91% for multi-class classification on the RAVDESS dataset, outperforming state-of-the-art multi-class classification models that used similar data.
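To make the MFCC step named in the abstract concrete, the following is a minimal NumPy sketch of a textbook MFCC computation: pre-emphasis, framing, Hamming windowing, power spectrum, mel filterbank, log, and DCT-II. It is not the authors' pipeline. The abstract does not state any extraction settings, so the 16 kHz sample rate, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients, and all function names here are assumptions chosen for illustration. In practice an audio library would normally handle this step.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Convert a mel-scale value back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return MFCCs of shape (n_frames, n_coeffs). All defaults are assumed values."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts high frequencies before spectral analysis.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft samples).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel filterbank energies, then log compression (epsilon avoids log(0)).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # Unnormalized DCT-II decorrelates the log energies; keep the first n_coeffs.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n[None, :] + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T


# Usage: one second of a synthetic 440 Hz tone stands in for a speech clip.
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t)
features = mfcc(tone)
print(features.shape)  # (98, 13)
```

The resulting frames-by-coefficients matrix is the kind of two-dimensional input that a convolutional or ConvLSTM classifier could consume, with the frame axis carrying the sequential information.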
Paper Citation
in Harvard Style
Badr Y., Mukherjee P. and Thumati S. (2021). Speech Emotion Recognition using MFCC and Hybrid Neural Networks. In Proceedings of the 13th International Joint Conference on Computational Intelligence (IJCCI 2021) - Volume 1: NCTA; ISBN 978-989-758-534-0, SciTePress, pages 366-373. DOI: 10.5220/0010707400003063
in Bibtex Style
@conference{ncta21,
author={Youakim Badr and Partha Mukherjee and Sindhu Thumati},
title={Speech Emotion Recognition using MFCC and Hybrid Neural Networks},
booktitle={Proceedings of the 13th International Joint Conference on Computational Intelligence (IJCCI 2021) - Volume 1: NCTA},
year={2021},
pages={366-373},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010707400003063},
isbn={978-989-758-534-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 13th International Joint Conference on Computational Intelligence (IJCCI 2021) - Volume 1: NCTA
TI - Speech Emotion Recognition using MFCC and Hybrid Neural Networks
SN - 978-989-758-534-0
AU - Badr Y.
AU - Mukherjee P.
AU - Thumati S.
PY - 2021
SP - 366
EP - 373
DO - 10.5220/0010707400003063
PB - SciTePress