Authors:
Marta Babel Guerreiro 1; Catia Cepeda 1; Joana Sousa 2,3; Carolina Maio 2; João Ferreira 2 and Hugo Gamboa 1
Affiliations:
1 LIBPhys (Laboratory for Instrumentation, Biomedical Engineering and Radiation Physics), Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Caparica, Portugal
2 NOS Inovação, Lisboa, Portugal
3 Bold International, Lisboa, Portugal
Keyword(s):
Gender, Age, Emotion, Machine Learning, Voice Signal.
Abstract:
Voice signals are a rich source of personal information. The main objective of the present work is to study the possibility of predicting gender, age, and emotional valence from short voice interactions with a mobile device (a smartphone or remote control) using machine learning algorithms. To that end, data were acquired to create a Portuguese dataset of 156 samples. Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Random Forest (RF) classifiers were tested on features extracted from the audio. The gender recognition model achieved an accuracy of 87.8%, the age group recognition model 67.6%, and the emotion model 94.6%. The SVM algorithm produced the best results for all models. The results show that it is possible to predict from voice signals not only a person's specific personal characteristics but also their emotional state. Future work should improve these models by enlarging the dataset.
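As a rough illustration of the kind of evaluation the abstract summarizes, the sketch below classifies feature vectors with a K-Nearest Neighbors vote and reports accuracy. It is not the authors' pipeline: the features are synthetic Gaussian clusters standing in for audio features, the sample count of 156 is borrowed from the abstract only for scale, and the leave-one-out protocol, `k=5`, and all function names are assumptions made for this example.

```python
import math
import random
from collections import Counter


def make_synthetic_features(n_per_class=78, n_features=4, seed=0):
    """Generate two Gaussian clusters as hypothetical stand-ins for audio features."""
    rng = random.Random(seed)
    samples = []
    for label, centre in (("class_a", 0.0), ("class_b", 2.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(centre, 1.0) for _ in range(n_features)]
            samples.append((features, label))
    return samples


def knn_predict(train, query, k=5):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=5):
    """Classify each sample using all the others; return the fraction correct."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if knn_predict(train, features, k) == label:
            correct += 1
    return correct / len(samples)


# Synthetic data only: the resulting accuracy says nothing about real voice data.
samples = make_synthetic_features()
accuracy = leave_one_out_accuracy(samples)
print(f"Leave-one-out KNN accuracy on synthetic data: {accuracy:.3f}")
```

With a dataset this small, leave-one-out evaluation uses every sample for testing exactly once, which is one common way to get an accuracy estimate without setting aside a separate test split; the paper's own validation scheme is not stated in the abstract.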