3.3 Discussion
Although several methodologies were tried, this article
reports only the best results achieved.
The results show that the neural network is a
promising tool for voice pathology classification. In
the work presented in (Teixeira F., Fernandes J.,
Guedes V., Junior A., & Teixeira J. P., 2018),
Support Vector Machines (SVMs) were used to
classify between control and pathologic subjects on
the same dataset, with a best accuracy of about 70%.
By comparison, the first-level NN in the present work
achieved an accuracy of about 73%, which indicates
an improvement from using an ANN.
Guedes et al. (2019) developed a system to classify
between the same 4 classes, but using parameters from
continuous speech and transfer-learning techniques to
perform the classification. Very similar results were
achieved, with an F1-score of 40% for classifying
between the four categories.
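Because this comparison sets an accuracy against an
F1-score, it may help to see how the two measures are
obtained from the same confusion matrix. The following
is a minimal sketch only: the 4x4 matrix is hypothetical
(it is not data from either study), and macro-averaging
of the per-class F1 is an assumption, since the averaging
scheme is not stated here.

```python
# Illustrative sketch: accuracy and macro-averaged F1 from a confusion
# matrix. Rows are true classes, columns are predicted classes.

def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true other
        fn = sum(cm[k]) - tp                       # true k, predicted other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Hypothetical counts for 4 classes (control plus three pathologies).
example_cm = [
    [12, 3, 3, 2],
    [4, 6, 5, 5],
    [3, 5, 7, 5],
    [2, 6, 5, 7],
]

print("accuracy:", round(accuracy(example_cm), 3))
print("macro F1:", round(macro_f1(example_cm), 3))
```

The two measures coincide only in special cases, so an
accuracy of about 40% and an F1-score of 40% are comparable
in magnitude but are not strictly the same quantity.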
No separation by gender was used because, for
Laryngitis Chronica, (Teixeira J. P. et al., 2018)
reported no gender difference for the relative
jitter, relative and absolute shimmer, HNR, NHR and
Autocorrelation. However, other conditions, namely
Dysphonia and Vocal Fold Paralysis, were also used,
and some gender difference may exist for these
subjects. Therefore, it is recommended to experiment
with gender separation in future work.
4 CONCLUSIONS
This article describes the experience of using neural
networks to discriminate between healthy subjects and
subjects with one of the three pathologies under study
(Laryngitis Chronica, Dysphonia or Vocal Fold
Paralysis). The parameters used in this analysis were
extracted from sustained vowels or from a sentence.
A Deep NN was developed in two levels to
classify between 4 classes.
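One possible reading of the two-level arrangement can
be sketched as a cascade: a first-level model separates
control from pathologic subjects, and a second-level
model assigns a pathology to the subjects flagged as
pathologic. This is an illustrative sketch only, not the
network used in this work. The function name
two_level_predict and the two stand-in rules are
hypothetical, and the numeric cut-offs in them are
placeholders without any clinical meaning.

```python
# Illustrative sketch of a two-level (cascade) decision.
from typing import Callable, Sequence

Features = Sequence[float]


def two_level_predict(
    x: Features,
    is_pathologic: Callable[[Features], bool],
    which_pathology: Callable[[Features], str],
) -> str:
    """Return 'control' or the label chosen by the second level."""
    if not is_pathologic(x):
        return "control"
    return which_pathology(x)


# Stand-in models. In practice each would be a trained neural network;
# the cut-offs below are arbitrary placeholders, not clinical values.
def toy_first_level(x: Features) -> bool:
    return x[0] > 1.0


def toy_second_level(x: Features) -> str:
    if x[1] > 5.0:
        return "Vocal Fold Paralysis"
    if x[2] < 10.0:
        return "Dysphonia"
    return "Laryngitis Chronica"


# Hypothetical feature vector: [relative jitter, relative shimmer, HNR].
print(two_level_predict([0.4, 2.0, 20.0], toy_first_level, toy_second_level))
print(two_level_predict([1.8, 6.5, 12.0], toy_first_level, toy_second_level))
```

In a cascade of this kind, an error made at the first
level cannot be corrected at the second level, so the
overall 4-class accuracy is bounded by the performance
of the first level.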
The best result in the first level, between
control and pathologic subjects, was an accuracy of 73%.
The best result between the 4 classes in the
second level was an accuracy of about 40%.
The best results were not always obtained with the
same group of parameters; however, the parameters of
group I(a) generally gave the best results in the
largest number of cases. Therefore, for uniformity,
group I(a) was considered the best group of
parameters tested. These parameters are relative
jitter, relative shimmer and HNR.
As a final conclusion, the accuracy of about 40%
in identifying healthy subjects and 3 pathologies is
still below the requirements for a real application.
These results call for more research on this type of
classification, experimenting with different
classification models, different types of features,
more subjects and possibly gender separation.
ACKNOWLEDGEMENTS
This work has been supported by FCT – Fundação
para a Ciência e Tecnologia within the Project Scope:
UIDB/5757/2020.
REFERENCES
Barry, W.J., Pützer, M. Saarbrücken Voice Database,
Institute of Phonetics, Univ. of Saarland,
http://www.stimmdatenbank.coli.unisaarland.de/
Boersma, P., 1993. “Accurate short-term analysis of the
fundamental frequency and the harmonics-to-noise ratio
of a sampled sound,” IFA Proceedings, vol. 17,
pp. 97-110.
Fernandes, J., Silva, L., Teixeira, F., Guedes, V., Santos, J.
& Teixeira, J. P., 2019. Parameters for Vocal Acoustic
Analysis - Cured Database. In Procedia Computer
Science – Elsevier.
Fernandes, J., Teixeira, F., Guedes, V., Junior, A. &
Teixeira, J. P., 2018. “Harmonic to Noise Ratio
Measurement - Selection of Window and Length”,
Procedia Computer Science - Elsevier. Volume 138,
Pages 280-285.
Guedes, V., Teixeira, F., Oliveira, A., Fernandes, J., Silva,
L., Junior, A. & Teixeira, J. P., 2019. Transfer Learning
with AudioSet to Voice Pathologies Identification in
Continuous Speech. Procedia Computer Science –
Elsevier.
Guimarães, I. (2004). Os Problemas de Voz nos
Professores: Prevalência, Causas, Efeitos e Formas de
Prevenção. 22.
Huche, F., & Allali, A. (2005). A Voz - Patologia vocal de
origem orgânica. (5th ed., Vol. 3; A. Editora, Ed.).
Kumar, V., Abbas, A. K., Fausto, N., & Aster, J. C. (2010).
Robbins and Cotran Patologia – bases patológicas das
doenças. (8th ed.). Rio de Janeiro, Brasil: Elsevier Editora
Ltda.
Lindasalwa Muda, M. B., 2010. Voice Recognition
Algorithms using Mel Frequency Cepstral Coefficient
(MFCC) and Dynamic Time Warping (DTW)
Techniques. Journal of Computing.
Marquardt, D. (1963). An Algorithm for Least-Squares
Estimation of Nonlinear Parameters. SIAM Journal on
Applied Mathematics, 11(2), 431-441.