6 CONCLUSION
The goal of this work is to introduce a novel method
for pathological speech recognition using continuous
speech, without the need for voicing detection or
speech segmentation. The proposed DL architecture
consists of two parts: an autoencoder that learns a
feature representation, and a disease-specific
classifier.
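The two-part design can be illustrated with a small sketch. It is not the authors' network: the dense one-hidden-layer autoencoder, the tanh encoder, the logistic-regression classifier, all sizes and learning rates, and the synthetic feature vectors produced by the hypothetical helper `make_toy_data` are assumptions made for illustration only. Stage 1 trains the autoencoder on reconstruction error without labels. Stage 2 keeps the encoder fixed and fits a classifier on its output, so in this sketch a separate classifier could be fitted for each disease on top of the encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, n_hidden=16, lr=0.1, epochs=300):
    """Stage 1 (unsupervised): one-hidden-layer autoencoder trained by
    full-batch gradient descent on the mean squared reconstruction error."""
    d = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # encoder output = learned features
        err = (H @ W2 + b2) - X                   # linear decoder minus target
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size              # dLoss / d(reconstruction)
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)   # backpropagate through tanh
        W2 -= lr * (H.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (X.T @ g_hid)
        b1 -= lr * g_hid.sum(axis=0)
    return (lambda Z: np.tanh(Z @ W1 + b1)), losses

def train_classifier(F, y, lr=0.5, epochs=500):
    """Stage 2 (supervised): logistic regression on the encoded features."""
    w = np.zeros(F.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))    # P(pathological | features)
        g = p - y                                 # cross-entropy gradient w.r.t. logits
        w -= lr * (F.T @ g) / len(y)
        b -= lr * float(g.mean())
    return lambda Z: (Z @ w + b >= 0.0).astype(int)

def make_toy_data(n_per_class, d=20):
    """Hypothetical stand-in for pooled spectrogram features (not real data)."""
    X0 = rng.normal(0.0, 1.0, (n_per_class, d))   # "healthy" class
    X1 = rng.normal(1.0, 1.0, (n_per_class, d))   # "pathological" class, shifted mean
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return np.vstack([X0, X1]), y

X_train, y_train = make_toy_data(100)
X_test, y_test = make_toy_data(100)
encode, losses = train_autoencoder(X_train)           # labels are not used here
predict = train_classifier(encode(X_train), y_train)  # one classifier per disease
test_acc = float(np.mean(predict(encode(X_test)) == y_test))
print(f"reconstruction MSE {losses[0]:.3f} -> {losses[-1]:.3f}, test accuracy {test_acc:.2f}")
```

Separating the stages means the encoder can be trained on unlabeled recordings, while only the small classifier needs disease labels; whether the original system shares one encoder across diseases is not stated in this section.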
We demonstrated the applicability of the method
by classifying three different diseases: Parkinson’s
disease, dysphonia-related voice disorders, and
depression. On the test datasets, the method achieved
classification accuracies of 0.85 for Parkinson’s
disease, 0.86 for dysphonia, and 0.90 for depression.
These accuracies are comparable to those reported in
the literature. The advantage of the method is that it
is fully data-driven, in the sense that it does not
require disease-specific acoustic-phonetic
preprocessing: the speech recordings can be fed
directly to the deep neural network, with spectrogram
extraction as the only preprocessing step.
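The single preprocessing step, spectrogram extraction, can be sketched as follows. The sampling rate (16 kHz), the 25 ms Hann window, the 10 ms hop, the zero-padded FFT length, and the log-magnitude scaling are common defaults assumed here, not the settings used in this work; `log_spectrogram` is a hypothetical helper name.

```python
import numpy as np

def log_spectrogram(signal, sample_rate=16000, win_ms=25.0, hop_ms=10.0, eps=1e-10):
    """Return a (num_frames, n_fft // 2 + 1) log-magnitude spectrogram.

    Window and hop lengths are illustrative defaults, not the paper's settings.
    """
    signal = np.asarray(signal, dtype=np.float64)
    win = int(round(sample_rate * win_ms / 1000.0))   # samples per frame
    hop = int(round(sample_rate * hop_ms / 1000.0))   # samples between frame starts
    if len(signal) < win:
        # Zero-pad very short recordings so that at least one frame exists.
        signal = np.pad(signal, (0, win - len(signal)))
    num_frames = 1 + (len(signal) - win) // hop
    # Index matrix: row i selects samples [i * hop, i * hop + win).
    idx = np.arange(win)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hanning(win)            # taper each frame
    n_fft = 1 << (win - 1).bit_length()               # next power of two >= win
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    return np.log(magnitude + eps)                    # eps avoids log(0)

# Usage: one second of a 1 kHz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
S = log_spectrogram(np.sin(2 * np.pi * 1000.0 * t))
print(S.shape)  # (98, 257): 98 frames, 257 frequency bins of 31.25 Hz
```

Each row of `S` is one time frame; a matrix like this (or a fixed-size slice of it) is what a network of the kind described above would take as input.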
We believe that the method presented in this article
can be applied to other diseases and other languages
as well.
ACKNOWLEDGEMENTS
Project no. K128568 has been implemented with the
support provided from the National Research,
Development and Innovation Fund of Hungary,
financed under the K_18 funding scheme. The
research was partly funded by the CELSA
(CELSA/18/027) project titled: “Models of
Pathological Speech for Diagnosis and Speech
Recognition”.
REFERENCES
Ali, Z., Talha, M., & Alsulaiman, M., 2017. A practical
approach: Design and implementation of a healthcare
software for screening of dysphonic patients. IEEE
Access, 5, 5844–5857.
Beck, A. T., Steer, R. A., Ball, R., & Ranieri, W. F., 1996.
Comparison of Beck Depression Inventories -IA and -II
in psychiatric outpatients. Journal of Personality
Assessment, 67, 588–597.
Cordeiro, H., Meneses, C., & Fonseca, J., 2015. Continuous
speech classification systems for voice pathologies
identification. In Doctoral Conference on Computing,
Electrical and Industrial Systems, pp. 217–224.
Cummins, N., Scherer, S., Krajewski, J., Schnieder, S.,
Epps, J., & Quatieri, T. F., 2015. A review of depression
and suicide risk assessment using speech analysis.
Speech Communication, 71, 10–49.
Dastjerd, N. K., Sert, O. C., Ozyer, T., & Alhajj, R., 2019.
Fuzzy Classification Methods Based Diagnosis of
Parkinson’s disease from Speech Test Cases. Current
Aging Science, 12(2), 100–120.
Filiou, R.-P., Bier, N., Slegers, A., Houzé, B., Belchior, P.,
& Brambati, S. M., 2020. Connected speech assessment
in the early detection of Alzheimer’s disease and mild
cognitive impairment: A scoping review. Aphasiology,
34(6), 723–755.
Guedes, V., Teixeira, F., Oliveira, A., Fernandes, J., Silva,
L., Junior, A., & Teixeira, J. P., 2019. Transfer
Learning with AudioSet to Voice Pathologies
Identification in Continuous Speech. Procedia
Computer Science, 164, 662–669.
Gunduz, H., 2019. Deep learning-based Parkinson’s disease
classification using vocal feature sets. IEEE Access, 7,
115540–115551.
Gupta, V., 2018. Voice disorder detection using long
short-term memory (LSTM) model. arXiv preprint
arXiv:1812.01779.
Hoehn, M. M., & Yahr, M. D., 1967. Parkinsonism: onset,
progression, and mortality. Neurology, 17(5), 427–442.
Jeancolas, L., Petrovska-Delacrétaz, D., Mangone, G.,
Benkelfat, B.-E., Corvol, J.-C., Vidailhet, M., Lehéricy,
S., & Benali, H., 2020. X-vectors: New Quantitative
Biomarkers for Early Parkinson’s Disease Detection
from Speech. arXiv preprint arXiv:2007.03599.
http://arxiv.org/abs/2007.03599
Kaur, S., Aggarwal, H., & Rani, R., 2019. Diagnosis of
Parkinson’s Disease Using Principle Component
Analysis and Deep Learning. Journal of Medical
Imaging and Health Informatics, 9(3), 602–609.
Kaur, S., Aggarwal, H., & Rani, R., 2020. Hyper-parameter
optimization of deep learning model for prediction of
Parkinson’s disease. Machine Vision and Applications,
31(5), 32.
Kim, M. J., Cao, B., An, K., & Wang, J., 2018. Dysarthric
Speech Recognition Using Convolutional LSTM
Neural Network. In INTERSPEECH, pp. 2948–2952.
Kiss, G., & Vicsi, K., 2017a. Comparison of read and
spontaneous speech in case of automatic detection of
depression. In 2017 8th IEEE International Conference on
Cognitive Infocommunications (CogInfoCom), pp. 213–218.
Kiss, G., & Vicsi, K., 2017b. Mono- and multi-lingual
depression prediction based on speech processing.
International Journal of Speech Technology, 20(4),
919–935.
Klempíř, O., & Krupička, R., 2018. Machine learning using
speech utterances for Parkinson disease detection.
Lékař a Technika - Clinician and Technology, 48(2),
66–71.
Lam, G., Dongyan, H., & Lin, W., 2019. Context-aware
deep learning for multi-modal depression detection.
In ICASSP 2019 - 2019 IEEE International Conference on