MFCC-BASED REMOTE PATHOLOGY DETECTION ON SPEECH TRANSMITTED THROUGH THE TELEPHONE CHANNEL - Impact of Linear Distortions: Band Limitation, Frequency Response and Noise
Rubén Fraile, Nicolás Sáenz-Lechón, Juan I. Godino-Llorente, Víctor Osma-Ruiz, Corinne Fredouille
2009
Abstract
Advances in speech signal analysis during the last decade have allowed the development of automatic algorithms for the non-invasive detection of laryngeal pathologies. Performance assessment of such techniques reveals that classification success rates over 90% are achievable. Bearing in mind the extension of these automatic methods to remote diagnosis scenarios, this paper analyses the performance of a pathology detector based on Mel Frequency Cepstral Coefficients (MFCC) when the speech signal has undergone the distortion of an analogue communications channel, namely the telephone channel. This channel is modeled as a concatenation of linear effects. It is shown that, while the overall performance of the system is degraded, success rates in the range of 80% can still be achieved. The study also shows that the performance degradation is mainly due to band limitation and noise addition.
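As an illustration of what a "concatenation of linear effects" can look like in practice, the following minimal Python sketch applies band limitation, a frequency-response shaping and additive noise in sequence. It is not the authors' channel model. The function name `simulate_phone_channel`, the 300-3400 Hz pass band (the conventional nominal telephone band), the linear gain tilt, the 30 dB signal-to-noise ratio and the white Gaussian noise are all assumptions chosen for illustration only.

```python
import numpy as np


def simulate_phone_channel(signal, fs, band=(300.0, 3400.0),
                           tilt_db=-3.0, snr_db=30.0, seed=0):
    """Hypothetical telephone-channel sketch: three effects applied in sequence.

    Returns (received, filtered, noise) so that each stage can be inspected.
    All default parameter values are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # 1) Band limitation: ideal (brick-wall) band-pass mask.
    mask = (freqs >= band[0]) & (freqs <= band[1])

    # 2) Frequency response: gain that changes linearly (in dB) from 0 dB at
    #    the lower band edge to tilt_db at the upper band edge.
    rel = np.clip((freqs - band[0]) / (band[1] - band[0]), 0.0, 1.0)
    gain = 10.0 ** (tilt_db * rel / 20.0)

    filtered = np.fft.irfft(spectrum * mask * gain, n=n)

    # 3) Noise addition: white Gaussian noise scaled to the requested SNR
    #    relative to the power of the filtered signal.
    rng = np.random.default_rng(seed)
    noise_power = np.mean(filtered ** 2) / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=n)

    return filtered + noise, filtered, noise


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate in Hz
    t = np.arange(fs) / fs
    # Test signal: one tone inside the pass band, one outside it.
    x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 5000 * t)
    received, filtered, noise = simulate_phone_channel(x, fs)
    print("input power:   ", np.mean(x ** 2))
    print("filtered power:", np.mean(filtered ** 2))
```

Returning the intermediate `filtered` and `noise` arrays makes it possible to switch each effect on or off and compare its individual contribution, which mirrors the kind of per-effect analysis the abstract describes.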
References
- (1994). Voice disorders database v.1. CD-ROM. Massachusetts Eye and Ear Infirmary.
- (1998). Transmission characteristics of national networks. Series G: Transmission Systems and Media, Digital Systems and Networks Rec. G.120 (12/98), ITU-T.
- Baken, R. J. and Orlikoff, R. F. (2000). Clinical Measurement of Speech and Voice. Singular Publishers, San Diego (USA).
- Bimbot, F., Bonastre, J. F., Fredouille, C., Gravier, G., Magrin-Chagnolleau, I., Meignier, S., Merlin, T., Ortega-Garcia, J., Petrovska, D., and Reynolds, D. A. (2004). A tutorial on text-independent speaker verification. EURASIP Journal on Applied Signal Processing, 2004(4):430-451.
- Boyanov, B. and Hadjitodorov, S. (1997). Acoustic analysis of pathological voices. A voice analysis system for the screening of laryngeal diseases. IEEE Engineering in Medicine and Biology, 16(4):74-82.
- Davis, S. B. and Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-28(4):357-366.
- Deller, J. R., Proakis, J. G., and Hansen, J. H. L. (1993). Discrete-time processing of speech signals. Macmillan Publishing Company, New York (USA).
- Dimolitsas, S. and Gunn, J. E. (1988). Modular, off line, full duplex telephone channel simulator for high speed data transceiver evaluation. IEE Proceedings, 135(2):155-160.
- Fraile, R., Godino-Llorente, J. I., Sáenz-Lechón, N., Osma-Ruiz, V., and Gómez-Vilda, P. (2007). Analysis of the impact of analogue telephone channel on MFCC parameters for voice pathology detection. In Proceedings of the 8th INTERSPEECH Conference (INTERSPEECH 2007), pages 1218-1221.
- Fraile, R., Godino-Llorente, J. I., Sáenz-Lechón, N., Osma-Ruiz, V., and Gómez-Vilda, P. (2008a). Use of cepstrum-based parameters for automatic pathology detection on speech. Analysis of performance and theoretical justification. In Proceedings of Biosignals 2008, volume 1, pages 85-91.
- Godino-Llorente, J. I., Gómez-Vilda, P., and Blanco-Velasco, M. (2006). Dimensionality reduction of a pathological voice quality assessment system based on Gaussian mixture models and short-term cepstral parameters. IEEE Transactions on Biomedical Engineering, 53(10):1943-1953.
- Haykin, S. (1994). Neural networks: A comprehensive foundation. Macmillan, New York.
- Jamieson, D. G., Parsa, V., Price, M. C., and Till, J. (2002). Interaction of speech coders and atypical speech II: Effects on speech quality. Journal of Speech, Language and Hearing Research, 45:689-699.
- Martin, A. F., Doddington, G. R., Kamm, T., Ordowski, M., and Przybocki, M. A. (1997). The DET curve in assessment of detection task performance. In Proceedings of Eurospeech '97, volume IV, pages 1895-1898, Rhodes, Crete.
- Moran, R. J., Reilly, R. B., de Chazal, P., and Lacy, P. D. (2006). Telephony-based voice pathology assessment using automated speech analysis. IEEE Transactions on Biomedical Engineering, 53(3):468-477.
- Murphy, P. J. and Akande, O. O. (2005). Quantification of glottal and voiced speech harmonics-to-noise ratios using cepstral-based estimation. In Proceedings of the 3rd International Conference on Non-Linear Speech Processing (NOLISP'05), pages 224-232.
- Parsa, V. and Jamieson, D. G. (2000). Identification of pathological voices using glottal noise measures. Journal of Speech, Language and Hearing Research, 43(2):469-485.
- Pouchoulin, G., Fredouille, C., Bonastre, J. F., Ghio, A., and Giovanni, A. (2007). Frequency study for the characterization of the dysphonic voices. In Proceedings of the 8th INTERSPEECH Conference (INTERSPEECH 2007), pages 1198-1201.
- Reynolds, D. A., Zissman, M. A., Quatieri, T. F., O'Leary, G. C., and Carlson, B. A. (1995). The effects of telephone transmission degradations on speaker recognition performance. In Proceedings of ICASSP '95, volume 1, pages 329-332, Detroit, MI, USA.
- Södersten, M. and Lindhe, C. (2007). Voice ergonomics - an overview of recent research. In Berlin, C. and Bligard, L. O., editors, Proceedings of the 39th Nordic Ergonomics Society Conference.
- TM Alliance Team (2004). Telemedicine 2010: Visions for a personal medical network. Technical Report BR-29, ESA Publications Division.
- Umapathy, K., Krishnan, S., Parsa, V., and Jamieson, D. G. (2005). Discrimination of pathological voices using a time-frequency approach. IEEE Transactions on Biomedical Engineering, 52(3):421-430.
Paper Citation
in Harvard Style
Fraile R., Sáenz-Lechón N., Godino-Llorente J., Osma-Ruiz V. and Fredouille C. (2009). MFCC-BASED REMOTE PATHOLOGY DETECTION ON SPEECH TRANSMITTED THROUGH THE TELEPHONE CHANNEL - Impact of Linear Distortions: Band Limitation, Frequency Response and Noise. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009) ISBN 978-989-8111-65-4, pages 41-48. DOI: 10.5220/0001534200410048
in Bibtex Style
@conference{biosignals09,
author={Rubén Fraile and Nicolás Sáenz-Lechón and Juan I. Godino-Llorente and Víctor Osma-Ruiz and Corinne Fredouille},
title={MFCC-BASED REMOTE PATHOLOGY DETECTION ON SPEECH TRANSMITTED THROUGH THE TELEPHONE CHANNEL - Impact of Linear Distortions: Band Limitation, Frequency Response and Noise},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)},
year={2009},
pages={41-48},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001534200410048},
isbn={978-989-8111-65-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)
TI - MFCC-BASED REMOTE PATHOLOGY DETECTION ON SPEECH TRANSMITTED THROUGH THE TELEPHONE CHANNEL - Impact of Linear Distortions: Band Limitation, Frequency Response and Noise
SN - 978-989-8111-65-4
AU - Fraile R.
AU - Sáenz-Lechón N.
AU - Godino-Llorente J.
AU - Osma-Ruiz V.
AU - Fredouille C.
PY - 2009
SP - 41
EP - 48
DO - 10.5220/0001534200410048