MFCC-BASED REMOTE PATHOLOGY DETECTION ON SPEECH TRANSMITTED THROUGH THE TELEPHONE CHANNEL - Impact of Linear Distortions: Band Limitation, Frequency Response and Noise

Rubén Fraile, Nicolás Sáenz-Lechón, Juan I. Godino-Llorente, Víctor Osma-Ruiz, Corinne Fredouille

Abstract

Advances in speech signal analysis during the last decade have allowed the development of automatic algorithms for the non-invasive detection of laryngeal pathologies. Performance assessments of such techniques reveal that classification success rates over 90% are achievable. With a view to extending these automatic methods to remote diagnosis scenarios, this paper analyses the performance of a pathology detector based on Mel Frequency Cepstral Coefficients (MFCCs) when the speech signal has undergone the distortion of an analogue communications channel, namely the telephone channel. This channel is modeled as a concatenation of linear effects. It is shown that, while the overall performance of the system is degraded, success rates in the range of 80% can still be achieved. This study also shows that the performance degradation is mainly due to band limitation and noise addition.
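The channel described above, a concatenation of band limitation, frequency response and additive noise, followed by MFCC extraction, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' simulator or detector: the 300 to 3400 Hz passband, the brick-wall FFT filter, the two-tap FIR standing in for the channel's frequency response, white Gaussian noise at a chosen SNR, the 8 kHz sampling rate, the MFCC settings (25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients) and the synthetic harmonic test signal are all assumptions, and the function names `telephone_channel` and `mfcc` are hypothetical.

```python
import numpy as np


def telephone_channel(x, fs, snr_db, low_hz=300.0, high_hz=3400.0, tilt=0.5, rng=None):
    """Hypothetical telephone-channel sketch: band limitation, then a simple
    frequency response, then additive white noise. All values are assumptions."""
    rng = np.random.default_rng(0) if rng is None else rng
    # 1) Band limitation: ideal (brick-wall) band-pass applied in the FFT domain.
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    y = np.fft.irfft(spectrum, n=len(x))
    # 2) Frequency response: 2-tap FIR y[n] + tilt * y[n-1], a gentle spectral tilt.
    y = np.convolve(y, [1.0, tilt])[: len(x)]
    # 3) Additive white Gaussian noise scaled to the requested SNR (in dB).
    noise_power = np.mean(y ** 2) / 10.0 ** (snr_db / 10.0)
    return y + rng.normal(0.0, np.sqrt(noise_power), size=len(y))


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(x, fs, n_fft=256, frame_len=200, hop=80, n_filters=26, n_ceps=13):
    """Plain MFCC sketch (defaults assume 8 kHz: 25 ms frames, 10 ms hop)."""
    # Split the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Triangular mel filterbank spanning 0 Hz to fs/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II of the log filterbank energies, keeping the first n_ceps coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n[None, :] + 1) / (2 * n_filters))
    return log_energy @ dct.T


# Usage on a synthetic stand-in for a sustained vowel: harmonics of 150 Hz at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
x = sum(np.sin(2 * np.pi * 150 * h * t) / h for h in range(1, 20))
clean = mfcc(x, fs)
distorted = mfcc(telephone_channel(x, fs, snr_db=20.0), fs)
print(clean.shape, distorted.shape)  # (98, 13) (98, 13)
print(np.mean(np.abs(clean - distorted), axis=0))  # per-coefficient shift
```

Comparing `clean` and `distorted` frame by frame gives a rough sense of how far the channel moves the MFCC vectors. Disabling one stage of `telephone_channel` at a time (for example, setting `tilt=0.0` or a very high `snr_db`) is one way to isolate the contribution of each effect, in the spirit of the per-effect analysis summarized in the abstract.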



Paper Citation


in Harvard Style

Fraile R., Sáenz-Lechón N., Godino-Llorente J., Osma-Ruiz V. and Fredouille C. (2009). MFCC-BASED REMOTE PATHOLOGY DETECTION ON SPEECH TRANSMITTED THROUGH THE TELEPHONE CHANNEL - Impact of Linear Distortions: Band Limitation, Frequency Response and Noise. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009) ISBN 978-989-8111-65-4, pages 41-48. DOI: 10.5220/0001534200410048


in BibTeX Style

@conference{biosignals09,
author={Rubén Fraile and Nicolás Sáenz-Lechón and Juan I. Godino-Llorente and Víctor Osma-Ruiz and Corinne Fredouille},
title={MFCC-BASED REMOTE PATHOLOGY DETECTION ON SPEECH TRANSMITTED THROUGH THE TELEPHONE CHANNEL - Impact of Linear Distortions: Band Limitation, Frequency Response and Noise},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)},
year={2009},
pages={41-48},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001534200410048},
isbn={978-989-8111-65-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)
TI - MFCC-BASED REMOTE PATHOLOGY DETECTION ON SPEECH TRANSMITTED THROUGH THE TELEPHONE CHANNEL - Impact of Linear Distortions: Band Limitation, Frequency Response and Noise
SN - 978-989-8111-65-4
AU - Fraile R.
AU - Sáenz-Lechón N.
AU - Godino-Llorente J.
AU - Osma-Ruiz V.
AU - Fredouille C.
PY - 2009
SP - 41
EP - 48
DO - 10.5220/0001534200410048