DEVICE FOR PROSODIC SPEECH RESTORATION - A Multi-Resolution Approach for Glottal Excitation Restoration

O. Schleusing, R. Vetter, Ph. Renevey, J.-M. Vesin, V. Schweizer

Abstract

This paper proposes a novel device for restoring authentic characteristics in pathological speech uttered by subjects with laryngeal disorders. The device acquires and analyzes the original speech signal and, in real time, reconstructs a speech signal with improved, healthy-like features. The pathological excitation is replaced by a concatenation of randomly chosen healthy reference patterns. To restore authentic features, the intervals between successive reference patterns are obtained through a multi-resolution approach. Short-term pitch variability is reproduced through a statistical variation model. Middle-term pitch variability is derived from the correlation between pitch and signal envelope at that time scale. Long-term variability is obtained through adaptive wavetable oscillators, a novel, reliable and computationally efficient method. Performance was assessed with respect to two authentic features, namely breathiness and prosody. Preliminary results show that the breathiness of the restored signal is clearly reduced, while prosody-related features are slightly improved.
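The excitation replacement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the reference patterns are hypothetical decaying sinusoids standing in for recorded healthy pulses, the short-term variation model is assumed to be a Gaussian perturbation of the pitch period, and the names make_reference_patterns and synthesize_excitation, as well as all parameter values, are invented for illustration. The middle-term and long-term components are not modeled here.

```python
import math
import random


def make_reference_patterns(count, length):
    """Build hypothetical stand-ins for healthy glottal pulses (decaying sinusoids)."""
    patterns = []
    for k in range(count):
        decay = 3.0 + k  # each stand-in pattern decays at a slightly different rate
        patterns.append([
            math.exp(-decay * n / length) * math.sin(2.0 * math.pi * n / length)
            for n in range(length)
        ])
    return patterns


def synthesize_excitation(patterns, mean_period, jitter_std, num_pulses, rng):
    """Overlap-add randomly chosen patterns at jittered pitch intervals."""
    onsets = []
    position = 0
    for _ in range(num_pulses):
        onsets.append(position)
        # Short-term variability: Gaussian perturbation of the pitch period
        # (an assumed model, not the one used in the paper).
        period = round(rng.gauss(mean_period, jitter_std))
        position += max(1, period)  # keep onsets strictly increasing
    pattern_length = len(patterns[0])
    signal = [0.0] * (onsets[-1] + pattern_length)
    for onset in onsets:
        pulse = rng.choice(patterns)  # randomly chosen reference pattern
        for n, value in enumerate(pulse):
            signal[onset + n] += value
    return signal, onsets


rng = random.Random(0)  # fixed seed so the sketch is reproducible
patterns = make_reference_patterns(count=4, length=64)
excitation, onsets = synthesize_excitation(
    patterns, mean_period=80, jitter_std=1.5, num_pulses=20, rng=rng
)
```

Here the periods are in samples, so at an assumed 8 kHz sampling rate a mean period of 80 samples corresponds to a 100 Hz pitch. In a multi-resolution scheme, mean_period would itself vary over time rather than stay constant.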

Paper Citation


in Harvard Style

Schleusing O., Vetter R., Renevey P., Vesin J. and Schweizer V. (2010). DEVICE FOR PROSODIC SPEECH RESTORATION - A Multi-Resolution Approach for Glottal Excitation Restoration. In Proceedings of the Third International Conference on Biomedical Electronics and Devices - Volume 1: BIODEVICES, (BIOSTEC 2010) ISBN 978-989-674-017-7, pages 38-43. DOI: 10.5220/0002747500380043


in Bibtex Style

@conference{biodevices10,
author={O. Schleusing and R. Vetter and Ph. Renevey and J.-M. Vesin and V. Schweizer},
title={DEVICE FOR PROSODIC SPEECH RESTORATION - A Multi-Resolution Approach for Glottal Excitation Restoration},
booktitle={Proceedings of the Third International Conference on Biomedical Electronics and Devices - Volume 1: BIODEVICES, (BIOSTEC 2010)},
year={2010},
pages={38-43},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002747500380043},
isbn={978-989-674-017-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Third International Conference on Biomedical Electronics and Devices - Volume 1: BIODEVICES, (BIOSTEC 2010)
TI - DEVICE FOR PROSODIC SPEECH RESTORATION - A Multi-Resolution Approach for Glottal Excitation Restoration
SN - 978-989-674-017-7
AU - Schleusing O.
AU - Vetter R.
AU - Renevey P.
AU - Vesin J.
AU - Schweizer V.
PY - 2010
SP - 38
EP - 43
DO - 10.5220/0002747500380043