TOWARDS SPEAKER-ADAPTIVE SPEECH RECOGNITION BASED ON SURFACE ELECTROMYOGRAPHY

Michael Wand, Tanja Schultz

2009

Abstract

We present our recent advances in silent speech interfaces using electromyographic signals that capture the movements of the human articulatory muscles at the skin surface for recognizing continuously spoken speech. Previous systems were limited to speaker- and session-dependent recognition tasks on small amounts of training and test data. In this paper we present speaker-independent and speaker-adaptive training methods which, for the first time, allow us to use a large corpus of data from many speakers to reliably train acoustic models. On this corpus we compare the performance of speaker-dependent and speaker-independent acoustic models, carry out model adaptation experiments, and investigate the impact of the amount of training data on overall system performance. In particular, because our corpus is large relative to those of previous studies, we are able, for the first time, to train an EMG recognizer with context-dependent acoustic models. We show that, as in acoustic speech recognition, context-dependent modeling significantly improves recognition performance.


Paper Citation


in Harvard Style

Wand M. and Schultz T. (2009). TOWARDS SPEAKER-ADAPTIVE SPEECH RECOGNITION BASED ON SURFACE ELECTROMYOGRAPHY. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009) ISBN 978-989-8111-65-4, pages 155-162. DOI: 10.5220/0001549601550162


in Bibtex Style

@conference{biosignals09,
author={Michael Wand and Tanja Schultz},
title={TOWARDS SPEAKER-ADAPTIVE SPEECH RECOGNITION BASED ON SURFACE ELECTROMYOGRAPHY},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)},
year={2009},
pages={155-162},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001549601550162},
isbn={978-989-8111-65-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)
TI - TOWARDS SPEAKER-ADAPTIVE SPEECH RECOGNITION BASED ON SURFACE ELECTROMYOGRAPHY
SN - 978-989-8111-65-4
AU - Wand M.
AU - Schultz T.
PY - 2009
SP - 155
EP - 162
DO - 10.5220/0001549601550162