DECISION-TREE BASED ANALYSIS OF SPEAKING MODE DISCREPANCIES IN EMG-BASED SPEECH RECOGNITION

Michael Wand, Matthias Janke, Tanja Schultz

2012

Abstract

This study is concerned with the impact of speaking mode variability on speech recognition by surface electromyography (EMG). In EMG-based speech recognition, we capture the electric potentials of the human articulatory muscles with surface electrodes, so that the resulting signal can be used for speech processing. This enables the user to communicate silently, without uttering any sound. Previous studies have shown that processing silent speech poses a new challenge: the EMG signals of audible and silent speech are quite distinct. In this study we consider EMG signals of three speaking modes: audibly spoken speech, whispered speech, and silently mouthed speech. We present an approach to quantify the differences between these speaking modes by means of phonetic decision trees and show that this measure correlates highly with differences in the performance of a recognizer on the different speaking modes. We furthermore reinvestigate the spectral mapping algorithm, which reduces the discrepancy between different speaking modes, and evaluate its effectiveness.



Paper Citation


in Harvard Style

Wand M., Janke M. and Schultz T. (2012). DECISION-TREE BASED ANALYSIS OF SPEAKING MODE DISCREPANCIES IN EMG-BASED SPEECH RECOGNITION. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2012) ISBN 978-989-8425-89-8, pages 101-109. DOI: 10.5220/0003787201010109


in Bibtex Style

@conference{biosignals12,
author={Michael Wand and Matthias Janke and Tanja Schultz},
title={DECISION-TREE BASED ANALYSIS OF SPEAKING MODE DISCREPANCIES IN EMG-BASED SPEECH RECOGNITION},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2012)},
year={2012},
pages={101-109},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003787201010109},
isbn={978-989-8425-89-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2012)
TI - DECISION-TREE BASED ANALYSIS OF SPEAKING MODE DISCREPANCIES IN EMG-BASED SPEECH RECOGNITION
SN - 978-989-8425-89-8
AU - Wand M.
AU - Janke M.
AU - Schultz T.
PY - 2012
SP - 101
EP - 109
DO - 10.5220/0003787201010109