TOWARDS A SILENT SPEECH INTERFACE FOR PORTUGUESE - Surface Electromyography and the Nasality Challenge

João Freitas, António Teixeira, Miguel Sales Dias

2012

Abstract

A Silent Speech Interface (SSI) aims to perform Automatic Speech Recognition (ASR) in the absence of an intelligible acoustic signal. It can be used as a human-computer interaction modality in environments with high background noise, such as living rooms, or to aid speech-impaired individuals, whose prevalence increases with ageing. If this interaction modality is made available in users' own native language with adequate performance, then, because it does not rely on acoustic information, it will be less susceptible to problems related to environmental noise, privacy, information disclosure and the exclusion of speech-impaired persons. To contribute to the existence of this promising modality for Portuguese, for which no SSI implementation is known, we are exploring and evaluating the potential of state-of-the-art approaches. One of the major challenges we face in SSI for European Portuguese is the recognition of nasality, a core characteristic of this language's phonetics and phonology. In this paper, a silent speech recognition experiment based on Surface Electromyography is presented. The results confirm recognition problems between minimal pairs of words that differ only in the nasality of one of the phones; these confusions cause 50% of the total error and degrade accuracy, which correlates well with existing knowledge.


Paper Citation


in Harvard Style

Freitas J., Teixeira A. and Sales Dias M. (2012). TOWARDS A SILENT SPEECH INTERFACE FOR PORTUGUESE - Surface Electromyography and the Nasality Challenge. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2012) ISBN 978-989-8425-89-8, pages 91-100. DOI: 10.5220/0003786100910100


in Bibtex Style

@conference{biosignals12,
author={João Freitas and António Teixeira and Miguel Sales Dias},
title={TOWARDS A SILENT SPEECH INTERFACE FOR PORTUGUESE - Surface Electromyography and the Nasality Challenge},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2012)},
year={2012},
pages={91-100},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003786100910100},
isbn={978-989-8425-89-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2012)
TI - TOWARDS A SILENT SPEECH INTERFACE FOR PORTUGUESE - Surface Electromyography and the Nasality Challenge
SN - 978-989-8425-89-8
AU - Freitas J.
AU - Teixeira A.
AU - Sales Dias M.
PY - 2012
SP - 91
EP - 100
DO - 10.5220/0003786100910100