Silent Speech for Human-Computer Interaction

João Freitas, António Teixeira, Miguel Sales Dias


A Silent Speech Interface (SSI) performs Automatic Speech Recognition (ASR) in the absence of an intelligible acoustic signal. It can serve as a human-computer interaction modality in environments with high background noise, such as living rooms, or as an aid for speech-impaired individuals, such as elderly persons. By acquiring data from elements of the human speech production process (glottal and articulatory activity, their neural pathways, or the central nervous system), an SSI produces an alternative digital representation of speech, which can be recognized and interpreted as data, synthesized directly, or routed into a communications network. Conventional ASR systems rely only on acoustic information, which makes them susceptible to environmental noise, raises privacy and information-disclosure concerns, and excludes users with speech impairments. To tackle these problems in the context of ASR for Human-Computer Interaction, we propose a novel multimodal SSI for European Portuguese (EP), a language for which no SSI has yet been developed. After a state-of-the-art assessment, we selected less invasive modalities (Vision, Surface Electromyography and Ultrasound) in order to obtain a more complete representation of the human speech production model. Our aim is now to develop a multimodal SSI prototype adapted to EP and to evaluate its usability in real-world scenarios.


