Influence of Different Phoneme Mappings on the Recognition Accuracy of Electrolaryngeal Speech

Petr Stanislav; Josef V. Psutka

doi:10.5220/0004129502040207

2012

Abstract

This paper presents the initial steps towards building a speech recognition system that can efficiently process electrolaryngeal substitute speech produced by laryngectomees. Speakers after total laryngectomy have restricted aero-acoustic properties compared with normal speakers, and their speech is therefore far less intelligible. We proposed and tested several approaches to acoustic modeling within an ASR system designed to cope with this lower intelligibility. Comparative experiments were also performed on healthy speakers. We tried several mappings that unify unvoiced phonemes with their voiced counterparts in the acoustic modeling process, at both the monophone and the triphone level. Systems using zerogram and trigram language models were evaluated and compared in order to increase the credibility of the results.
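To make the idea of a voiced/unvoiced phoneme mapping concrete, the following is a minimal sketch of how such a mapping could be applied to phone transcriptions before training, both to monophones and to HTK-style triphone labels of the form `left-center+right`. It assumes hypothetical SAMPA-like symbols for Czech consonant pairs and merges every pair at once; the paper's actual phone set and the specific subsets of pairs it tested are not reproduced here. The names `UNVOICED_TO_VOICED`, `map_monophones` and `map_triphone` are illustrative only.

```python
# Hypothetical SAMPA-like labels for Czech unvoiced/voiced consonant pairs.
# This merges ALL pairs; a real experiment might merge only a subset.
UNVOICED_TO_VOICED = {
    "p": "b", "t": "d", "k": "g",
    "c": "J",                  # ť -> ď
    "f": "v", "s": "z",
    "S": "Z",                  # š -> ž
    "x": "h\\",                # ch -> h
    "ts": "dz", "tS": "dZ",    # c -> dz, č -> dž
}


def map_monophones(phones, mapping=UNVOICED_TO_VOICED):
    """Replace each unvoiced phone by its voiced counterpart; keep all others."""
    return [mapping.get(p, p) for p in phones]


def map_triphone(label, mapping=UNVOICED_TO_VOICED):
    """Map every position of an HTK-style label 'l-c+r' (contexts are optional)."""
    left = right = None
    if "-" in label:
        left, label = label.split("-", 1)
    if "+" in label:
        label, right = label.split("+", 1)
    out = mapping.get(label, label)
    if left is not None:
        out = mapping.get(left, left) + "-" + out
    if right is not None:
        out = out + "+" + mapping.get(right, right)
    return out


if __name__ == "__main__":
    print(map_monophones(["p", "r", "a", "h\\", "a"]))
    print(map_triphone("s-k+a"))
```

Applying the mapping to the contexts as well as to the center phone keeps the triphone inventory consistent with the reduced monophone set. The trade-off is that words differing only in voicing (for example, hypothetical transcriptions `p e s` and `b e s`) become acoustically identical to the model, so the lexicon and language model are left to separate them.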

Paper Citation

in Harvard Style

Stanislav P. and Psutka J. V. (2012). Influence of Different Phoneme Mappings on the Recognition Accuracy of Electrolaryngeal Speech. In Proceedings of the International Conference on Signal Processing and Multimedia Applications and Wireless Information Networks and Systems - Volume 1: SIGMAP, (ICETE 2012) ISBN 978-989-8565-25-9, pages 204-207. DOI: 10.5220/0004129502040207

in Bibtex Style

@conference{sigmap12,
author={Petr Stanislav and Josef V. Psutka},
title={Influence of Different Phoneme Mappings on the Recognition Accuracy of Electrolaryngeal Speech},
booktitle={Proceedings of the International Conference on Signal Processing and Multimedia Applications and Wireless Information Networks and Systems - Volume 1: SIGMAP, (ICETE 2012)},
year={2012},
pages={204-207},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004129502040207},
isbn={978-989-8565-25-9},
}

in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Signal Processing and Multimedia Applications and Wireless Information Networks and Systems - Volume 1: SIGMAP, (ICETE 2012)
TI - Influence of Different Phoneme Mappings on the Recognition Accuracy of Electrolaryngeal Speech
SN - 978-989-8565-25-9
AU - Stanislav P.
AU - Psutka J. V.
PY - 2012
SP - 204
EP - 207
DO - 10.5220/0004129502040207