Russian Sub-Word Based Speech Recognition Using Pocketsphinx Engine
Sergey Zablotskiy, Maxim Sidorov
2014
Abstract
Russian is a synthetic, highly inflected language with a high morpheme-per-word ratio. For the same out-of-vocabulary (OOV) rate, these two properties make the lexicon for Russian automatic speech recognition (ASR) tens of times larger than the one needed for English. Sub-word units are a widely used state-of-the-art approach to reducing this oversized lexicon and lowering the perplexity (PP) of the language model. The choice of sub-word units affects the accuracy of the entire speech recognition system, its performance, and the complexity of synthesizing the spoken phrase. Here, different recognition units are investigated using the pocketsphinx engine on a vocabulary of several million word forms. A text normalization approach is also briefly presented. This rule-based algorithm keeps diverse Russian abbreviations and numerals in the language model (LM) without distorting its statistics. The approach is also directly applicable to Russian text-to-speech systems.
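To make the lexicon argument concrete, the following is a minimal sketch of how the OOV rate changes when inflected word forms are replaced by sub-word units. It is not the segmentation used in the paper: the `split_word` helper, the `+` boundary marker, the ending list, and the transliterated toy words are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Set, Tuple


def oov_rate(tokens: Iterable[str], lexicon: Set[str]) -> float:
    """Fraction of tokens that are missing from the lexicon."""
    tokens = list(tokens)
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in lexicon)
    return missing / len(tokens)


def split_word(word: str, endings: Tuple[str, ...]) -> List[str]:
    """Toy stem+ending split (hypothetical): strip the longest matching ending.

    The '+' marks the side on which a unit must be re-joined.
    """
    for ending in sorted(endings, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending):
            return [word[: -len(ending)] + "+", "+" + ending]
    return [word]


def to_units(words: Iterable[str], endings: Tuple[str, ...]) -> List[str]:
    """Flatten a word sequence into a sub-word unit sequence."""
    return [unit for word in words for unit in split_word(word, endings)]


# Assumed toy data: transliterated inflected forms of three Russian nouns.
ENDINGS = ("a", "u", "e", "oy", "ami")
train = ["kniga", "knigu", "knige", "knigami",
         "lampa", "lampu", "lampoy",
         "ruka", "ruku"]
test = ["lampe", "knigoy", "rukami", "lampa"]

word_lexicon = set(train)
unit_lexicon = set(to_units(train, ENDINGS))

word_oov = oov_rate(test, word_lexicon)
unit_oov = oov_rate(to_units(test, ENDINGS), unit_lexicon)

print(len(word_lexicon), word_oov)  # 9 0.75
print(len(unit_lexicon), unit_oov)  # 8 0.0
```

In this toy setting, unseen word forms such as "knigoy" are covered by recombining a known stem with a known ending, so the unit lexicon reaches a lower OOV rate with fewer entries. Real Russian morphology is far less regular, so the numbers here only illustrate the direction of the effect.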
Paper Citation
in Harvard Style
Zablotskiy S. and Sidorov M. (2014). Russian Sub-Word Based Speech Recognition Using Pocketsphinx Engine. In Proceedings of the 11th International Conference on Informatics in Control, Automation and Robotics - Volume 2: ASAAHMI, (ICINCO 2014) ISBN 978-989-758-040-6, pages 840-844. DOI: 10.5220/0005148008400844
in Bibtex Style
@conference{asaahmi14,
author={Sergey Zablotskiy and Maxim Sidorov},
title={Russian Sub-Word Based Speech Recognition Using Pocketsphinx Engine},
booktitle={Proceedings of the 11th International Conference on Informatics in Control, Automation and Robotics - Volume 2: ASAAHMI, (ICINCO 2014)},
year={2014},
pages={840-844},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005148008400844},
isbn={978-989-758-040-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 11th International Conference on Informatics in Control, Automation and Robotics - Volume 2: ASAAHMI, (ICINCO 2014)
TI - Russian Sub-Word Based Speech Recognition Using Pocketsphinx Engine
SN - 978-989-758-040-6
AU - Zablotskiy S.
AU - Sidorov M.
PY - 2014
SP - 840
EP - 844
DO - 10.5220/0005148008400844