HMM-based Breath and Filled Pauses Elimination in ASR

Piotr Żelasko, Tomasz Jadczyk, Bartosz Ziółko

2014

Abstract

Filled pauses and breaths pose a challenge to Automatic Speech Recognition (ASR) systems that deal with spontaneous speech, including the recognizer modules of Interactive Voice Response (IVR) systems. We propose a method based on Hidden Markov Models (HMMs) that integrates easily into HMM-based ASR systems and detects these disturbances without introducing additional parameters. The method trains models of the disturbances and inserts them into the phrase Markov chain between word-final and word-initial phoneme models. Applying the method in our ASR system improves recognition results on LUNA, a corpus of Polish telephone speech.
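The insertion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_phrase_chain`, the labels `<breath>` and `<filled_pause>`, and the toy lexicon are hypothetical. The sketch also assumes at most one disturbance per word boundary and leaves out HMM self-loops and transition probabilities.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical labels for the trained disturbance models (assumption, not from the paper).
DISTURBANCES: Tuple[str, ...] = ("<breath>", "<filled_pause>")


def build_phrase_chain(
    words: Sequence[str],
    lexicon: Dict[str, List[str]],
    disturbances: Sequence[str] = DISTURBANCES,
) -> Tuple[List[str], Set[Tuple[int, int]]]:
    """Build a phrase-level chain of phoneme-model nodes.

    Between the word-final phoneme of one word and the word-initial phoneme
    of the next, add a direct edge plus one optional detour through each
    disturbance model.
    """
    nodes: List[str] = []
    edges: Set[Tuple[int, int]] = set()
    prev_final = None  # index of the previous word's final phoneme node

    for word in words:
        phones = lexicon[word]
        first = len(nodes)
        nodes.extend(phones)
        last = len(nodes) - 1

        # Left-to-right transitions inside the word.
        for i in range(first, last):
            edges.add((i, i + 1))

        if prev_final is not None:
            # Direct word-to-word transition (no disturbance).
            edges.add((prev_final, first))
            # Optional detour through each disturbance model.
            for label in disturbances:
                d = len(nodes)
                nodes.append(label)
                edges.add((prev_final, d))
                edges.add((d, first))

        prev_final = last

    return nodes, edges


def strip_disturbances(path: Sequence[str],
                       disturbances: Sequence[str] = DISTURBANCES) -> List[str]:
    """Drop disturbance labels from a decoded label sequence."""
    return [label for label in path if label not in disturbances]


if __name__ == "__main__":
    # Toy lexicon with made-up phoneme symbols.
    toy_lexicon = {"tak": ["t", "a", "k"], "nie": ["n", "e"]}
    nodes, edges = build_phrase_chain(["tak", "nie"], toy_lexicon)
    print(nodes)
    print(sorted(edges))
```

Because the direct edge is kept, a decoder can pass through a disturbance model only where the audio supports it, and `strip_disturbances` then removes those labels from the decoded sequence.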



Paper Citation


in Harvard Style

Żelasko P., Jadczyk T. and Ziółko B. (2014). HMM-based Breath and Filled Pauses Elimination in ASR. In Proceedings of the 11th International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2014) ISBN 978-989-758-046-8, pages 255-260. DOI: 10.5220/0005023002550260


in Bibtex Style

@conference{sigmap14,
author={Piotr Żelasko and Tomasz Jadczyk and Bartosz Ziółko},
title={HMM-based Breath and Filled Pauses Elimination in ASR},
booktitle={Proceedings of the 11th International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2014)},
year={2014},
pages={255-260},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005023002550260},
isbn={978-989-758-046-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 11th International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2014)
TI - HMM-based Breath and Filled Pauses Elimination in ASR
SN - 978-989-758-046-8
AU - Żelasko P.
AU - Jadczyk T.
AU - Ziółko B.
PY - 2014
SP - 255
EP - 260
DO - 10.5220/0005023002550260