ACOUSTIC MODELLING FOR SPEECH PROCESSING IN COMPLEX ENVIRONMENTS

Nora Barroso, Karmele López de Ipiña, Aitzol Ezeiza

Abstract

Automatic Speech Recognition (ASR) is a classical application of multivariate statistical modelling that involves issues such as Acoustic Modelling (AM) and Language Modelling (LM). These tasks are generally very language-dependent and require very large resources. This work focuses on the selection of appropriate acoustic models for Speech Processing in a complex environment (a multilingual context in under-resourced and noisy conditions) oriented to general ASR tasks. The work has been carried out with a small trilingual speech database of very low audio quality. To reduce the negative impact of the lack of resources on this task, two techniques have been selected. On the one hand, Hidden Markov Models have been enhanced using hybrid topologies and parameters as acoustic models of the sublexical units. On the other hand, an optimum configuration has been developed for the Acoustic Phonetic Decoding system, based on the number of multivariate Gaussians and the insertion penalty.
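
As a rough illustration of the two decoding parameters named in the abstract, the sketch below shows how the number of Gaussians in a mixture and an insertion penalty can enter the score of a decoded unit sequence. It is not the authors' system: the function names, the diagonal-covariance assumption, the additive log-domain penalty convention, and the toy scores are all hypothetical choices made for this example.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a multivariate Gaussian with diagonal covariance (assumed here)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, weights, means, variances):
    """Log-likelihood of x under a Gaussian mixture; len(weights) is the number of Gaussians."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def hypothesis_score(unit_log_likelihoods, insertion_penalty):
    """Sum of per-unit acoustic log scores plus a fixed penalty for every decoded unit."""
    return sum(unit_log_likelihoods) + insertion_penalty * len(unit_log_likelihoods)

# Toy comparison: a 2-unit hypothesis against a 4-unit hypothesis (made-up scores).
short_hyp = [-10.0, -12.0]
long_hyp = [-5.0, -5.0, -5.0, -5.5]

for penalty in (0.0, -1.0):
    s = hypothesis_score(short_hyp, penalty)
    l = hypothesis_score(long_hyp, penalty)
    print(f"penalty={penalty}: short={s}, long={l}, winner={'short' if s > l else 'long'}")
```

With no penalty the longer hypothesis scores higher in this toy case, while a more negative penalty shifts the preference towards the hypothesis with fewer units. That trade-off between inserted and deleted units is what tuning the insertion penalty controls.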



Paper Citation


in Harvard Style

Barroso N., López de Ipiña K. and Ezeiza A. (2012). ACOUSTIC MODELLING FOR SPEECH PROCESSING IN COMPLEX ENVIRONMENTS. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: MPBS, (BIOSTEC 2012) ISBN 978-989-8425-89-8, pages 507-516. DOI: 10.5220/0003894105070516


in Bibtex Style

@conference{mpbs12,
author={Nora Barroso and Karmele López de Ipiña and Aitzol Ezeiza},
title={ACOUSTIC MODELLING FOR SPEECH PROCESSING IN COMPLEX ENVIRONMENTS},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: MPBS, (BIOSTEC 2012)},
year={2012},
pages={507-516},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003894105070516},
isbn={978-989-8425-89-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: MPBS, (BIOSTEC 2012)
TI - ACOUSTIC MODELLING FOR SPEECH PROCESSING IN COMPLEX ENVIRONMENTS
SN - 978-989-8425-89-8
AU - Barroso N.
AU - López de Ipiña K.
AU - Ezeiza A.
PY - 2012
SP - 507
EP - 516
DO - 10.5220/0003894105070516