TRAINING OF SPEAKER-CLUSTERED ACOUSTIC MODELS FOR USE IN REAL-TIME RECOGNIZERS

Jan Vanek, Josef V. Psutka, Aleš Pražák, Josef Psutka

2009

Abstract

This paper deals with the training of speaker-clustered acoustic models. Several training techniques (Maximum Likelihood, Discriminative Training, and two adaptation methods based on MAP and Discriminative MAP) were tested in order to minimize the impact of speaker changes on the correct function of the recognizer when the response of the automatic cluster detector is delayed or incorrect. Such situations are very frequent, e.g. in online subtitling of TV discussions (Parliament meetings). In our experiments, the best cluster-dependent training procedure was discriminative adaptation, which provided the best trade-off between recognition results obtained with correct and with incorrect cluster detector information.
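
As an illustration of the MAP-based adaptation mentioned in the abstract, the following is a minimal sketch of the standard MAP update of Gaussian mean vectors, in which a speaker-independent prior mean is interpolated with cluster-specific data. It is not taken from the paper: the function name `map_adapt_means`, the argument layout, and the prior weight `tau=10.0` are assumptions chosen for the example, and the discriminative (Discriminative MAP) variant is not shown.

```python
import numpy as np


def map_adapt_means(prior_means, frames, posteriors, tau=10.0):
    """Hypothetical helper: MAP re-estimation of Gaussian means.

    prior_means: (K, D) means of the prior (e.g. speaker-independent) model
    frames:      (T, D) feature vectors from one speaker cluster
    posteriors:  (T, K) occupation probabilities of each Gaussian per frame
    tau:         prior weight (assumed value, not from the paper)
    """
    prior_means = np.asarray(prior_means, dtype=float)
    frames = np.asarray(frames, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)

    # Soft count of frames assigned to each Gaussian, shape (K,)
    occupancy = posteriors.sum(axis=0)
    # Posterior-weighted sum of frames per Gaussian, shape (K, D)
    first_order = posteriors.T @ frames

    # Interpolate between the prior mean and the data, weighted by tau
    return (tau * prior_means + first_order) / (tau + occupancy)[:, None]
```

With little cluster data the occupancy is small and the result stays close to the prior means; with plenty of data it approaches the cluster's own weighted average. This behaviour is one plausible reason adaptation-based models degrade gracefully when the cluster information is wrong, although the paper itself should be consulted for the actual argument.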



Paper Citation


in Harvard Style

Vanek J., Psutka J. V., Pražák A. and Psutka J. (2009). TRAINING OF SPEAKER-CLUSTERED ACOUSTIC MODELS FOR USE IN REAL-TIME RECOGNIZERS. In Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2009), ISBN 978-989-674-007-8, pages 131-135. DOI: 10.5220/0002262001310135


in Bibtex Style

@conference{sigmap09,
author={Jan Vanek and Josef V. Psutka and Aleš Pražák and Josef Psutka},
title={TRAINING OF SPEAKER-CLUSTERED ACOUSTIC MODELS FOR USE IN REAL-TIME RECOGNIZERS},
booktitle={Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2009)},
year={2009},
pages={131-135},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002262001310135},
isbn={978-989-674-007-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2009)
TI - TRAINING OF SPEAKER-CLUSTERED ACOUSTIC MODELS FOR USE IN REAL-TIME RECOGNIZERS
SN - 978-989-674-007-8
AU - Vanek J.
AU - Psutka J. V.
AU - Pražák A.
AU - Psutka J.
PY - 2009
SP - 131
EP - 135
DO - 10.5220/0002262001310135