FAST SPEAKER ADAPTATION IN AUTOMATIC ONLINE SUBTITLING
Aleš Pražák, Z. Zajíc, L. Machlica, J. V. Psutka
2009
Abstract
This paper deals with speaker adaptation techniques well suited to the task of online subtitling. Two methods are briefly discussed, namely MAP adaptation and fMLLR. The main emphasis is on improvements to the adaptation process under the time constraints of online operation. Since the adaptation data are gathered continuously, simple modifications of the accumulated statistics have to be carried out to make the adaptation more accurate. Another proposed improvement efficiently combines fMLLR and MAP. In online adaptation, no prior transcriptions of the data are available; the transcriptions are produced by the recognition system, so it is appropriate to assign a suitable confidence measure to each of them. We performed experiments focused on the trade-off between adaptation speed and the amount of adaptation data. We achieved a relative WER reduction of 16.2 %.
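The idea of accumulating statistics continuously and gating them by recognizer confidence can be illustrated with a minimal sketch. It uses the standard MAP update of a single Gaussian mean in the sense of Gauvain and Lee (1994), not the exact procedure of this paper. The class name `MapMeanAccumulator`, the prior weight `tau`, and the confidence threshold `min_confidence` are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MapMeanAccumulator:
    """Running MAP statistics for the mean of one Gaussian component.

    Illustrative only: tau and min_confidence are assumed values,
    not settings reported in the paper.
    """
    prior_mean: List[float]
    tau: float = 10.0             # weight of the speaker-independent prior
    min_confidence: float = 0.5   # frames below this confidence are skipped
    occupancy: float = 0.0        # sum of component posteriors seen so far
    weighted_sum: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.weighted_sum:
            self.weighted_sum = [0.0] * len(self.prior_mean)

    def accumulate(self, frame: List[float], posterior: float,
                   confidence: float) -> None:
        """Add one feature frame unless its transcription is unreliable."""
        if confidence < self.min_confidence:
            return
        self.occupancy += posterior
        self.weighted_sum = [s + posterior * x
                             for s, x in zip(self.weighted_sum, frame)]

    def adapted_mean(self) -> List[float]:
        """MAP estimate: (tau * prior_mean + sum) / (tau + occupancy)."""
        denom = self.tau + self.occupancy
        return [(self.tau * m + s) / denom
                for m, s in zip(self.prior_mean, self.weighted_sum)]


if __name__ == "__main__":
    acc = MapMeanAccumulator(prior_mean=[0.0, 0.0], tau=2.0)
    acc.accumulate([1.0, 2.0], posterior=1.0, confidence=0.9)
    acc.accumulate([1.0, 2.0], posterior=1.0, confidence=0.9)
    acc.accumulate([100.0, 100.0], posterior=1.0, confidence=0.1)  # skipped
    print(acc.adapted_mean())
```

With little data the estimate stays close to the prior mean, and it moves toward the speaker's data as the occupancy grows, which is one way to picture the trade-off between adaptation speed and the amount of adaptation data.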
References
- Evans, M. J. (2003). Speech recognition in assisted and live subtitling for television. Technical report, BBC.
- Gales, M. (1996). The generation and use of regression class trees for MLLR adaptation. Technical report, Cambridge University, Engineering Department.
- Gales, M. (1997). Maximum likelihood linear transformations for HMM-based speech recognition. Technical report, Cambridge University, Engineering Department.
- Gauvain, J.-L. and Lee, C.-H. (1994). Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, 2(2):291-298.
- Li, Y., Erdogan, H., Gao, Y., and Marcheret, E. (2002). Incremental on-line feature space MLLR adaptation for telephony speech recognition. In ICSLP 2002, International Conference on Spoken Language Processing.
- Machlica, L., Zajíc, Z., and Pražák, A. (2009). Methods of unsupervised adaptation in online speech recognition. In SPECOM 2009, International Conference on Speech and Computer.
- Povey, D. (2003). Discriminative Training for Large Vocabulary Speech Recognition. PhD thesis, Cambridge University, Engineering Department.
- Povey, D. and Saon, G. (2006). Feature and model space speaker adaptation with full covariance Gaussians. In INTERSPEECH 2006.
- Pražák, A., Müller, L., Psutka, J. V., and Psutka, J. (2007). LIVE TV SUBTITLING - fast 2-pass LVCSR system for online subtitling. In SIGMAP 2007, International Conference on Signal Processing and Multimedia Applications.
- Wessel, F., Schlüter, R., Macherey, K., and Ney, H. (2001). Confidence measures for large vocabulary continuous speech recognition. IEEE Transactions on Speech and Audio Processing, 9(3).
- Zajíc, Z., Machlica, L., and Müller, L. (2009). Refinement approach for adaptation based on combination of MAP and fMLLR. In TSD 2009, International Conference on Text, Speech and Dialogue.
Paper Citation
in Harvard Style
Pražák A., Zajíc Z., Machlica L. and Psutka J. V. (2009). FAST SPEAKER ADAPTATION IN AUTOMATIC ONLINE SUBTITLING. In Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2009) ISBN 978-989-674-007-8, pages 126-130. DOI: 10.5220/0002261701260130
in Bibtex Style
@conference{sigmap09,
author={Aleš Pražák and Z. Zajíc and L. Machlica and J. V. Psutka},
title={FAST SPEAKER ADAPTATION IN AUTOMATIC ONLINE SUBTITLING},
booktitle={Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2009)},
year={2009},
pages={126-130},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002261701260130},
isbn={978-989-674-007-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2009)
TI - FAST SPEAKER ADAPTATION IN AUTOMATIC ONLINE SUBTITLING
SN - 978-989-674-007-8
AU - Pražák A.
AU - Zajíc Z.
AU - Machlica L.
AU - Psutka J. V.
PY - 2009
SP - 126
EP - 130
DO - 10.5220/0002261701260130