IMPROVEMENTS IN SPEAKER DIARIZATION SYSTEM
Rong Fu, Ian D. Benest
2007
Abstract
This paper describes an automatic speaker diarization system for natural, multi-speaker meeting conversations recorded with one central microphone. It is based on the ICSI-SRI Fall 2004 diarization system (Wooters et al., 2004), with a number of significant modifications. The new system is robust to different acoustic environments: it requires neither pre-trained models nor development sets to initialize its parameters. It determines the model complexity automatically. It adapts each segment model from a Universal Background Model (UBM), and it uses the cross-likelihood ratio (CLR) instead of the Bayesian Information Criterion (BIC) to decide which clusters to merge. Finally, it uses an intra-cluster/inter-cluster ratio as the stopping criterion. Together, these changes reduce the speaker diarization error rate from 25.36% for the baseline system (Wooters et al., 2004) to 21.37%.
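The abstract does not give the CLR formula, so the following is only a minimal sketch of the form commonly used in the diarization literature: the sum of two length-normalized log-likelihood ratios, each scoring one cluster's frames against the other cluster's model relative to the UBM. Several simplifications are assumptions for illustration and are not taken from the paper: each model is a single diagonal Gaussian rather than a GMM, cluster models are fitted directly rather than adapted from the UBM, and the function names (`fit_diag_gaussian`, `avg_loglik`, `clr`) and the synthetic data are hypothetical.

```python
import numpy as np

def fit_diag_gaussian(x):
    """Fit a diagonal Gaussian to frames x of shape (n_frames, n_dims).
    Assumption: a single Gaussian stands in for a GMM to keep the sketch short."""
    return x.mean(axis=0), x.var(axis=0) + 1e-6  # small floor avoids zero variance

def avg_loglik(x, model):
    """Average per-frame log-likelihood of x under a diagonal Gaussian model."""
    mean, var = model
    per_frame = -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var).sum(axis=1)
    return per_frame.mean()

def clr(x_i, x_j, model_i, model_j, ubm):
    """Cross-likelihood ratio between two clusters.
    Higher values suggest the clusters are more likely to share a speaker."""
    return (avg_loglik(x_i, model_j) - avg_loglik(x_i, ubm)
            + avg_loglik(x_j, model_i) - avg_loglik(x_j, ubm))

# Synthetic "clusters": a and b share a distribution, c is shifted away from both.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(500, 4))
b = rng.normal(0.0, 1.0, size=(500, 4))
c = rng.normal(3.0, 1.0, size=(500, 4))

# Assumption: the UBM is simply fitted to all frames pooled together.
ubm = fit_diag_gaussian(np.vstack([a, b, c]))
m_a, m_b, m_c = (fit_diag_gaussian(x) for x in (a, b, c))

clr_same = clr(a, b, m_a, m_b, ubm)  # clusters drawn from the same source
clr_diff = clr(a, c, m_a, m_c, ubm)  # clusters drawn from different sources
print(f"CLR(a, b) = {clr_same:.2f}, CLR(a, c) = {clr_diff:.2f}")
```

In an agglomerative scheme of this kind, the pair of clusters with the highest CLR would be the candidate for the next merge; how the paper combines this with its intra-cluster/inter-cluster stopping ratio is not specified in the abstract.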
References
- Ajmera, J. and Lapidot, I. (2002). Improved unknown-multiple speaker clustering using HMM. In IDIAP RR. pp.02-23.
- Barras, C. and Gauvain, J. L. (2003). Feature and score normalization for speaker verification of cellular data. In ICASSP Proc.
- Barras, C., Zhu, X., Meignier, S., and Gauvain, J. L. (2004). Improving speaker diarization. In Fall 2004 Rich Transcription Workshop (RT-04) Proc.
- Barras, C., Zhu, X., Meignier, S., and Gauvain, J. L. (2006). Multistage speaker diarization of broadcast news. In IEEE Trans. SL Proc. pp.1505-1512.
- ICSI (2004). ICSI meeting speech. In International Computer Science Institute. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004S02.
- Jin, Q., Laskowski, K., Schultz, T., and Waibel, A. (2004). Speaker segmentation and clustering in meetings. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Meeting Recognition Workshop Proc.
- McLachlan, G. and Krishnan, T. (1997). The EM algorithm and extensions. John Wiley & Sons, New York, 1st edition.
- NIST (2004). Fall 2004 rich transcription (RT-04) evaluation plan, 2004. In National Institute of Standards and Technology. Available: http://www.nist.gov/speech/tests/rt/rt04/fall/docs/rt04f-eval-plan-v14.pdf.
- Schwarz, G. (1978). Estimating the dimension of a model. In Annals of Statistics Proc. Vol.6, pp.461-464.
- Sinha, R., Tranter, S. E., Gales, M. J., and Woodland, P. C. (2005). The Cambridge University March 2005 speaker diarization system. In Eur. Conf. Speech Communication Technology, Proc. pp.2437-2440.
- Tranter, S. E. and Reynolds, D. A. (2006). An overview of automatic speaker diarization systems. In IEEE Trans. Speech and Language (SL) Proc. Vol.14, pp.1557-1565.
- Ueda, N., Nakano, R., Ghahramani, Z., and Hinton, G. (2000). SMEM algorithm for mixture models. In Neural Computation Proc. Vol.12, pp.2109-2128.
- Wallace, C. and Dowe, D. (1987). Estimation and inference via compact coding. In J. Royal Statistical Soc. (B). Vol.49, pp.241-252.
- Wooters, C., Fung, J., Peskin, B., and Anguera, X. (2004). Toward robust speaker segmentation: ICSI-SRI Fall 2004 diarization system. In Fall 2004 Rich Transcription Workshop (RT-04) Proc.
- Zhou, B. and Hansen, J. (2000). Improving speaker diarization. In Int. Conf. Spoken Language Process Proc. Vol.3, pp.714-717.
Paper Citation
in Harvard Style
Fu R. and Benest I. D. (2007). IMPROVEMENTS IN SPEAKER DIARIZATION SYSTEM. In Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007) ISBN 978-989-8111-13-5, pages 313-319. DOI: 10.5220/0002140703130319
in Bibtex Style
@conference{sigmap07,
author={Rong Fu and Ian D. Benest},
title={IMPROVEMENTS IN SPEAKER DIARIZATION SYSTEM},
booktitle={Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007)},
year={2007},
pages={313-319},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002140703130319},
isbn={978-989-8111-13-5},
}
in EndNote Style
TY - CONF
JO - Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007)
TI - IMPROVEMENTS IN SPEAKER DIARIZATION SYSTEM
SN - 978-989-8111-13-5
AU - Fu R.
AU - Benest I. D.
PY - 2007
SP - 313
EP - 319
DO - 10.5220/0002140703130319