IMPROVEMENTS IN SPEAKER DIARIZATION SYSTEM

Rong Fu, Ian D. Benest

2007

Abstract

This paper describes an automatic speaker diarization system for natural, multi-speaker meeting conversations recorded with a single central microphone. It is based on the ICSI-SRI Fall 2004 diarization system (Wooters et al., 2004) but incorporates a number of significant modifications. The new system is robust to different acoustic environments: it requires neither pre-trained models nor development sets to initialize its parameters, and it determines the model complexity automatically. It adapts each segment model from a Universal Background Model (UBM) and uses the cross-likelihood ratio (CLR) instead of the Bayesian Information Criterion (BIC) for merging. Finally, it uses an intra-cluster/inter-cluster ratio as the stopping criterion. Together, these changes reduce the speaker diarization error rate from 25.36% for the baseline system (Wooters et al., 2004) to 21.37%.
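The abstract names the cross-likelihood ratio as the merging criterion but does not give its form. As a rough illustration only, the Python sketch below uses the formulation commonly found in the diarization literature: each cluster gets a GMM obtained by mean-only MAP adaptation of the UBM, and CLR(i, j) = (1/n_i) log[p(x_i | M_j) / p(x_i | UBM)] + (1/n_j) log[p(x_j | M_i) / p(x_j | UBM)], where x_i are the n_i frames of cluster i and M_i is its adapted model. The diagonal covariances, the single adaptation pass, the relevance factor of 16, the toy 2-D features, and all helper names (`DiagGMM`, `map_adapt_means`, `cross_likelihood_ratio`, `best_merge_pair`) are assumptions for this sketch, not the authors' implementation.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class DiagGMM:
    """Gaussian mixture model with diagonal covariances (illustrative)."""
    weights: list    # mixture weights, summing to 1
    means: list      # one mean vector per component
    variances: list  # one diagonal-variance vector per component


def _log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def _logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def _component_log_probs(gmm, x):
    """log(w_c) + log N(x; mu_c, Sigma_c) for every component c."""
    return [
        math.log(w) + _log_gauss_diag(x, m, v)
        for w, m, v in zip(gmm.weights, gmm.means, gmm.variances)
    ]


def avg_log_likelihood(gmm, frames):
    """(1/n) * log p(frames | gmm), treating frames as independent."""
    return sum(_logsumexp(_component_log_probs(gmm, x)) for x in frames) / len(frames)


def map_adapt_means(ubm, frames, relevance=16.0):
    """Hypothetical mean-only MAP adaptation of the UBM to one cluster.

    Weights and variances are copied from the UBM; each mean becomes
    (sum_t gamma_c(t) * x_t + r * mu_c) / (n_c + r), n_c = sum_t gamma_c(t).
    """
    dim, k = len(frames[0]), len(ubm.weights)
    counts = [0.0] * k
    first_order = [[0.0] * dim for _ in range(k)]
    for x in frames:
        log_probs = _component_log_probs(ubm, x)
        norm = _logsumexp(log_probs)
        for c in range(k):
            gamma = math.exp(log_probs[c] - norm)  # posterior of component c
            counts[c] += gamma
            for d in range(dim):
                first_order[c][d] += gamma * x[d]
    means = [
        [(first_order[c][d] + relevance * ubm.means[c][d]) / (counts[c] + relevance)
         for d in range(dim)]
        for c in range(k)
    ]
    return DiagGMM(list(ubm.weights), means, [list(v) for v in ubm.variances])


def cross_likelihood_ratio(ubm, frames_i, frames_j, relevance=16.0):
    """CLR between two clusters; a higher value suggests the same speaker."""
    model_i = map_adapt_means(ubm, frames_i, relevance)
    model_j = map_adapt_means(ubm, frames_j, relevance)
    term_i = avg_log_likelihood(model_j, frames_i) - avg_log_likelihood(ubm, frames_i)
    term_j = avg_log_likelihood(model_i, frames_j) - avg_log_likelihood(ubm, frames_j)
    return term_i + term_j


def best_merge_pair(ubm, clusters, relevance=16.0):
    """Return (i, j, clr) for the cluster pair with the highest CLR."""
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            score = cross_likelihood_ratio(ubm, clusters[i], clusters[j], relevance)
            if best is None or score > best[2]:
                best = (i, j, score)
    return best


if __name__ == "__main__":
    rng = random.Random(0)

    # Toy 2-D "features": clusters 0 and 1 share a speaker, cluster 2 does not.
    def speaker(centre, n):
        return [[rng.gauss(centre, 0.3), rng.gauss(centre, 0.3)] for _ in range(n)]

    clusters = [speaker(1.5, 200), speaker(1.5, 200), speaker(-1.5, 200)]
    ubm = DiagGMM([0.5, 0.5], [[-1.0, -1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
    print(best_merge_pair(ubm, clusters))  # first two entries should be 0 and 1
```

In an agglomerative loop one would merge the returned pair, re-adapt a model for the merged cluster, and repeat until a stopping rule fires; the paper uses its intra-cluster/inter-cluster ratio for that step, which this sketch does not attempt to reproduce.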

References

  1. Ajmera, J. and Lapidot, I. (2002). Improved unknown-multiple speaker clustering using HMM. IDIAP Research Report RR 02-23.
  2. Barras, C. and Gauvain, J. L. (2003). Feature and score normalization for speaker verification of cellular data. In ICASSP Proc.
  3. Barras, C., Zhu, X., Meignier, S., and Gauvain, J. L. (2004). Improving speaker diarization. In Fall 2004 Rich Transcription Workshop (RT-04) Proc.
  4. Barras, C., Zhu, X., Meignier, S., and Gauvain, J. L. (2006). Multistage speaker diarization of broadcast news. IEEE Trans. SL, pp.1505-1512.
  5. ICSI (2004). ICSI Meeting Speech. International Computer Science Institute. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2004S02.
  6. Jin, Q., Laskowski, K., Schultz, T., and Waibel, A. (2004). Speaker segmentation and clustering in meetings. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Meeting Recognition Workshop Proc.
  7. McLachlan, G. and Krishnan, T. (1997). The EM algorithm and extensions. John Wiley & Sons, New York, 1st edition.
  8. NIST (2004). Fall 2004 Rich Transcription (RT-04) evaluation plan. National Institute of Standards and Technology. Available: http://www.nist.gov/speech/tests/rt/rt04/fall/docs/rt04f-eval-plan-v14.pdf.
  9. Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, Vol.6, pp.461-464.
  10. Sinha, R., Tranter, S. E., Gales, M. J., and Woodland, P. C. (2005). The Cambridge University March 2005 speaker diarization system. In European Conf. on Speech Communication and Technology Proc., pp.2437-2440.
  11. Tranter, S. E. and Reynolds, D. A. (2006). An overview of automatic speaker diarization systems. IEEE Trans. Speech and Language (SL), Vol.14, pp.1557-1565.
  12. Ueda, N., Nakano, R., Ghahramani, Z., and Hinton, G. (2000). SMEM algorithm for mixture models. Neural Computation, Vol.12, pp.2109-2128.
  13. Wallace, C. and Dowe, D. (1987). Estimation and inference via compact coding. J. Royal Statistical Soc. (B), Vol.49, pp.241-252.
  14. Wooters, C., Fung, J., Peskin, B., and Anguera, X. (2004). Toward robust speaker segmentation: The ICSI-SRI Fall 2004 diarization system. In Fall 2004 Rich Transcription Workshop (RT-04) Proc.
  15. Zhou, B. and Hansen, J. (2000). Improving speaker diarization. In Int. Conf. Spoken Language Processing Proc., Vol.3, pp.714-717.


Paper Citation


in Harvard Style

Fu R. and Benest I. D. (2007). IMPROVEMENTS IN SPEAKER DIARIZATION SYSTEM. In Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007) ISBN 978-989-8111-13-5, pages 313-319. DOI: 10.5220/0002140703130319


in BibTeX Style

@conference{sigmap07,
author={Rong Fu and Ian D. Benest},
title={IMPROVEMENTS IN SPEAKER DIARIZATION SYSTEM},
booktitle={Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007)},
year={2007},
pages={313-319},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002140703130319},
isbn={978-989-8111-13-5},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Second International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2007)
TI - IMPROVEMENTS IN SPEAKER DIARIZATION SYSTEM
SN - 978-989-8111-13-5
AU - Fu R.
AU - Benest I. D.
PY - 2007
SP - 313
EP - 319
DO - 10.5220/0002140703130319