SPEAKER RECOGNITION USING DECISION FUSION

M. Chenafa, D. Istrate, V. Vrabie, M. Herbin

Abstract

Biometric systems have gained popularity for the automatic identification of persons. Using the voice as a biometric characteristic offers several advantages: it is well accepted, it works with regular microphones, and the hardware costs are low. However, the performance of a voice-based biometric system degrades easily when training and testing conditions are mismatched. This paper presents a new speaker recognition system based on decision fusion. The fusion combines two identification systems: a text-independent speaker identification system and a speaker-independent keyword identification system. Each system computes the likelihood ratios between the model of a test signal and the models in the database. The fusion uses these results to identify the speaker/password pair corresponding to the test signal. A verification system is then applied to a second test signal to confirm or reject the identification. The fusion step reduces the false rejection rate (FRR) from 21.43% to 7.14% but also increases the false acceptance rate (FAR) from 21.43% to 28.57%. The verification step, however, significantly improves the FAR (from 28.57% to 14.28%) while keeping the FRR constant at 7.14%.
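
The two-stage decision described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the fusion rule, so summing the two log-likelihood ratios is an assumption, as are the function names `fuse_decisions` and `verify`, the zero threshold, and all scores in the usage example.

```python
def fuse_decisions(speaker_llr, keyword_llr, enrolled_pairs):
    """Pick the enrolled (speaker, password) pair with the highest fused score.

    speaker_llr: dict mapping speaker name -> log-likelihood ratio from the
                 text-independent speaker identification system.
    keyword_llr: dict mapping password -> log-likelihood ratio from the
                 speaker-independent keyword identification system.
    Assumption: the fused score is the plain sum of the two ratios.
    """
    best_pair, best_score = None, float("-inf")
    for speaker, password in enrolled_pairs:
        score = speaker_llr[speaker] + keyword_llr[password]
        if score > best_score:
            best_pair, best_score = (speaker, password), score
    return best_pair, best_score


def verify(identified_pair, second_speaker_llr, threshold=0.0):
    """Confirm or reject the identification using a second test signal.

    Assumption: the identity is accepted when the speaker score on the
    second signal reaches a fixed threshold.
    """
    speaker, _ = identified_pair
    return second_speaker_llr[speaker] >= threshold


# Hypothetical scores, invented purely for illustration.
enrolled = [("alice", "open"), ("bob", "sesame")]
speaker_scores = {"alice": 1.2, "bob": 0.9}
keyword_scores = {"open": -0.5, "sesame": 0.8}

pair, fused = fuse_decisions(speaker_scores, keyword_scores, enrolled)
accepted = verify(pair, {"alice": -0.3, "bob": 0.4})
```

In this made-up example the speaker system alone would favor "alice", but the keyword evidence shifts the fused decision to the pair ("bob", "sesame"), which the second signal then confirms.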



Paper Citation


in Harvard Style

Chenafa M., Istrate D., Vrabie V. and Herbin M. (2008). SPEAKER RECOGNITION USING DECISION FUSION. In Proceedings of the First International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2008) ISBN 978-989-8111-18-0, pages 267-272. DOI: 10.5220/0001065502670272


in Bibtex Style

@conference{biosignals08,
author={M. Chenafa and D. Istrate and V. Vrabie and M. Herbin},
title={SPEAKER RECOGNITION USING DECISION FUSION},
booktitle={Proceedings of the First International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2008)},
year={2008},
pages={267-272},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001065502670272},
isbn={978-989-8111-18-0},
}


in EndNote Style

TY - CONF
JO - Proceedings of the First International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2008)
TI - SPEAKER RECOGNITION USING DECISION FUSION
SN - 978-989-8111-18-0
AU - Chenafa M.
AU - Istrate D.
AU - Vrabie V.
AU - Herbin M.
PY - 2008
SP - 267
EP - 272
DO - 10.5220/0001065502670272