Improving the Performance of Speaker Verification Systems under Noisy Conditions using Low Level Features and Score Level Fusion

Nassim Asbai, Messaoud Bengherabi, Farid Harizi, Abderrahmane Amrouche

2013

Abstract

This paper provides an overview of low-level features for speaker recognition, with an emphasis on the recently proposed MFCC variant based on asymmetric tapers (hereafter MFCC-asymmetric), which has shown high noise robustness in speaker verification. Using the TIMIT corpus, the performance of MFCC-asymmetric is compared with that of the standard Mel-Frequency Cepstral Coefficients (MFCC) and the Linear Frequency Cepstral Coefficients (LFCC) in clean and noisy environments. To simulate real-world conditions, the verification phase was tested with two noise types (babble and factory) from the NOISEX-92 database at different Signal-to-Noise Ratios (SNR). The experimental results show that MFCC features with asymmetric tapers (k=4) outperform the other features in noisy conditions. Finally, we investigate the impact of consolidating evidence from different features by score-level fusion. Preliminary results show a promising improvement in verification rate with score fusion.
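The two experimental steps named in the abstract, corrupting test speech with noise at a chosen SNR and combining the scores of several feature streams, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `mix_at_snr` and `fuse_scores`, the z-normalisation step, and the equal-weight sum rule are assumptions made for illustration, since the abstract does not state which fusion rule or noise-mixing procedure was used.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so the resulting SNR equals `snr_db` (in dB).

    Hypothetical helper: the noise is looped or trimmed to the speech length.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat the noise if it is shorter than the speech, then trim to length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the noise power to p_speech / 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def fuse_scores(score_sets, weights=None):
    """Weighted-sum fusion of per-system trial scores.

    `score_sets` has shape (n_systems, n_trials). Each system's scores are
    z-normalised first (an assumption; in practice the mean and standard
    deviation would be estimated on held-out development data).
    """
    scores = np.asarray(score_sets, dtype=float)
    mean = scores.mean(axis=1, keepdims=True)
    std = scores.std(axis=1, keepdims=True)
    z = (scores - mean) / std
    if weights is None:
        # Equal weights by default (simple sum rule).
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    return np.asarray(weights, dtype=float) @ z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)      # stand-in for a speech signal
    babble = rng.standard_normal(4000)      # stand-in for a noise recording
    noisy = mix_at_snr(clean, babble, snr_db=5.0)
    achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
    print(f"achieved SNR: {achieved:.2f} dB")

    # Rows: scores from two hypothetical feature streams for the same trials.
    fused = fuse_scores([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
    print(fused)
```

Normalising before summing keeps a feature stream with a larger score range from dominating the fused score; the weights could instead be tuned on development data.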



Paper Citation


in Harvard Style

Asbai N., Bengherabi M., Harizi F. and Amrouche A. (2013). Improving the Performance of Speaker Verification Systems under Noisy Conditions using Low Level Features and Score Level Fusion. In Proceedings of the 10th International Conference on Signal Processing and Multimedia Applications and 10th International Conference on Wireless Information Networks and Systems - Volume 1: SIGMAP, (ICETE 2013), ISBN 978-989-8565-74-7, pages 33-38. DOI: 10.5220/0004525500330038


in Bibtex Style

@conference{sigmap13,
author={Nassim Asbai and Messaoud Bengherabi and Farid Harizi and Abderrahmane Amrouche},
title={Improving the Performance of Speaker Verification Systems under Noisy Conditions using Low Level Features and Score Level Fusion},
booktitle={Proceedings of the 10th International Conference on Signal Processing and Multimedia Applications and 10th International Conference on Wireless Information Networks and Systems - Volume 1: SIGMAP, (ICETE 2013)},
year={2013},
pages={33-38},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004525500330038},
isbn={978-989-8565-74-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 10th International Conference on Signal Processing and Multimedia Applications and 10th International Conference on Wireless Information Networks and Systems - Volume 1: SIGMAP, (ICETE 2013)
TI - Improving the Performance of Speaker Verification Systems under Noisy Conditions using Low Level Features and Score Level Fusion
SN - 978-989-8565-74-7
AU - Asbai N.
AU - Bengherabi M.
AU - Harizi F.
AU - Amrouche A.
PY - 2013
SP - 33
EP - 38
DO - 10.5220/0004525500330038