NOISE ROBUST SPEAKER VERIFICATION BASED ON THE MFCC AND pH FEATURES FUSION AND MULTICONDITION TRAINING

L. Zão, R. Coelho

Abstract

This paper investigates the fusion of Mel-frequency cepstral coefficients (MFCC) and pH (Hurst parameter) features, combined with a multicondition training (MT) technique based on artificial noises with colored spectra, for noise-robust speaker verification. The α-integrated Gaussian mixture model (α-GMM), an extension of the conventional GMM, is used in the speaker verification experiments. For testing, the speech signals are corrupted by five real acoustic noises at different signal-to-noise ratios (SNRs). The experimental results show that MFCC + pH feature vectors improve speaker verification accuracy compared with systems based on MFCC alone. They also show that the system combining MFCC + pH fusion with the α-GMM and the MT technique achieves the best speaker verification performance in noisy environments.
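The abstract does not state how the colored training noises are generated or how signals are mixed at a given SNR. As a rough illustration only, the sketch below assumes a 1/f^β family of colored noises and scales each noise so that the mixture reaches a target SNR, with SNR_dB = 10·log10(P_speech / P_noise). The β values, the SNR grid, and the function names (`colored_noise`, `add_noise_at_snr`, `multicondition_versions`) are hypothetical choices, not taken from the paper.

```python
import numpy as np


def colored_noise(n_samples, beta, rng=None):
    """Unit-variance noise whose power spectral density falls off as 1/f**beta.

    beta = 0 gives white noise, 1 pink noise, 2 brown (red) noise.
    The 1/f**beta family is an illustrative assumption, not the paper's spec.
    """
    rng = np.random.default_rng(rng)
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples)          # cycles/sample, freqs[0] == 0
    scale = np.zeros_like(freqs)                # DC bin stays at zero
    scale[1:] = freqs[1:] ** (-beta / 2.0)      # amplitude ~ f^(-beta/2) => PSD ~ f^(-beta)
    noise = np.fft.irfft(spectrum * scale, n=n_samples)
    return noise / np.std(noise)


def add_noise_at_snr(speech, noise, snr_db):
    """Return speech + g*noise, with g chosen so 10*log10(P_speech / P_scaled_noise) == snr_db."""
    noise = noise[: len(speech)]                # assumes noise is at least as long as speech
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def multicondition_versions(speech, betas=(0.0, 1.0, 2.0), snrs_db=(0.0, 10.0, 20.0), seed=0):
    """Hypothetical MT data expansion: the clean utterance plus one noisy copy per (beta, SNR) pair."""
    rng = np.random.default_rng(seed)
    versions = [speech]
    for beta in betas:
        for snr_db in snrs_db:
            noise = colored_noise(len(speech), beta, rng)
            versions.append(add_noise_at_snr(speech, noise, snr_db))
    return versions
```

In such a setup, speaker models would be trained on features extracted from all returned versions, while test utterances would instead be corrupted with the real acoustic noises using the same SNR scaling.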



Paper Citation


in Harvard Style

Zão L. and Coelho R. (2012). NOISE ROBUST SPEAKER VERIFICATION BASED ON THE MFCC AND pH FEATURES FUSION AND MULTICONDITION TRAINING. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2012) ISBN 978-989-8425-89-8, pages 137-143. DOI: 10.5220/0003890501370143


in Bibtex Style

@conference{biosignals12,
author={L. Zão and R. Coelho},
title={NOISE ROBUST SPEAKER VERIFICATION BASED ON THE MFCC AND pH FEATURES FUSION AND MULTICONDITION TRAINING},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2012)},
year={2012},
pages={137-143},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003890501370143},
isbn={978-989-8425-89-8},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2012)
TI  - NOISE ROBUST SPEAKER VERIFICATION BASED ON THE MFCC AND pH FEATURES FUSION AND MULTICONDITION TRAINING
SN  - 978-989-8425-89-8
AU  - Zão, L.
AU  - Coelho, R.
PY  - 2012
SP  - 137
EP  - 143
DO  - 10.5220/0003890501370143
ER  - 