NOISE ROBUST SPEAKER VERIFICATION BASED ON THE MFCC AND pH FEATURES FUSION AND MULTICONDITION TRAINING
L. Zão, R. Coelho
2012
Abstract
This paper investigates the fusion of Mel-frequency cepstral coefficients (MFCC) and pH features, combined with a multicondition training (MT) technique based on artificial noises with colored spectra, for noise-robust speaker verification. The α-integrated Gaussian mixture model (α-GMM), an extension of the conventional GMM, is used in the speaker verification experiments. For testing, the speech signals are corrupted by five real acoustic noises at different signal-to-noise ratios (SNR). The experimental results show that MFCC + pH feature vectors improve the accuracy of speaker verification systems relative to MFCC alone. They also show that the system combining the MFCC + pH fusion, the α-GMM, and the MT technique achieves the best speaker verification performance in noisy environments.
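The abstract describes corrupting speech with noise at a chosen SNR. The following is a minimal sketch of that step only, assuming additive mixing with the noise scaled by average signal power; the function names `mix_at_snr` and `measured_snr_db`, the synthetic sine "speech", and the white-noise stand-in are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB.

    Assumption: SNR is defined from average power over the whole signal.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Repeat or trim the noise so it covers the whole speech signal.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the noise power to p_speech / 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def measured_snr_db(speech, noisy):
    """Recompute the SNR of a mixture from the clean signal and the residual."""
    residual = np.asarray(noisy, dtype=float) - np.asarray(speech, dtype=float)
    return 10.0 * np.log10(np.mean(np.square(speech)) / np.mean(residual ** 2))


# Synthetic stand-ins: a 200 Hz tone as "speech" and white noise as "noise".
rng = np.random.default_rng(0)
speech = np.sin(2.0 * np.pi * 200.0 * np.arange(16000) / 16000.0)
noise = rng.standard_normal(8000)
noisy = mix_at_snr(speech, noise, 10.0)
```

In a real experiment the noise would come from recorded acoustic noise and the SNR would be swept over several values; power over active speech frames only is another common convention and would change the scaling.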
Paper Citation
in Harvard Style
Zão L. and Coelho R. (2012). NOISE ROBUST SPEAKER VERIFICATION BASED ON THE MFCC AND pH FEATURES FUSION AND MULTICONDITION TRAINING. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2012) ISBN 978-989-8425-89-8, pages 137-143. DOI: 10.5220/0003890501370143
in Bibtex Style
@conference{biosignals12,
author={L. Zão and R. Coelho},
title={NOISE ROBUST SPEAKER VERIFICATION BASED ON THE MFCC AND pH FEATURES FUSION AND MULTICONDITION TRAINING},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2012)},
year={2012},
pages={137-143},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0003890501370143},
isbn={978-989-8425-89-8},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2012)
TI - NOISE ROBUST SPEAKER VERIFICATION BASED ON THE MFCC AND pH FEATURES FUSION AND MULTICONDITION TRAINING
SN - 978-989-8425-89-8
AU - Zão L.
AU - Coelho R.
PY - 2012
SP - 137
EP - 143
DO - 10.5220/0003890501370143