Combining Spectral and Prosodic Features in HMM-based Single Utterance Speaker Verification

Osman Büyük; Levent M. Arslan

doi:10.5220/0005217500860091

2015

Abstract

In this paper, we combine spectral and prosodic features to improve verification performance on a text-dependent, single-utterance speaker verification task. The baseline spectral system uses a whole-phrase sentence HMM topology for the fixed utterance. We extract prosodic features using the time alignment information obtained from the HMM states. Our experiments show that, although the prosodic features alone do not yield high performance, they provide information complementary to the spectral features. Combining the two information sources with a multi-layer neural network yields an approximately 10% relative reduction in equal error rate (EER).
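To make the reported metric concrete, the following is a minimal sketch of how an EER and a relative EER reduction might be computed from verification scores. The function names, the threshold sweep, and the numbers in the usage note are illustrative assumptions, not the authors' evaluation code or results.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    genuine:  scores of true-speaker trials (higher means more target-like)
    impostor: scores of impostor trials
    Assumption: a trial is accepted when its score is >= the threshold.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(genuine) | set(impostor)):
        # False acceptance rate: impostor trials that pass the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        # False rejection rate: genuine trials that fail the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, fused_eer):
    """Relative EER reduction of a fused system over a baseline system."""
    return (baseline_eer - fused_eer) / baseline_eer
```

As a usage illustration with made-up values, a baseline EER of 5.0% that drops to 4.5% after fusion gives `relative_reduction(0.050, 0.045)`, which is about 0.10, i.e. the kind of 10% relative reduction the abstract describes.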

Paper Citation

in Harvard Style

Büyük O. and Arslan L. M. (2015). Combining Spectral and Prosodic Features in HMM-based Single Utterance Speaker Verification. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2015), ISBN 978-989-758-069-7, pages 86-91. DOI: 10.5220/0005217500860091

in Bibtex Style

@conference{biosignals15,
author={Osman Büyük and Levent M. Arslan},
title={Combining Spectral and Prosodic Features in HMM-based Single Utterance Speaker Verification},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2015)},
year={2015},
pages={86-91},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005217500860091},
isbn={978-989-758-069-7},
}

in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2015)
TI - Combining Spectral and Prosodic Features in HMM-based Single Utterance Speaker Verification
SN - 978-989-758-069-7
AU - Büyük O.
AU - Arslan L. M.
PY - 2015
SP - 86
EP - 91
DO - 10.5220/0005217500860091