Voice Passwords Revisited
Chenguang Yang, Ghaith Hammouri, Berk Sunar
2012
Abstract
We demonstrate an attack on basic voice authentication technologies. Specifically, we show how one member of a voice database can manipulate his voice in order to gain access to resources by impersonating another member in the same database. The attack targets a voice authentication system built around parallel and independent speech recognition and speaker verification modules, and assumes that an adapted Gaussian Mixture Model (GMM) is used to model basic Mel-frequency cepstral coefficient (MFCC) features of speakers. We experimentally verify our attack using the YOHO database. The experiments show that, in a database of 138 users, an attacker can impersonate anyone in the database with a 98% success probability after at most nine authorization attempts. The attack still succeeds, albeit at lower success rates, if fewer attempts are permitted. The attack is quite practical and highlights the limited amount of entropy that can be extracted from the human voice when using MFCC features.
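As a rough illustration of the kind of speaker verification module the abstract refers to, the sketch below scores a sequence of feature frames with the commonly used log-likelihood ratio between a speaker GMM and a background GMM. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the diagonal covariances, the toy two-dimensional "features" standing in for MFCC vectors, and the threshold of 0.0 are all hypothetical choices made for illustration.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log density of a GMM at x, computed with log-sum-exp for stability."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def average_llr(frames, speaker_gmm, background_gmm):
    """Mean per-frame log-likelihood ratio: speaker model vs. background model."""
    total = 0.0
    for x in frames:
        total += gmm_log_likelihood(x, *speaker_gmm)
        total -= gmm_log_likelihood(x, *background_gmm)
    return total / len(frames)


def verify(frames, speaker_gmm, background_gmm, threshold=0.0):
    """Accept the claimed identity if the average LLR exceeds the threshold.

    The threshold value is an assumption; real systems tune it on held-out data.
    """
    return average_llr(frames, speaker_gmm, background_gmm) > threshold


# Toy models (hypothetical values): each GMM is (weights, means, variances).
background_gmm = ([0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
# The speaker model shifts one component mean, loosely mimicking mean-only adaptation.
speaker_gmm = ([0.5, 0.5], [[1.0, 1.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])

genuine_frames = [[1.0, 1.0], [1.1, 0.9]]
other_frames = [[-1.0, -1.0], [-0.9, -1.1]]

accepted_genuine = verify(genuine_frames, speaker_gmm, background_gmm)
accepted_other = verify(other_frames, speaker_gmm, background_gmm)
```

In this toy setup, frames close to the shifted speaker component score above the threshold and frames far from it score below. An impersonation attack of the kind described in the abstract amounts to producing frames that push this score over the threshold for another user's model.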
Paper Citation
in Harvard Style
Yang C., Hammouri G. and Sunar B. (2012). Voice Passwords Revisited. In Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2012) ISBN 978-989-8565-24-2, pages 163-171. DOI: 10.5220/0004060201630171
in Bibtex Style
@conference{secrypt12,
author={Chenguang Yang and Ghaith Hammouri and Berk Sunar},
title={Voice Passwords Revisited},
booktitle={Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2012)},
year={2012},
pages={163-171},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004060201630171},
isbn={978-989-8565-24-2},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2012)
TI - Voice Passwords Revisited
SN - 978-989-8565-24-2
AU - Yang C.
AU - Hammouri G.
AU - Sunar B.
PY - 2012
SP - 163
EP - 171
DO - 10.5220/0004060201630171