PERSON VERIFICATION BY FUSION OF PROSODIC, VOICE SPECTRAL AND FACIAL PARAMETERS

Javier Hernando, Mireia Farrús, Pascual Ejarque, Ainara Garde, Jordi Luque

2006

Abstract

Prosodic information can be used successfully for automatic speaker recognition, yet most speaker recognition systems use only short-term spectral features as voice information. In this work, prosodic information is added to a multimodal system based on face and voice characteristics in order to improve its performance. Fusion is carried out with several fusion strategies and two fusion techniques: support vector machines and matcher weighting. Results improve clearly when the monomodal scores are normalized by histogram equalization before fusion.
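The abstract names the score pipeline but does not give its details, so the following is only a minimal sketch of how histogram equalization followed by matcher weighting could look. It assumes that equalization maps each score through the empirical CDF of a set of development scores (a uniform reference distribution) and that matcher weights are inversely proportional to each matcher's equal error rate, as is common in the matcher-weighting literature. The function names, the development scores, and the error rates are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def equalize(scores, reference):
    """Map each score to the empirical CDF of `reference` (values in 0..1).

    Assumption: the reference distribution is uniform; the paper may
    equalize to a different reference.
    """
    ref = sorted(reference)
    return [bisect_right(ref, s) / len(ref) for s in scores]


def matcher_weights(eers):
    """Weights inversely proportional to each matcher's equal error rate."""
    inverse = [1.0 / e for e in eers]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuse(score_vectors, weights):
    """Weighted sum of the per-modality scores of each trial."""
    return [
        sum(w * s for w, s in zip(weights, trial))
        for trial in zip(*score_vectors)
    ]


# Hypothetical development scores per modality (different native ranges).
dev = {
    "prosody": [1.0, 2.0, 3.0, 4.0],
    "spectral": [-10.0, -5.0, 0.0, 5.0],
    "face": [0.1, 0.2, 0.3, 0.4],
}
# Hypothetical test scores for two trials.
test = {
    "prosody": [0.5, 10.0],
    "spectral": [-5.0, -5.0],
    "face": [0.4, 0.05],
}
# Hypothetical equal error rates measured on the development set.
eers = {"prosody": 0.10, "spectral": 0.05, "face": 0.05}

modalities = ["prosody", "spectral", "face"]
normalized = [equalize(test[m], dev[m]) for m in modalities]
weights = matcher_weights([eers[m] for m in modalities])
fused = fuse(normalized, weights)
print(fused)
```

After equalization all three modalities share the range 0 to 1, so the weighted sum is not dominated by whichever matcher happens to produce the largest raw values. A support vector machine could be trained on the same normalized score vectors in place of the weighted sum.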



Paper Citation


in Harvard Style

Hernando J., Farrús M., Ejarque P., Garde A. and Luque J. (2006). PERSON VERIFICATION BY FUSION OF PROSODIC, VOICE SPECTRAL AND FACIAL PARAMETERS. In Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2006) ISBN 978-972-8865-63-4, pages 17-23. DOI: 10.5220/0002105200170023


in Bibtex Style

@conference{secrypt06,
author={Javier Hernando and Mireia Farrús and Pascual Ejarque and Ainara Garde and Jordi Luque},
title={PERSON VERIFICATION BY FUSION OF PROSODIC, VOICE SPECTRAL AND FACIAL PARAMETERS},
booktitle={Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2006)},
year={2006},
pages={17-23},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002105200170023},
isbn={978-972-8865-63-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Security and Cryptography - Volume 1: SECRYPT, (ICETE 2006)
TI - PERSON VERIFICATION BY FUSION OF PROSODIC, VOICE SPECTRAL AND FACIAL PARAMETERS
SN - 978-972-8865-63-4
AU - Hernando J.
AU - Farrús M.
AU - Ejarque P.
AU - Garde A.
AU - Luque J.
PY - 2006
SP - 17
EP - 23
DO - 10.5220/0002105200170023