Text-Dependent Speaker Identification using Spectrograms based on Conditional Quantization

Tridibesh Dutta

2008

Abstract

This paper studies a new approach to text-dependent speaker identification using spectrograms. The approach centers on capturing, through spectrogram segmentation, the complex patterns of variation in frequency and amplitude over time as an individual utters a given word. These optimally segmented spectrograms form a database that is used to identify an unknown individual from his or her voice. Identification relies on classifying spectrograms of speech signals by template matching between the conditionally quantized frequency-time domain features of the database spectrogram samples and those of the unknown speech sample. Performance of this novel approach on a sample collected from 40 speakers shows that the methodology can be used effectively to achieve a desirable success rate.
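
The pipeline outlined in the abstract (spectrogram, quantization, template matching) can be illustrated with a minimal sketch. This is not the paper's algorithm. In particular, "conditional quantization" is read here as ranking each magnitude within its own time frame, which is an assumption and may differ from the paper's definition. The function names, frame sizes, distance measure, and the synthetic tones standing in for utterances are all hypothetical.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram from a naive DFT on Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames  # one list of frequency-bin magnitudes per time frame


def quantize_per_frame(spec, levels=4):
    """Assumed stand-in for conditional quantization: replace each magnitude
    by a level index derived from its rank within its own time frame."""
    out = []
    for frame in spec:
        order = sorted(range(len(frame)), key=lambda i: frame[i])
        q = [0] * len(frame)
        for rank, i in enumerate(order):
            q[i] = rank * levels // len(frame)
        out.append(q)
    return out


def distance(a, b):
    """Mean absolute difference between two equal-shape quantized spectrograms."""
    total = sum(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))
    count = sum(len(fa) for fa in a)
    return total / count


def identify(templates, unknown):
    """Return the label of the stored template closest to the unknown sample."""
    return min(templates, key=lambda name: distance(templates[name], unknown))


def tone(freq, n=512, rate=8000, phase=0.0):
    """Synthetic sine tone used as hypothetical test data, not real speech."""
    return [math.sin(2 * math.pi * freq * t / rate + phase) for t in range(n)]


def features(signal):
    """Quantized spectrogram features for one signal."""
    return quantize_per_frame(spectrogram(signal))


# Two hypothetical enrolled "speakers" and one unknown sample near the first.
templates = {"speaker_a": features(tone(500)), "speaker_b": features(tone(3000))}
print(identify(templates, features(tone(520, phase=0.3))))
```

Rank-based levels make the features insensitive to overall loudness, which is one plausible motivation for quantizing relative to each frame. Real utterances would also need time alignment and segmentation before templates can be compared, and this sketch omits both.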



Paper Citation


in Harvard Style

Dutta T. (2008). Text-Dependent Speaker Identification using Spectrograms based on Conditional Quantization. In Proceedings of the 1st International Workshop on Image Mining Theory and Applications IMTA 2008 - Volume 1: IMTA, (VISIGRAPP 2008) ISBN 978-989-8111-25-8, pages 133-142. DOI: 10.5220/0002338301330142


in Bibtex Style

@conference{imta08,
author={Tridibesh Dutta},
title={Text-Dependent Speaker Identification using Spectrograms based on Conditional Quantization},
booktitle={Proceedings of the 1st International Workshop on Image Mining Theory and Applications IMTA 2008 - Volume 1: IMTA, (VISIGRAPP 2008)},
year={2008},
pages={133-142},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002338301330142},
isbn={978-989-8111-25-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 1st International Workshop on Image Mining Theory and Applications IMTA 2008 - Volume 1: IMTA, (VISIGRAPP 2008)
TI - Text-Dependent Speaker Identification using Spectrograms based on Conditional Quantization
SN - 978-989-8111-25-8
AU - Dutta T.
PY - 2008
SP - 133
EP - 142
DO - 10.5220/0002338301330142