GENETIC OPTIMIZATION OF CEPSTRUM FILTERBANK FOR PHONEME CLASSIFICATION

Leandro D. Vignolo, Hugo L. Rufiner, Diego H. Milone, John C. Goddard

Abstract

Some of the most commonly used speech representations, such as mel-frequency cepstral coefficients, incorporate biologically inspired characteristics into artificial systems. Recent work has modified the shape and distribution of the traditional perceptually scaled filterbank commonly used for feature extraction. Alternatives to the classic mel-scaled filterbank have been proposed that improve phoneme recognition performance in adverse conditions. In this work we propose an evolutionary strategy as a way to find an optimal filterbank. Filter parameters such as the central and side frequencies are optimized. A hidden Markov model classifier is used to evaluate the fitness of each candidate solution. Experiments were conducted using a set of phonemes taken from the TIMIT database with different levels of additive noise. Classification results show that the method accomplishes the task of finding an optimized filterbank for phoneme recognition.
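The search described in the abstract can be illustrated with a minimal sketch of a generic genetic algorithm over triangular filter parameters. This is not the authors' implementation: the function names, the population settings, the 8000 Hz upper frequency, and in particular `toy_fitness` are assumptions made for illustration. In the paper the fitness of a candidate comes from a hidden Markov model classifier; here a simple band-coverage score stands in for it so that the example runs on its own.

```python
import random


def random_filterbank(n_filters, f_max, rng):
    """Each filter is a sorted (left, center, right) triple of frequencies in Hz."""
    return [tuple(sorted(rng.uniform(0.0, f_max) for _ in range(3)))
            for _ in range(n_filters)]


def triangular_weight(filt, freq):
    """Gain of one triangular filter at a given frequency."""
    left, center, right = filt
    if left < freq <= center:
        return (freq - left) / (center - left)
    if center < freq < right:
        return (right - freq) / (right - center)
    return 0.0


def toy_fitness(bank, f_max, n_probe=64):
    """Placeholder fitness (assumption): mean of the best filter gain over
    evenly spaced probe frequencies. The paper uses classifier accuracy."""
    probes = [f_max * (i + 0.5) / n_probe for i in range(n_probe)]
    return sum(max(triangular_weight(f, p) for f in bank) for p in probes) / n_probe


def crossover(a, b, rng):
    """One-point crossover at a filter boundary."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(bank, f_max, rng, rate=0.2, sigma=200.0):
    """Perturb some filters with Gaussian noise, keeping edges sorted and in range."""
    out = []
    for filt in bank:
        if rng.random() < rate:
            moved = (min(f_max, max(0.0, x + rng.gauss(0.0, sigma))) for x in filt)
            filt = tuple(sorted(moved))
        out.append(filt)
    return out


def evolve(fitness, n_filters=8, f_max=8000.0, pop_size=20, generations=30, seed=0):
    """Return the best filterbank found and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [random_filterbank(n_filters, f_max, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_pop = [ranked[0]]  # elitism: the best individual always survives
        while len(next_pop) < pop_size:
            # Tournament selection of two parents, three candidates each.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            next_pop.append(mutate(crossover(p1, p2, rng), f_max, rng))
        pop = next_pop
    return max(pop, key=fitness), history


best, history = evolve(lambda bank: toy_fitness(bank, 8000.0))
```

Passing the fitness as a callable keeps the search loop independent of how a candidate is scored, so the placeholder could be swapped for a function that extracts features with the candidate filterbank and returns the accuracy of a trained classifier.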


Paper Citation


in Harvard Style

Vignolo, L. D., Rufiner, H. L., Milone, D. H. and Goddard, J. C. (2009). GENETIC OPTIMIZATION OF CEPSTRUM FILTERBANK FOR PHONEME CLASSIFICATION. In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009), ISBN 978-989-8111-65-4, pages 179-185. DOI: 10.5220/0001552401790185


in Bibtex Style

@conference{biosignals09,
author={Leandro D. Vignolo and Hugo L. Rufiner and Diego H. Milone and John C. Goddard},
title={GENETIC OPTIMIZATION OF CEPSTRUM FILTERBANK FOR PHONEME CLASSIFICATION},
booktitle={Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)},
year={2009},
pages={179-185},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001552401790185},
isbn={978-989-8111-65-4},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Bio-inspired Systems and Signal Processing - Volume 1: BIOSIGNALS, (BIOSTEC 2009)
TI - GENETIC OPTIMIZATION OF CEPSTRUM FILTERBANK FOR PHONEME CLASSIFICATION
SN - 978-989-8111-65-4
AU - Vignolo, Leandro D.
AU - Rufiner, Hugo L.
AU - Milone, Diego H.
AU - Goddard, John C.
PY - 2009
SP - 179
EP - 185
DO - 10.5220/0001552401790185