STATIC FEATURES IN ISOLATED VOWEL RECOGNITION AT HIGH PITCH
Aníbal Ferreira
2008
Abstract
Vowel recognition is frequently based on Linear Prediction (LP) analysis and formant estimation techniques. However, the performance of these techniques degrades for female and child speech: at high fundamental frequencies (F0), the harmonics sample the magnitude spectrum only sparsely, which makes formant estimation unreliable. In this paper we describe the implementation of a perceptually motivated approach to vowel recognition that is based on Perceptual Spectral Clusters (PSC) of harmonic partials. PSC-based features were evaluated in automatic recognition tests using the Mahalanobis distance and a database of five natural Portuguese vowel sounds uttered by 44 speakers, 27 of whom are children. LP-based features and Mel-Frequency Cepstral Coefficients (MFCCs) were also included in the tests as a reference. Results show that the recognition performance of PSC features falls between that of LP-based features and that of MFCCs, and that normalizing the PSC features by F0 increases their performance to a level approaching that of MFCCs. PSC features are not only amenable to a psychophysical interpretation (as LP-based features are) but also have the potential to compete with global spectral-shape features such as MFCCs.
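Two technical points in the abstract can be made concrete with a short sketch. The first function counts how many harmonic partials fall below 4 kHz at a given F0, which shows why the spectrum is sampled more sparsely as F0 rises; the 4 kHz bandwidth is an assumption chosen for illustration. The remaining functions classify a feature vector by minimum Mahalanobis distance to per-vowel class means. This is not the paper's implementation: the function names, the use of a single pooled covariance matrix, and the two-dimensional feature values are all hypothetical, since the abstract does not say how the covariance was estimated or what the PSC feature vectors contain.

```python
"""Illustrative sketch only; NOT the paper's implementation.

1. How sparsely a voiced spectrum is sampled by harmonics at a given F0.
2. Minimum-Mahalanobis-distance classification of feature vectors.
All names, the 4 kHz band limit and the feature values are hypothetical.
"""


def harmonics_below(f0_hz: float, fmax_hz: float = 4000.0) -> int:
    """Count harmonic partials k*F0 (k = 1, 2, ...) at or below fmax_hz."""
    return int(fmax_hz // f0_hz)


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def mahalanobis_sq(x: list[float], mean: list[float], cov: list[list[float]]) -> float:
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    y = solve(cov, d)  # y = cov^{-1} d, without forming the inverse explicitly
    return sum(di * yi for di, yi in zip(d, y))


def classify(x: list[float], means: dict[str, list[float]], cov: list[list[float]]) -> str:
    """Return the class label whose mean is nearest in Mahalanobis distance.

    Assumes one covariance matrix shared by all classes (hypothetical choice).
    """
    return min(means, key=lambda label: mahalanobis_sq(x, means[label], cov))


if __name__ == "__main__":
    for f0 in (100.0, 250.0, 400.0):
        print(f"F0 = {f0:.0f} Hz -> {harmonics_below(f0)} harmonics below 4 kHz")
    means = {"a": [2.0, 3.0], "i": [1.0, 6.0]}  # hypothetical class means
    cov = [[1.0, 0.0], [0.0, 4.0]]             # hypothetical pooled covariance
    print(classify([1.8, 5.0], means, cov))     # prints "i"
```

With these numbers the spectrum holds 40 harmonics below 4 kHz at F0 = 100 Hz but only 10 at F0 = 400 Hz, so a formant peak lying between two widely spaced partials is easily missed. Solving a linear system instead of inverting the covariance matrix is a standard choice that avoids explicitly forming the inverse.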
Paper Citation
in Harvard Style
Ferreira A. (2008). STATIC FEATURES IN ISOLATED VOWEL RECOGNITION AT HIGH PITCH. In Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008), ISBN 978-989-8111-60-9, pages 63-68. DOI: 10.5220/0001934500630068
in BibTeX Style
@conference{sigmap08,
author={Aníbal Ferreira},
title={STATIC FEATURES IN ISOLATED VOWEL RECOGNITION AT HIGH PITCH},
booktitle={Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008)},
year={2008},
pages={63-68},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001934500630068},
isbn={978-989-8111-60-9},
}
in EndNote Style
TY  - CONF
JO  - Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008)
TI  - STATIC FEATURES IN ISOLATED VOWEL RECOGNITION AT HIGH PITCH
SN  - 978-989-8111-60-9
AU  - Ferreira A.
PY  - 2008
SP  - 63
EP  - 68
DO  - 10.5220/0001934500630068
ER  - 