GLOTTAL SOURCE ESTIMATION ROBUSTNESS - A Comparison of Sensitivity of Voice Source Estimation Techniques
Thomas Drugman, Thomas Dubuisson, Alexis Moinet, Nicolas D’Alessandro, Thierry Dutoit
2008
Abstract
This paper addresses the problem of estimating the voice source directly from speech waveforms. A novel principle based on Anticausality Dominated Regions (ACDR) is used to estimate the glottal open phase. This technique is compared to two well-known state-of-the-art methods, namely the Zeros of the Z-Transform (ZZT) and the Iterative Adaptive Inverse Filtering (IAIF) algorithms. Decomposition quality is assessed on synthetic signals through two objective measures: the spectral distortion and a glottal formant determination rate. The robustness of each technique is tested by analyzing the influence of noise and of Glottal Closure Instant (GCI) location errors. In addition, the impact of the fundamental frequency and of the first formant on performance is evaluated. The proposed approach shows a significant improvement in robustness, which could be of great interest when decomposing real speech.
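The abstract names spectral distortion as an objective measure but does not define it here. As an illustration only, the sketch below assumes a common form of that measure, the root-mean-square difference in dB between two log-magnitude spectra, and pairs it with additive white noise at a chosen signal-to-noise ratio. The function names, the FFT size, and the noise model are hypothetical choices made for this sketch, not the authors' experimental protocol.

```python
import numpy as np


def spectral_distortion_db(reference, estimate, n_fft=1024):
    """RMS difference, in dB, between two log-magnitude spectra.

    Assumed definition for illustration; the paper may use another variant.
    """
    eps = 1e-12  # guards against log(0) in empty frequency bins
    ref_db = 20.0 * np.log10(np.abs(np.fft.rfft(reference, n_fft)) + eps)
    est_db = 20.0 * np.log10(np.abs(np.fft.rfft(estimate, n_fft)) + eps)
    return float(np.sqrt(np.mean((ref_db - est_db) ** 2)))


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    noise_power = np.mean(signal ** 2) / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for a reference glottal frame; a real test would use a
    # synthetic source signal rather than random samples.
    reference = rng.normal(size=256)
    for snr_db in (40.0, 20.0, 10.0):
        degraded = add_noise(reference, snr_db, rng)
        print(snr_db, spectral_distortion_db(reference, degraded))
```

In a robustness study of this kind, the degraded signal would be passed through each estimation method before computing the distortion against the known synthetic source; the loop above applies the measure to the noisy signal directly only to keep the sketch self-contained.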
References
- Airas, M. (2008). TKK Aparat: An environment for voice inverse filtering and parameterization. Logopedics Phoniatrics Vocology, 33:49-64.
- Alku, P., Svec, J., Vilkman, E., and Sram, F. (1992). Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering. Speech Communication, 11(2-3):109-117.
- Alku, P., Svec, J., Vilkman, E., and Sram, F. (2000). Analysis of voice in breathy, normal and pressed phonation by comparing inverse filtering and videokymography. In ICSLP 2000, Proceedings of the International Conference on Spoken Language Processing, pages 885-888.
- Aparat (2008). TKK Aparat main page. http://aparat.sourceforge.net/index.php/Main_Page.
- Bozkurt, B., Couvreur, L., and Dutoit, T. (2007). Chirp group delay analysis of speech signals. Speech Communication, 49(3):159-176.
- Bozkurt, B., Doval, B., and Dutoit, T. (2004). A method for glottal formant frequency estimation. In Proc. ICSLP, International Conference on Spoken Language Processing, Jeju Island (Korea).
- Doval, B., d'Alessandro, C., and Henrich, N. (2003). The voice source as a causal/anticausal linear filter. In Proceedings ISCA ITRW VOQUAL03, Geneva, Switzerland.
- El-Jaroudi, A. and Makhoul, J. (1991). Discrete all-pole modeling. IEEE Transactions on Signal Processing, 39(2):411-423.
- Fant, G., Liljencrants, J., and Lin, Q. (1985). A four-parameter model of glottal flow. In STL-QPSR4, pages 1-13.
- Kawahara, H., Atake, Y., and Zolfaghari, P. (2000). Accurate vocal event detection method based on a fixed-point analysis of mapping from time to weighted average group delay. In ICSLP 2000, Proceedings of the International Conference on Spoken Language Processing, volume 4, pages 664-667.
- Paliwal, K. and Atal, B. (1993). Efficient vector quantization of LPC parameters at 24 bits/frame. IEEE Trans. Speech Audio Processing, 1(1):3-14.
- Sturmel, N., D'Alessandro, C., and Doval, B. (2007). A comparative evaluation of the zeros of z transform representation for voice source estimation. In INTERSPEECH 2007, Antwerp, Belgium, pages 558-561.
- Tokuda, K., Zen, H., and Black, A. (2002). An HMM-based speech synthesis system applied to English. In Proc. IEEE Workshop on Speech Synthesis 02, Santa Monica, USA, pages 227-230.
Paper Citation
in Harvard Style
Drugman T., Dubuisson T., Moinet A., D’Alessandro N. and Dutoit T. (2008). GLOTTAL SOURCE ESTIMATION ROBUSTNESS - A Comparison of Sensitivity of Voice Source Estimation Techniques. In Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008) ISBN 978-989-8111-60-9, pages 202-207. DOI: 10.5220/0001936702020207
in Bibtex Style
@conference{sigmap08,
author={Thomas Drugman and Thomas Dubuisson and Alexis Moinet and Nicolas D’Alessandro and Thierry Dutoit},
title={GLOTTAL SOURCE ESTIMATION ROBUSTNESS - A Comparison of Sensitivity of Voice Source Estimation Techniques},
booktitle={Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008)},
year={2008},
pages={202-207},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001936702020207},
isbn={978-989-8111-60-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the International Conference on Signal Processing and Multimedia Applications - Volume 1: SIGMAP, (ICETE 2008)
TI - GLOTTAL SOURCE ESTIMATION ROBUSTNESS - A Comparison of Sensitivity of Voice Source Estimation Techniques
SN - 978-989-8111-60-9
AU - Drugman T.
AU - Dubuisson T.
AU - Moinet A.
AU - D’Alessandro N.
AU - Dutoit T.
PY - 2008
SP - 202
EP - 207
DO - 10.5220/0001936702020207