voice source could lead to enhanced naturalness
and intelligibility.
• Expressive Voice. User-friendliness is one of the
most important demands from industry. Since
expressivity is mainly governed by the glottal source,
an emotional voice synthesis engine should take
into account realistic glottal source model param-
eters. Techniques presented in this paper could
be used to estimate these parameters on speech
samples extracted from an expressivity-oriented
speech database.
• Pathological Speech Analysis. Speech patholo-
gies are most often due to irregular be-
haviour of the vocal folds during phonation. This
irregular vibration can be induced by nodules or
polyps on the folds and should result in irregular
values of model parameters. The methods pre-
sented here could hence be used to estimate the glottal
source and its features on pathological speech in
order to quantify the pathology level.
ACKNOWLEDGEMENTS
Thomas Drugman is supported by the “Fonds Na-
tional de la Recherche Scientifique” (FNRS) and
Nicolas D’Alessandro by a FRIA grant. The au-
thors would also like to thank the Walloon Region for
its support (ECLIPSE WALEO II grant #516009 and
IRMA RESEAUX II grant #415911).