Figure 6: Obtaining formant and bandwidth information
using Praat.
4 CONCLUSIONS AND FUTURE
DEVELOPMENTS
This paper described the basic mechanism of human
speech production and the engineering models used
to develop a TTS system. The main objectives of each
block were explained. The methods used in the
acoustic module were also reviewed, with special
attention given to the formant model because it is the
model used in this development.
A didactic acoustic module based on the formant
model was developed to demonstrate the formant
model itself. The application fulfils its purpose, and
the synthesized speech is of sufficient quality to be
understood, given that only a single vowel or a
sequence of vowels is reproduced in this version. The
application allows the synthesis of any such sound
because the user can either select a vowel or set the
formant and bandwidth parameters directly. The user
can also experiment with different types of source
excitation, from a sampled glottal wave to a synthetic
glottal wave with a sinusoidal, triangular or
rectangular waveform.
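As a rough illustration of the formant model summarized above, the
following Python sketch passes a periodic source through a cascade of
second-order resonators, one per formant. It is not the code of the
application: the cascade arrangement, the Klatt-style resonator
coefficients, the function names and the formant/bandwidth values used
in the usage note are all assumptions made for this example.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2].

    Klatt-style digital resonator (an assumption, not taken from the
    paper); 'a' is chosen so that the gain at 0 Hz is 1.
    """
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def excitation(shape, f0_hz, fs, n_samples):
    """Synthetic periodic source with values in [-1, 1]."""
    out = []
    for n in range(n_samples):
        phase = (n * f0_hz / fs) % 1.0  # position inside the period, 0..1
        if shape == "sinusoidal":
            out.append(math.sin(2.0 * math.pi * phase))
        elif shape == "triangular":
            out.append(4.0 * abs(phase - 0.5) - 1.0)
        elif shape == "rectangular":
            out.append(1.0 if phase < 0.5 else -1.0)
        else:
            raise ValueError("unknown excitation shape: " + shape)
    return out


def synthesize_vowel(formants, shape="triangular", f0_hz=120.0,
                     fs=16000, duration_s=0.5):
    """Cascade one resonator per (frequency, bandwidth) pair in Hz."""
    n_samples = int(round(fs * duration_s))
    signal = excitation(shape, f0_hz, fs, n_samples)
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs)
    peak = max(abs(s) for s in signal) or 1.0  # avoid division by zero
    return [s / peak for s in signal]  # normalize to a peak of 1
```

For example, synthesize_vowel([(700.0, 80.0), (1200.0, 90.0),
(2600.0, 120.0)]) returns half a second of samples for an /a/-like
sound; these three formant/bandwidth pairs are illustrative values
only, and in practice they could be read from Praat as in Figure 6.
Changing the shape argument reproduces the experiment of switching
between sinusoidal, triangular and rectangular sources.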
An evolution of this version, aimed at a complete
speech synthesizer, is under development. At present
the acoustic module is built, but the database of
diphone formants and bandwidths is not yet complete.
That version will allow the user to enter the
phoneme sequences to be reproduced.
REFERENCES
Barbosa, P. and Bailly, G. (1994). Characterisation of
rhythmic patterns for text-to-speech synthesis. Speech
Communication, 15: 127-137.
Barros, M. J. (2002). "Estudo Comparativo e Técnicas de
Geração de sinal para Síntese de Fala". Master
dissertation, Faculdade de Engenharia da Universidade
do Porto.
Boersma, Paul and Weenink, David. Praat: doing
phonetics by computer. Phonetic Sciences, University
of Amsterdam. http://www.fon.hum.uva.nl/praat/
Fujisaki, H. (1983). Dynamic characteristics of voice
fundamental frequency in speech and singing. In
MacNeilage, P. F., editor, The Production of
Speech, pages 39-55. Springer-Verlag.
Hirst, D. and Di Cristo, A. (1998). Intonation Systems – A
Survey of Twenty Languages. Cambridge University
Press.
Klatt, D. H. (1987). Review of text-to-speech conversion
for English. Journal of the Acoustical Society of
America, 82(3): 737-793.
Pierrehumbert, J. B. (1980). The Phonology and Phonetics
of English Intonation. PhD thesis, Massachusetts
Institute of Technology.
Saraswathi, S. (2010). Design of Multilingual Speech
Synthesis System. Intelligent Information
Management, Vol. 2, No. 1.
Sproat, Richard W. (1997). Multilingual Text-to-Speech
Synthesis: The Bell Labs Approach. Springer.
Taylor, P. (2000). Analysis and Synthesis of Intonation
using the Tilt Model. Journal of the Acoustical Society
of America, 107(3): 1697-1714.
Teixeira, J. P. (2012). Prosody Generation Model for TTS
Systems - Segmental Durations and F0 Contours with
Fujisaki Model. LAP LAMBERT Academic
Publishing ISBN-13: 978-3-659-16277-0.
Teixeira, J. P. (1995). "Modelização Paramétrica de
Sinais para Aplicação em Sistemas de Conversão
Texto-Fala." Master dissertation, FEUP – Porto.
Teixeira, J. P., Barros, M. J. and Freitas, D. (2003).
"Sistemas de Conversão Texto-Fala." Proceedings of
CLME, Maputo.