Table 1: Mean Opinion Score (mean ± standard deviation) of 12 listeners
assessing the quality of 3 different restored voices. Applied MOS scale:
1 = highly improved, 3 = no improvement, 5 = highly degraded.

Method    Improved Feature
          Prosody      Breathiness
LT        2.9 ± 0.7    1.7 ± 0.8
LT+MT     1.9 ± 0.7    1.6 ± 0.5
MR        2.0 ± 1.2    2.0 ± 1.2
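
As an illustration of how an entry of Table 1 is formed, the following minimal Python sketch computes a mean ± standard deviation summary from the ratings given to one method. The ratings are hypothetical placeholders and not data from this study, the function name mos_summary is introduced here for illustration only, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def mos_summary(ratings):
    """Return (mean, sample standard deviation) of ratings on the 1-5 MOS scale.

    Scale as in Table 1: 1 = highly improved, 3 = no improvement,
    5 = highly degraded, so lower means indicate a better restoration.
    """
    if len(ratings) < 2:
        raise ValueError("at least two ratings are needed for a standard deviation")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    # ASSUMPTION: sample (n - 1) standard deviation; the paper does not specify.
    return mean(ratings), stdev(ratings)


# HYPOTHETICAL ratings from 12 listeners for one method (not study data).
example_ratings = [2, 2, 1, 3, 2, 2, 1, 2, 3, 2, 1, 2]

m, s = mos_summary(example_ratings)
print(f"{m:.1f} ± {s:.1f}")
```

With only 12 listeners, the standard deviation is of the same order as the differences between methods, which is consistent with the limitation on the number of listeners noted in the conclusions.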
shaped noise instead of the AWGN may reduce this
undesired effect of the short-time pitch variability.
5 CONCLUSIONS
We presented a device for the restoration of authentic features in
pathological voices. We have shown that the different methods utilized
by the device can improve the prosody and breathiness of pathological
voices to different extents. Clearly, the study is limited by the small
number of listeners and the small number of signals to which the methods
were applied. Another limitation is the requirement of a relatively
well-developed pathological voice. To make the technology available to
pathological speakers with less developed voices, additional signals
have to be employed in future investigations. Nevertheless, the
principal capability of the multi-resolution voice restoration device
has been shown.
DEVICE FOR PROSODIC SPEECH RESTORATION - A Multi-Resolution Approach for Glottal Excitation Restoration