evolving part of the signal can be obtained by simply
subtracting the corresponding SEW from the CW.
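The subtraction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `split_cw` and the sample values are hypothetical, and the waveforms are assumed to be equal-length lists of samples taken at the same instant.

```python
def split_cw(cw, sew):
    """Return the remaining (rapidly evolving) part as CW minus SEW, sample by sample."""
    if len(cw) != len(sew):
        raise ValueError("CW and SEW must have the same length")
    return [c - s for c, s in zip(cw, sew)]


# Hypothetical sample values chosen only for illustration.
cw = [1.0, 0.5, -0.25, -1.0]
sew = [0.75, 0.5, 0.0, -0.5]
rew = split_cw(cw, sew)

# Adding the two parts back together recovers the CW.
recombined = [s + r for s, r in zip(sew, rew)]
```

Because the decomposition is a plain difference, recombining the two parts in the decoder is a plain sum.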
3 A NOVEL WI DECODER SCHEME
In the conventional WI decoder used in
communication applications, the received parameters
are the LP coefficients, the pitch value, the power of
the CW, and the SEW and REW magnitude spectra. The
decoder obtains a continuous CW surface by
interpolating the successive SEWs and REWs and then
recombining them. After power de-normalization and
subsequent realignment, the two-dimensional CW
surface is converted back into the one-dimensional
residual signal using a CW and a pitch value at every
sample point, which can be obtained by linear
interpolation. This conversion also requires the
phase track estimated from the pitch value at each
sample point. The reconstructed one-dimensional
residual signal then excites the linear predictive
synthesis filter to produce the final output speech
signal.
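The conversion from the two-dimensional CW surface to the one-dimensional residual can be sketched as follows. This is a simplified illustration under stated assumptions, not the decoder described in the paper: each CW is assumed to be stored as one period of time-domain samples, the phase advances by 2*pi/pitch per sample, and all function and parameter names (`sample_cw`, `decode_frame`, `start_phase`) are hypothetical. LP synthesis filtering is omitted.

```python
import math


def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return (1.0 - t) * a + t * b


def sample_cw(cw, phase):
    """Evaluate a one-period waveform at a phase (radians), treating it as periodic."""
    n = len(cw)
    pos = (phase % (2.0 * math.pi)) / (2.0 * math.pi) * n
    i = int(pos)
    frac = pos - i
    return lerp(cw[i % n], cw[(i + 1) % n], frac)


def decode_frame(prev_cw, cur_cw, prev_pitch, cur_pitch, start_phase, frame_len):
    """Return (residual samples, last phase) for one frame.

    Assumption: the pitch and the CW are both linearly interpolated
    between the previous and the current frame at every sample point.
    """
    residual = []
    phase = start_phase
    for k in range(frame_len):
        t = (k + 1) / frame_len
        pitch = lerp(prev_pitch, cur_pitch, t)   # pitch period in samples
        phase += 2.0 * math.pi / pitch           # phase track from the pitch
        value = lerp(sample_cw(prev_cw, phase), sample_cw(cur_cw, phase), t)
        residual.append(value)
    return residual, phase % (2.0 * math.pi)


# Hypothetical example: a four-sample sine period and a constant pitch of 4 samples.
one_period = [0.0, 1.0, 0.0, -1.0]
residual, last_phase = decode_frame(one_period, one_period, 4.0, 4.0, 0.0, 4)
```

With a constant pitch and an unchanged CW, the sketch simply replays one period of the waveform; the interesting behavior appears when the pitch and the CW differ between frames.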
In a TTS system, however, the final signal is
obtained by decoding segmented frames and
concatenating the results. If a segment taken from
the middle of an utterance is decoded with the
conventional decoder, the reconstructed speech is
seriously distorted, especially at the concatenation
boundary, because the decoder has no history for
the first frame of the segment. If the decoder can
use the previous parameters at the start frame,
this distortion can be greatly reduced. In this
paper we therefore present a novel decoder scheme
for segmented frame decoding that reduces the
distortion at the concatenation boundary by
utilizing the previous parameters. The block diagram
of the novel decoding scheme is shown in Fig. 2.
As shown in Fig. 2, to decrease the distortion in
segmented frame decoding, the decoder utilizes all
the parameters of the previous frame: the (n-1)th
frame’s LSFs, pitch, CW power, and SEW and REW
magnitudes, where n is the current frame number.
The decoder requires the (n-1)th frame’s CWs to
set up the initial state at the start frame. Since
the CWs are generated by combining the SEWs and
REWs, the (n-2)th frame’s SEWs and REWs are also
needed to generate the (n-1)th frame’s CWs. In
continuous frame decoding, the previous CWs are
always available when the current frame is
processed, because the decoder preserves them
throughout the decoding procedure. In segmented
frame decoding, however, the previous CWs are not
available at the start frame, so the decoder must
construct them from the (n-1)th and (n-2)th
frames’ SEWs and REWs.
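The initialization at the start frame can be sketched as below. This is only an illustration of the idea, under several assumptions that do not come from the paper: the recombination is modeled as a per-harmonic sum of the SEW and REW magnitudes, power de-normalization is modeled as scaling by the square root of the power, and the (n-2)th magnitudes are taken as zero, as the labels in Fig. 2 suggest. The names `FrameParams`, `DecoderState`, and `start_segment` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FrameParams:
    lsf: List[float]
    pitch: float
    power: float
    sew_mag: List[float]
    rew_mag: List[float]


@dataclass
class DecoderState:
    """What a continuous decoder would already hold when frame n arrives."""
    prev_lsf: List[float]
    prev_pitch: float
    prev_power: float
    prev_cw: List[float]


def interpolate(a, b, t):
    """Linear interpolation between two equal-length vectors."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def start_segment(prev: FrameParams) -> DecoderState:
    """Rebuild the (n-1)th frame's CW from its SEW and REW magnitudes."""
    zeros = [0.0] * len(prev.sew_mag)          # (n-2)th SEW and REW set to 0
    # Interpolate from frame n-2 to frame n-1 and keep the end point (t = 1).
    sew = interpolate(zeros, prev.sew_mag, 1.0)
    rew = interpolate(zeros, prev.rew_mag, 1.0)
    gain = math.sqrt(prev.power)               # assumed form of de-normalization
    cw = [gain * (s + r) for s, r in zip(sew, rew)]  # stand-in recombination
    return DecoderState(prev.lsf, prev.pitch, prev.power, cw)


# Hypothetical parameter values for the frame preceding the segment.
prev_frame = FrameParams(lsf=[0.1, 0.2], pitch=80.0, power=4.0,
                         sew_mag=[0.5, 0.25], rew_mag=[0.5, 0.75])
state = start_segment(prev_frame)
```

After `start_segment`, frame n can be decoded with the same code path that a continuous decoder uses, since the state it expects is in place.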
Figure 2: A novel WI decoding scheme.
In addition to these five previous parameters, the
proposed scheme also exploits the phase instant for
segmented frame decoding. The phase instant is
calculated during phase estimation and is used to
obtain the one-dimensional residual signal from the
two-dimensional characteristic waveform. During this
procedure the phase instant is calculated at every
sample, and the last value is stored for processing
the next frame. In the proposed scheme, the previous
phase instant is therefore supplied at the start
frame together with the five previous parameters
mentioned above. Adopting the phase instant for
segmented frame decoding markedly improves the
quality of the final reconstructed speech segments.
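The effect of carrying the last phase instant into the start frame can be illustrated with a short sketch. It assumes, as a simplification not taken from the paper, that the phase advances by 2*pi/pitch per sample with a linearly interpolated pitch; the function name `phase_track`, the pitch values, and the frame length are hypothetical.

```python
import math


def phase_track(prev_pitch, cur_pitch, frame_len, start_phase=0.0):
    """Return (per-sample phases, last phase) for one frame."""
    phases = []
    phase = start_phase
    for k in range(frame_len):
        t = (k + 1) / frame_len
        pitch = (1.0 - t) * prev_pitch + t * cur_pitch  # linear interpolation
        phase += 2.0 * math.pi / pitch
        phases.append(phase)
    return phases, phase


# Frame n-1, decoded earlier; its last phase instant is stored.
_, last_phase = phase_track(80.0, 90.0, 160)

# Start frame n of a segment, with and without the stored phase instant.
with_stored, _ = phase_track(90.0, 100.0, 160, start_phase=last_phase)
with_reset, _ = phase_track(90.0, 100.0, 160)

# Phase offset (mod 2*pi) that a reset introduces at the boundary.
jump = last_phase % (2.0 * math.pi)
```

When the phase is reset, every sample of the start frame is read from the CW at a phase shifted by `jump`, which shows up as a discontinuity at the concatenation boundary; supplying the stored value removes that shift.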
4 EXPERIMENTAL RESULTS
The waveforms in Fig. 3 are generated by decoding
the segmented frames and then concatenating the two
reconstructed speech signals. In this figure, the
solid vertical line indicates the concatenation
boundary. Waveform (a) is generated by continuous
frame decoding; it therefore serves as the reference
against which the waveforms generated by segmented
frame decoding are compared.
[Figure 2 block diagram labels. Inputs: the nth frame’s LSPs, pitch, power, SEW magnitude, and REW magnitude; the (n-1)th frame’s LSPs, pitch, power, SEW magnitude, REW magnitude, and last phase; the (n-2)th frame’s SEW and REW magnitudes, both set to 0. Blocks: "Pre-processing for SEW & REW" (yielding the (n-1)th SEW and REW), "Pre-processing for CW" (yielding the (n-1)th CW), and "WI Decoder". Output: reconstructed speech.]