and previous LPC vectors. This approach not only
reduces the LPC coding rate but also maintains high
coded-speech quality.
2 CLASSICAL LPC VECTOR
QUANTIZATION
2.1 General-purpose Vector
Quantization
Most recent speech coder standards use one of
various vector quantization (VQ) algorithms to code the
spectral information. In contrast to scalar quantization,
VQ techniques reduce the coding rate at the cost of
an increase in search computation. The performance
of a VQ method is a function of the size of the codebook:
a codebook with more entries models
the spectral parameters more accurately. However,
this improvement in speech quality comes at the
expense of a higher coding rate and computational
complexity.
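This size/rate/complexity trade-off can be made concrete with a minimal full-search VQ sketch (illustrative codebook size and random data, not any standard's tables): an N-entry codebook costs log2(N) bits per vector, but the full search must evaluate all N distances.

```python
import numpy as np

def vq_search(codebook, x):
    """Full-search VQ: return the index of the codebook entry closest
    (in squared Euclidean distance) to the input vector x."""
    dists = np.sum((codebook - x) ** 2, axis=1)
    return int(np.argmin(dists))

# Illustrative numbers: a 7-bit codebook of 10-dimensional vectors
# costs log2(128) = 7 bits per frame, but requires 128 distance
# evaluations per search.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((128, 10))     # 2**7 entries
x = rng.standard_normal(10)
idx = vq_search(codebook, x)
bits_per_frame = int(np.log2(len(codebook)))  # 7
```

Doubling the codebook to 256 entries adds only one bit per frame but doubles the search cost, which is exactly the tension described above.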
For example, in the G.729 narrowband codec stan-
dard, a combination of multistage VQ and split VQ
(MSVQ) is used to determine which 10-dimensional
LSF vector (among all the LSF codebook entries) cor-
responds most closely to the current frame LSF vec-
tor. In the first stage of the search procedure, a 7-
bit codebook is searched for the closest match to the
difference between the input and predicted LSF vec-
tors, while in the second stage two codebooks of 5 bits
each are examined, for a total coding rate of 1.8 kbit/s
(18 bits per 10-ms frame, including one bit to select
the MA predictor).
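The two-stage search described above can be sketched as follows, using the 7/5/5-bit sizes from the text but hypothetical random codebook contents (the actual G.729 tables and its weighted distance measure are not reproduced here):

```python
import numpy as np

def msvq_split_search(x, cb1, cb2a, cb2b):
    """Sketch of a two-stage search with a split second stage.
    Stage 1 quantizes the whole 10-dim vector; stage 2 quantizes the
    stage-1 residual in two 5-dim halves with separate codebooks."""
    i1 = int(np.argmin(np.sum((cb1 - x) ** 2, axis=1)))
    r = x - cb1[i1]                                    # stage-1 residual
    i2a = int(np.argmin(np.sum((cb2a - r[:5]) ** 2, axis=1)))
    i2b = int(np.argmin(np.sum((cb2b - r[5:]) ** 2, axis=1)))
    xq = cb1[i1].copy()
    xq[:5] += cb2a[i2a]                                # add stage-2 halves
    xq[5:] += cb2b[i2b]
    return (i1, i2a, i2b), xq

rng = np.random.default_rng(1)
cb1 = rng.standard_normal((128, 10))   # 7-bit first stage
cb2a = rng.standard_normal((32, 5))    # 5-bit second stage, lower half
cb2b = rng.standard_normal((32, 5))    # 5-bit second stage, upper half
(idx1, idx2a, idx2b), xq = msvq_split_search(rng.standard_normal(10),
                                             cb1, cb2a, cb2b)
```

Splitting the second stage keeps the search cost at 128 + 32 + 32 distance evaluations, far below the 2^17 entries an equivalent single-stage codebook would require.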
In the G.722.2 wideband coding standard (Bessete et
al., 2002), the same VQ technique, with slight modi-
fications, is employed to code 16 ISF coefficients. A
total of 46 bits per 20-ms frame is allocated to coding
the input ISF vector in all the codec modes, except
for the 6.60 kbit/s coder, which searches for the closest
codewords among 832 entries (for a bit rate of 1.8
kbit/s). These bit allocations ignore the acoustical
characteristics of voiced and unvoiced speech (Tamni
et al., 2005). The above vector quantization algo-
rithms can be categorized as general-purpose quan-
tization techniques, since they are applied indiscriminately
to both voiced and unvoiced speech.
2.2 Shortcomings of Classical Vector
Quantization Techniques
To code the LPC coefficients efficiently, one must
employ separate codebooks for voiced and unvoiced
frames. In both the G.729 and G.722.2 standards, the er-
ror between the current and past frame LPC vectors
is quantized using a combination of split and multi-
stage vector quantization. While the amplitude of this
error is small for voiced speech, its magnitude and
dynamic range for unvoiced speech are significantly
higher.
higher. Figure 1 shows the squared error between
consecutive ISF vectors in a wideband speech signal.
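The quantity plotted in Figure 1(b) is the squared Euclidean distance between the ISF vectors of consecutive frames; a minimal sketch of that measurement, using a toy slowly varying (voiced-like) frame sequence rather than real speech:

```python
import numpy as np

def interframe_squared_error(isf_frames):
    """Squared Euclidean error between consecutive ISF vectors.
    isf_frames: array of shape (num_frames, order)."""
    diff = np.diff(isf_frames, axis=0)       # frame-to-frame differences
    return np.sum(diff ** 2, axis=1)         # one value per frame pair

# Toy voiced-like sequence: each 16-dim ISF vector drifts by 0.01 per
# frame, so every interframe squared error is 16 * 0.01**2 = 0.0016.
voiced_like = np.cumsum(np.full((5, 16), 0.01), axis=0)
err = interframe_squared_error(voiced_like)
```

For a rapidly varying (unvoiced-like) sequence, the same function yields errors that are orders of magnitude larger, which is the contrast Figure 1(b) illustrates.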
Each of the G.729 and G.722.2 codecs quantizes this
error with the same quantizer for both voiced and un-
voiced frames. This approach is certainly inefficient,
since it does not exploit the high interframe correlation
of the voiced speech spectrum. The variable-rate mul-
timode wideband speech codec (Jelinek and Salami,
2007), which is based on a source-controlled coding
paradigm, utilizes separate coding modes for differ-
ent classes of speech. However, the spectral informa-
tion is encoded using the same quantizer in all coding
modes.
A better approach consists of quantiz-
ing the voiced and unvoiced spectrum information
separately. Such source-controlled quantization of the
spectrum parameters can be expected to provide higher
coding performance. In (Guerchi, 2007), the inter-
frame correlation of spectrum parameters is used to
reduce the computational complexity by almost 30%
while keeping the coding rate fixed. An alterna-
tive method of exploiting the high interframe correla-
tion in voiced speech consists of using a smaller
quantizer for this class of speech signal.
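One way to realize such a class-dependent scheme is sketched below (hypothetical function and codebook sizes, not the actual quantizer of any cited codec): voiced frames use a small codebook matched to the low-amplitude interframe residual, while unvoiced frames use a larger one covering the wider dynamic range.

```python
import numpy as np

def bimodal_quantize(x, prev_x, cb_voiced, cb_unvoiced, voiced):
    """Class-dependent quantization sketch: pick the codebook according
    to the voiced/unvoiced decision and quantize the interframe residual."""
    cb = cb_voiced if voiced else cb_unvoiced
    r = x - prev_x                                   # interframe residual
    i = int(np.argmin(np.sum((cb - r) ** 2, axis=1)))
    return i, prev_x + cb[i]                         # index, reconstruction

rng = np.random.default_rng(2)
# Illustrative sizes: 6-bit small-amplitude codebook for voiced frames,
# 8-bit wide-range codebook for unvoiced frames (16-dim ISF vectors).
cb_voiced = rng.standard_normal((64, 16)) * 0.05
cb_unvoiced = rng.standard_normal((256, 16))

prev_x = rng.standard_normal(16) * 0.1
x = prev_x + rng.standard_normal(16) * 0.02          # small voiced-like change
i_v, xq_v = bimodal_quantize(x, prev_x, cb_voiced, cb_unvoiced, voiced=True)
```

The voiced-mode index costs only 6 bits here versus 8 for the unvoiced mode, which is the rate saving a smaller voiced codebook buys.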
[Figure 1 here: two panels with time in samples (x 10^4) on the horizontal axis; panel (a) plots Amplitude, panel (b) plots Squared error.]
Figure 1: (a) Wideband speech signal; (b) ISF squared error
between consecutive frames.
3 BIMODAL VECTOR
QUANTIZATION
In this section, we introduce the bimodal vector quan-
tization (BMVQ) technique. This technique, which
consists of two disjoint ISF codebooks, reduces the
SIGMAP 2008 - International Conference on Signal Processing and Multimedia Applications