SEARCHING FOR A ROBUST MFCC-BASED
PARAMETERIZATION FOR ASR APPLICATION
J. V. Psutka, Luboš Šmídl and Aleš Pražák
Department of Cybernetics, University of West Bohemia, Pilsen, Czech Republic
Keywords: MFCC parameterization, critical band-pass filters, robust front-end.
Abstract: This paper searches for robust settings of an MFCC-based parameterization with respect to the
number of band-pass filters and the number of computed coefficients. Settings that are theoretically
recommended for telephone and microphone speech are compared with a large number of experimental results,
and a new technique for determining robust areas in the space {<# of band-pass filters>×<# of coefficients>} is designed.
1 INTRODUCTION
State-of-the-art parameterization techniques used
in ASR systems try to model the process of human
hearing. In speech processing terminology these
techniques are known as MFCC (Zheng and Song,
2001) and PLP parameterizations. Both adapt the
parameter estimation process to human hearing, in
particular to how humans perceive sounds of
various frequencies. One question that has to be
dealt with, however, is the selection of an
"optimal" number of critical band-pass filters and
number of computed coefficients. Papers published
at many prestigious international conferences
nearly always use the same settings, without the
necessary analysis of the task conditions and
without reference to, e.g., the sampling frequency
of the speech signal (perhaps under the influence
of the default settings of the HTK software tool,
which is frequently used in many research labs).
On the other hand, from relatively rich experience
in building many ASR systems we know that there is
no single universal setting that would yield the
best recognition results for a given "quality" of
speech signal. Experimental results indicate,
however, that the best classification results form
certain areas in the space {<number of band-pass
filters> × <number of coefficients>} in which the
success rate is high and changes little (i.e. it
depends only weakly on the number of critical
band-pass filters and the number of coefficients).
The goal of the described work is to find the
settings (i.e. the numbers of filters and derived
coefficients) that correspond to the best
recognition results, and then to specify "areas of
robust setting" around such solutions.
The whole work is done with the MFCC
parameterization, for speech data of telephone
(Fv = 8 kHz) and microphone (Fv = 44.1 kHz) quality.
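The notion of an "area of robust setting" can be made concrete with a small sketch. The Python snippet below is not the technique designed in this paper. It assumes a simple, hypothetical criterion: all grid cells whose success rate lies within a chosen tolerance of the best cell. The function name robust_area, the tolerance and every number in the grid are placeholders for illustration, not experimental results.

```python
import numpy as np

def robust_area(acc, tolerance):
    """Boolean mask of grid cells whose score is within `tolerance` of the best cell.

    Hypothetical criterion for illustration; not the method of the paper.
    """
    acc = np.asarray(acc, dtype=float)
    return acc >= acc.max() - tolerance

# Placeholder values only, NOT results from the paper.
filters = [10, 15, 20, 25]   # rows: number of band-pass filters
coeffs = [8, 12, 16]         # columns: number of coefficients
acc = np.array([
    [70.0, 72.0, 71.0],
    [74.0, 78.0, 77.5],
    [75.0, 78.5, 78.0],
    [73.0, 76.0, 75.0],
])

mask = robust_area(acc, tolerance=1.0)
for i, j in zip(*np.nonzero(mask)):
    print(f"filters={filters[i]}, coefficients={coeffs[j]}, score={acc[i, j]}")
```

A plain threshold like this does not check that the selected cells form one connected region; a practical criterion would probably need to.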
2 MFCC BASED PROCESSING
The MFCC parameterization is computed with a bank
of symmetric, overlapping triangular filters spaced
linearly on the mel-frequency axis, in accordance
with auditory perceptual considerations. The
spacing as well as the bandwidth of the individual
filters is determined by the critical-band concept.
The computation consists of the following steps:
• Computation of the short-term speech spectrum.
• Non-linear frequency transformation and critical-band
spectral resolution: triangular band-pass filters
on the mel-frequency axis.
• Computation of cepstral coefficients.
• Applying an inverse discrete Fourier transform.

Table 1: Recommended numbers of filters for different
values of sampling frequency.
Sampling frequency Fv [kHz]   Bandwidth [kHz]   Bandwidth [mel]   Number of filters M
8                             0–4               0–2146            15
16                            0–8               0–2840            20
44.1                          0–22              0–3921            27
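The processing steps and Table 1 can be illustrated with a minimal Python sketch. It rests on several assumptions that the text does not state: the common mel formula mel = 2595·log10(1 + f/700) (which does reproduce the mel bandwidths in Table 1 when rounded), a Hamming window, a 512-point FFT, a small floor inside the logarithm, and an unscaled cosine transform standing in for the inverse DFT of the real, symmetric log spectrum. All function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Assumed mel-scale formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the mel axis over 0..fs/2."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.linspace(0.0, fs / 2.0, n_fft // 2 + 1)  # FFT bin frequencies
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc_frame(frame, fs, n_filters, n_coeffs, n_fft=512):
    """MFCC-style coefficients for one frame (illustrative, unscaled)."""
    # Step 1: short-term power spectrum of the windowed frame.
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    # Step 2: mel-spaced triangular band-pass filters.
    energies = mel_filterbank(n_filters, n_fft, fs) @ power
    # Steps 3 and 4: log energies, then a cosine transform to the cepstral domain.
    log_e = np.log(energies + 1e-10)
    j = np.arange(1, n_filters + 1)
    k = np.arange(n_coeffs)[:, None]
    return np.cos(np.pi * k * (j - 0.5) / n_filters) @ log_e

# Upper band edges from Table 1, converted to mel.
print(np.round(hz_to_mel([4000, 8000, 22000])))

# One 25 ms frame of noise at Fv = 8 kHz with M = 15 filters and 12 coefficients.
rng = np.random.default_rng(0)
c = mfcc_frame(rng.standard_normal(200), fs=8000, n_filters=15, n_coeffs=12)
print(c.shape)
```

The two free parameters studied in this paper appear here as n_filters and n_coeffs; sweeping them over a grid gives the space {<number of band-pass filters> × <number of coefficients>}.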
V. Psutka J., Šmídl L. and Pražák A. (2007).
SEARCHING FOR A ROBUST MFCC-BASED PARAMETERIZATION FOR ASR APPLICATION.
In Proceedings of the Second International Conference on Signal Processing and Multimedia Applications, pages 192-195
DOI: 10.5220/0002140401920195
Copyright © SciTePress