reduction: YES
CWT smoothing: YES, Kohonen network neuron reduction: NO
CWT smoothing: YES, Kohonen network neuron reduction: YES
The results are presented in Table 3.
Table 3: Automatic prolongation recognition results.
Set  CWT smoothing  Kohonen reduction    A    P    B  Sensitivity  Predictability
1    NO             NO                 286  154    8  54%          95%
2    NO             YES                286  179   12  63%          94%
3    YES            NO                 286  213   44  74%          83%
4    YES            YES                286  231   48  81%          83%
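The two percentage columns of Table 3 can be reproduced from the counts. In the minimal Python sketch below, A is taken to be the number of prolongations in the material, P the number of correctly recognized prolongations and B the number of false detections. These readings are assumptions inferred from the figures, because sensitivity = P/A and predictability = P/(P+B) reproduce every row after rounding. The names `TABLE_3` and `metrics` are illustrative only.

```python
# Counts from Table 3: set -> (A, P, B).
# Assumed meaning: A = prolongations in the material, P = correctly recognized,
# B = false detections (inferred from the figures, not stated in this section).
TABLE_3 = {
    1: (286, 154, 8),   # CWT smoothing NO,  Kohonen reduction NO
    2: (286, 179, 12),  # CWT smoothing NO,  Kohonen reduction YES
    3: (286, 213, 44),  # CWT smoothing YES, Kohonen reduction NO
    4: (286, 231, 48),  # CWT smoothing YES, Kohonen reduction YES
}


def metrics(a, p, b):
    """Return (sensitivity, predictability) in percent, rounded to integers."""
    sensitivity = 100 * p / a           # share of prolongations that were found
    predictability = 100 * p / (p + b)  # share of detections that were correct
    return round(sensitivity), round(predictability)


for set_no, (a, p, b) in TABLE_3.items():
    sens, pred = metrics(a, p, b)
    print(f"set {set_no}: sensitivity {sens}%, predictability {pred}%")
```

The printed values match the table: 54%/95%, 63%/94%, 74%/83% and 81%/83% for sets 1 to 4.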
7 CONCLUSIONS
As we can see, the CWT smoothing algorithm increased the recognition ratio (sensitivity) from 54% (set 1) to 74% (set 3) and from 63% (set 2) to 81% (set 4). These are gains of 20 and 18 percentage points respectively, which is very good. The CWT algorithm is too precise, especially in the high-frequency scales. For the purpose of finding prolongations, such precision is unnecessarily high, and that is why smoothing improves the ratio so much. The disadvantage of this method is that it decreases predictability by 12 percentage points (sets 1 and 3) and by 11 percentage points (sets 2 and 4), so it needs to be improved.
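The changes quoted above are differences in percentage points between the rounded values in Table 3. They are not relative changes, and the two can differ considerably. The Python sketch below computes both for the two pairs of sets that differ only in CWT smoothing. `RESULTS`, `SMOOTHING_PAIRS` and `change` are illustrative names, not part of the described system.

```python
# Rounded percentages from Table 3: set -> (sensitivity, predictability).
RESULTS = {1: (54, 95), 2: (63, 94), 3: (74, 83), 4: (81, 83)}

# Pairs of sets that differ only in CWT smoothing: (without, with).
SMOOTHING_PAIRS = [(1, 3), (2, 4)]


def change(before, after):
    """Return (absolute change in percentage points, relative change in percent)."""
    return after - before, round(100 * (after - before) / before)


for without, with_ in SMOOTHING_PAIRS:
    sens_pp, sens_rel = change(RESULTS[without][0], RESULTS[with_][0])
    pred_pp, pred_rel = change(RESULTS[without][1], RESULTS[with_][1])
    print(f"sets {without}->{with_}: "
          f"sensitivity {sens_pp:+d} pp ({sens_rel:+d}% relative), "
          f"predictability {pred_pp:+d} pp ({pred_rel:+d}% relative)")
```

For sets 1 and 3, for example, the 20-point sensitivity gain is a relative improvement of about 37%, while the 12-point predictability loss is a relative drop of about 13%.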
The Kohonen neuron reduction algorithm is a little more difficult to interpret. We used such an algorithm because if a prolongation is long (e.g. 1000 ms), then consecutive vectors within that period are similar to each other, since they represent one phoneme. We observed that if we want only one neuron to win for that phoneme, the other neurons have to compete for other phonemes; therefore either
A) all input vectors representing the prolongation have to be almost identical; then, no matter how big the network is, only one neuron will win the competition for the prolongation phoneme, or
B) the input signal needs to contain a few other phonemes in the window (their number proportional to the size of the network); then every neuron in the network will compete for a different phoneme.
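Conditions A and B can be illustrated with the best-matching-unit rule of a Kohonen network, in which each input vector is won by the neuron whose weight vector is nearest to it. The Python sketch below assumes Euclidean distance and uses hand-picked, hypothetical 2-D weights instead of trained CWT features. Two neurons stand for variants of the prolonged phoneme and two stand for other phonemes. `best_matching_unit`, `winners` and `WEIGHTS` are illustrative names.

```python
import math


def best_matching_unit(x, weights):
    """Index of the neuron whose weight vector is closest (Euclidean) to x."""
    return min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))


def winners(window, weights):
    """Neurons that win the competition for at least one vector in the window."""
    return {best_matching_unit(x, weights) for x in window}


# Hypothetical weights: neurons 0 and 1 have specialized on two variants of the
# prolonged phoneme; neurons 2 and 3 represent other phonemes.
WEIGHTS = [[1.0, 0.5], [1.2, 0.7], [3.0, 2.0], [5.0, 1.0]]

identical = [[1.02, 0.52]] * 10               # condition A: identical vectors
variants = [[1.0, 0.5], [1.2, 0.7]] * 5       # long prolongation with variations
mixed = [[1.0, 0.5], [3.0, 2.0], [5.0, 1.0]]  # condition B: several phonemes

print(winners(identical, WEIGHTS))  # {0}: one neuron wins the whole prolongation
print(winners(variants, WEIGHTS))   # {0, 1}: two neurons split one phoneme
print(winners(mixed, WEIGHTS))      # {0, 2, 3}: each phoneme has its own winner
```

With identical inputs, exactly one neuron wins regardless of how many neurons the network has. The problematic case is the middle one, where variations of a single phoneme are won by different neurons.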
When a given prolongation is short, there are usually a few phonemes within the window, so condition B is met, which is good. But if the prolongation is long, or there is a lot of silence, there are not many phonemes in the window. The Kohonen neurons then start to compete for the same phoneme and specialize in different variations of it. This is bad because we want only one neuron to win. That is why the CWT smoothing algorithm works so well: it makes similar vectors even more similar to each other, so condition A is met. It is also why the ‘neuron reduction’ algorithm works: it removes excess neurons, so the few phonemes present in the window are enough for the smaller network and condition B is met.
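The claim that smoothing makes similar vectors even more similar can be checked on a toy example. The Python sketch below assumes that smoothing is a centred moving average applied along time to each feature independently. The paper's actual smoothing method is not described in this section, so both the technique and the 2-D toy frames are assumptions. The sketch compares the average distance between consecutive frames before and after smoothing. `moving_average` and `mean_step` are illustrative names.

```python
import math


def moving_average(frames, width=3):
    """Centred moving average over time, per feature; the window shrinks at the edges."""
    half = width // 2
    smoothed = []
    for t in range(len(frames)):
        segment = frames[max(0, t - half):t + half + 1]
        smoothed.append([sum(values) / len(segment) for values in zip(*segment)])
    return smoothed


def mean_step(frames):
    """Average Euclidean distance between consecutive frames."""
    steps = [math.dist(a, b) for a, b in zip(frames, frames[1:])]
    return sum(steps) / len(steps)


# Toy 2-D frames of one prolonged phoneme that jitter between two variants
# (hypothetical values, not real CWT coefficients).
frames = [[1.0, 0.5], [1.2, 0.7]] * 4
smoothed = moving_average(frames)

print(f"mean step before smoothing: {mean_step(frames):.3f}")
print(f"mean step after smoothing:  {mean_step(smoothed):.3f}")
```

In this toy case, the average step drops from about 0.28 to about 0.08. This brings the window closer to condition A. Whether a single neuron then wins still depends on where the trained weight vectors lie.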
As we can see, ‘neuron reduction’ increases the recognition ratio by 9 percentage points (sets 1 and 2) and by 7 percentage points (sets 3 and 4), which is significant. Importantly, it hardly affects predictability (95% vs 94% and 83% vs 83%). This means that the algorithm improves recognition without noticeably increasing the proportion of mistakes.
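The reduction gains quoted above are differences between rounded percentages. The Python sketch below recomputes them from the raw counts of Table 3. It also computes the share of false detections among all detections. As before, reading B as the number of false detections is an assumption inferred from the figures. `TABLE_3`, `REDUCTION_PAIRS` and `summary` are illustrative names.

```python
# Counts from Table 3: set -> (A, P, B); B read as false detections (assumption).
TABLE_3 = {1: (286, 154, 8), 2: (286, 179, 12), 3: (286, 213, 44), 4: (286, 231, 48)}

# Pairs of sets that differ only in Kohonen neuron reduction: (without, with).
REDUCTION_PAIRS = [(1, 2), (3, 4)]


def summary(a, p, b):
    """Return (sensitivity, share of false detections among detections), in percent."""
    return 100 * p / a, 100 * b / (p + b)


for without, with_ in REDUCTION_PAIRS:
    sens0, err0 = summary(*TABLE_3[without])
    sens1, err1 = summary(*TABLE_3[with_])
    print(f"sets {without}->{with_}: sensitivity {sens1 - sens0:+.1f} pp, "
          f"false detections {TABLE_3[without][2]}->{TABLE_3[with_][2]}, "
          f"their share {err0:.1f}% -> {err1:.1f}%")
```

Because A is the same in every set, the unrounded gains are simply (P2 - P1)/A: about 8.7 and 6.3 percentage points. The number of false detections grows slightly (8 to 12 and 44 to 48), while their share stays in the same range (about 4.9% to 6.3% and 17.1% to 17.2%).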
The results are promising. The main issues now are:
- to improve the CWT smoothing algorithm, which decreases the predictability ratio too much;
- to investigate which parameters give better prolongation recognition, such as the Kohonen network size, the Kohonen network learning and neighbourhood coefficients, the Kohonen neuron initialization method, the wavelet scales and scale shifts, and the decibel threshold.
Research into these parameters is in progress.
PROLONGATION RECOGNITION IN DISORDERED SPEECH USING CWT AND KOHONEN NETWORK