5 Discussion and Conclusions
Throughout this paper it has been shown that formant-based speech processing
may be carried out by well-known bio-inspired computing units. Special emphasis has
been placed on describing the biophysical mechanisms credited with being
responsible for the detection of formant dynamics, as related to the perception of
vowel-like (static or quasi-static) and consonant-like (strongly dynamic) sounds. A
special effort has been devoted to defining a plausible neuromorphic or
bio-inspired architecture composed of multiple modules of a general-purpose computing
unit. The use of such units in the characterization of vowel and consonantal formant
dynamics as positive and negative frequency tracking and grouping has also
been presented. The structures studied correspond roughly to the processing centres in
the Olivary Nucleus and the Inferior Colliculus. The systemic bottom-up building of
layered structures that reproduce dynamic feature detection related to plausible neuronal
circuits in the Auditory Cortex has also been introduced. Results from simulations
explaining the behaviour of these layered structures have been presented as well,
confirming that robust formant trackers built from simple Hebbian units may carry
out important tasks in Speech Processing that are ultimately related to the perception of
dynamic consonants. The utility of this methodology lies in the automatic
phonetic labelling of the speech trace, as shown in this study, as well as in typical
tasks related to Cognitive Audio Processing [13].
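
As an informal illustration of the positive and negative frequency tracking summarised above, the following minimal sketch applies three small weight masks to a binary time-frequency map and thresholds the weighted sums. It is only a sketch under stated assumptions and not the architecture evaluated in this paper: the 3x3 mask size, the hand-set (rather than Hebbian-learned) weights, the threshold value and the names `unit_response` and `track_dynamics` are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical 3x3 weight masks (rows: frequency channel, low to high;
# columns: time frame, past to future). Weights are set by hand here,
# not learned with a Hebbian rule.
RISING = np.eye(3)                      # frequency increases with time
FALLING = RISING[::-1]                  # frequency decreases with time
STEADY = np.array([[0.0, 0.0, 0.0],
                   [1.0, 1.0, 1.0],
                   [0.0, 0.0, 0.0]])    # frequency stays constant


def unit_response(spec, mask, threshold=2.5):
    """Weighted sum of each 3x3 neighbourhood, followed by a threshold."""
    n_freq, n_time = spec.shape
    padded = np.pad(spec, 1)            # zero padding at the borders
    total = np.zeros(spec.shape)
    for df in range(3):
        for dt in range(3):
            total += mask[df, dt] * padded[df:df + n_freq, dt:dt + n_time]
    return total >= threshold           # the unit fires when the sum is high


def track_dynamics(spec, threshold=2.5):
    """Return boolean firing maps for rising, falling and steady traces."""
    return {
        "rising": unit_response(spec, RISING, threshold),
        "falling": unit_response(spec, FALLING, threshold),
        "steady": unit_response(spec, STEADY, threshold),
    }


# Toy input: a single formant-like trace rising one channel per frame.
toy = np.zeros((8, 8))
toy[np.arange(8), np.arange(8)] = 1.0
maps = track_dynamics(toy)
print({name: int(m.sum()) for name, m in maps.items()})
```

On this toy trace only the rising units fire along the interior of the trace; flipping the frequency axis of the input makes the falling units respond instead. Grouping the firing maps over time would be the next step towards labelling a segment as vowel-like or consonant-like, which is left out of this sketch.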
Acknowledgements
This work has been funded by grants TIC2003-08756, TEC2006-12887-C02-01/02
and TEC2009-14123-C04-03 from Plan Nacional de I+D+i, Ministry of Science and
Technology, by grant CCG06-UPM/TIC-0028 from CAM/UPM, and by project
HESPERIA (http://www.proyecto-hesperia.org) from the Programme CENIT, Centro
para el Desarrollo Tecnológico Industrial, Ministry of Industry, Spain.
References
1. Allen, J. B., 2008. Nonlinear Cochlear Signal Processing and Masking in Speech
Perception. In Springer Handbook of Speech Processing (Chapter 3), Eds.: J. Benesty, M.
M. Sondhi and Y. Huang, Springer Verlag, pp. 27-60.
2. IPA: http://www.arts.gla.ac.uk/IPA/ipachart.html
3. Geissler, D. B. and Ehret, G., 2002. Time-critical integration of formants for perception of
communication calls in mice. Proc. of the Nat. Ac. of Sc. 99 (13), pp. 9021-9025.
4. Gómez, P., Ferrández, J. M., Rodellar, V., Fernández, R., 2009a. Time-frequency
Representations in Speech Perception. Neurocomputing 72, pp. 820-830.
5. Gómez, P., Ferrández, J. M., Rodellar, V., Álvarez, A., Mazaira, L. M., Martínez, R.,
2009b. Detection of Speech Dynamics by Neuromorphic Units. Lecture Notes in Computer
Science 5602, Springer Verlag, pp. 67-78.