Authors:
Fraihat Salam and Glotin Hervé
Affiliation:
Information and System Sciences Lab - UMR 6168, USTV, France
Keyword(s):
Speech analysis, Quantization, Time-frequency, Allen Temporal Algebra, Automatic Speech Recognition.
Related Ontology Subjects/Areas/Topics:
Applications; Audio and Speech Processing; Digital Signal Processing; Multimedia; Multimedia Signal Processing; Pattern Recognition; Software Engineering; Telecommunications
Abstract:
Speech dynamics may not be well addressed by conventional speech processing. We analyse here a new quantization paradigm for vowel coding. It is based on simple Allen temporal interval algebra applied to subband voicing levels, yielding a compressed speech representation of only 21 integers for a speech window up to 32 ms long. Experiments show that it is advantageous to use the ranking of the average values of the voicing intervals across the various subbands. These new features are evaluated for vowel recognition (1 hour, 6 vowels) on a reference multispeaker radio broadcast news corpus used during the ESTER evaluation campaign. We work on the subset of the most frequent French vowels. We get a 62% class error rate when adding the ranking information to the Allen relations, instead of 70% using Allen relations alone, and 57% using the set of the 48 raw floats. We then discuss the advantage of using more subbands, and we finally propose a strategy to tackle the combinatorial complexity of Allen relations.
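As a rough illustration of the kind of quantization the abstract describes, the sketch below classifies the Allen relation between two time intervals and encodes every pair of intervals as an integer. It is not the authors' implementation: the function names (`allen_relation`, `encode_pairs`), the ordering of the relation codes, and the example intervals are all assumptions made for illustration. How the voicing intervals are extracted from each subband is not covered here.

```python
from itertools import combinations

# The 13 Allen relations; the integer code of a relation is its index here.
# This ordering is an arbitrary choice for the sketch, not taken from the paper.
RELATIONS = [
    "before", "meets", "overlaps", "starts", "during", "finishes", "equals",
    "finished-by", "contains", "started-by", "overlapped-by", "met-by", "after",
]


def allen_relation(a, b):
    """Return the Allen relation of interval a = (start, end) relative to b."""
    (a0, a1), (b0, b1) = a, b
    if a0 >= a1 or b0 >= b1:
        raise ValueError("intervals must satisfy start < end")
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Only partial overlaps remain at this point.
    return "overlaps" if a0 < b0 else "overlapped-by"


def encode_pairs(intervals):
    """Encode each unordered pair of intervals as one integer relation code."""
    return [
        RELATIONS.index(allen_relation(a, b))
        for a, b in combinations(intervals, 2)
    ]


# Hypothetical voicing intervals (in ms) for three subbands.
example = [(0, 4), (1, 3), (4, 6)]
print(encode_pairs(example))
```

With n intervals, `encode_pairs` produces n(n-1)/2 integers, one per pair, which hints at why the number of relations grows quickly as subbands are added.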