Healthy/Esophageal Speech Classification using Features
based on Speech Production and Audition Mechanisms
Sofia Ben Jebara
Lab. COSIM, Ecole Supérieure des Communications de Tunis, Carthage University
Route de Raoued 3.5 Km, Cité El Ghazala, Ariana 2088, Tunisia
Keywords:
Speech Production Mechanism, Perceptual Audition Process, Classification, Healthy/Esophageal Speech.
Abstract:
This paper focuses on the classification of speech sequences into two classes: healthy speech and esophageal
speech. Two kinds of features are selected: those based on the speaker's speech production mechanism and those
based on properties of the listener's auditory system. Two classification strategies are used: Discriminant
Analysis and a GMM-based Bayesian classifier. Experiments conducted on a large database show that both kinds
of features classify accurately. Moreover, the auditory-based features perform best, with error rates close to zero.
1 INTRODUCTION
Nowadays, great importance is attached to the social
integration of people suffering from pathologies.
In particular, recent research aims to allow alaryngeal
people, who use esophageal voice as substitution
speech, to communicate through fixed and mobile
phones. Because this speech is produced at the
extremity of the esophagus, esophageal voice is
neither clear nor very intelligible. To improve its
quality, a simple device inserted in the telephone
equipment would elevate and clarify this voice. The
device would operate when esophageal voice is in use
and remain inactive when healthy voice is spoken.
A healthy/esophageal speech classification system
is therefore needed to achieve this purpose. Hence,
the goal of this paper is to propose a practical solution
for deciding whether the speech spoken over the
telephone is healthy or esophageal. Successful
classification will enable the non-invasive device
to operate automatically.
Speech classification is mainly composed of two
important blocks: the feature extractor and the
decision module. The most commonly used features
for healthy speech analysis are zero crossing rate,
auto-correlation coefficients, speech peakiness and
energy, wavelet based features, and delta line spectral
frequencies (Atal and Rabiner, 1996; Childers et al.,
1989; ITU-T, 1996), which can be qualified as temporal
and spectral features. Others, such as Mel Frequency
Cepstral Coefficients, are categorized as perceptual
features (Rabiner and Juang, 1993).
On the other hand, the most commonly used features
for esophageal voice include Pitch, Jitter, Shimmer,
Harmonic to Noise Ratio (HNR), and Normalized Noise
Energy (NNE) (Orlikoff, 2000; Kasuya and Ogawa,
1986), which are called acoustic parameters.
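As an illustration, a few of the features listed above can be sketched in code. This is a minimal sketch and not the implementation used in this paper: the function names are hypothetical, jitter and shimmer are taken here in their common "local" form (mean absolute difference between consecutive pitch periods or peak amplitudes, divided by their mean), and the pitch periods and peak amplitudes are assumed to have been estimated beforehand.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean value.

    Assumption of this sketch: applied to pitch periods it gives a local
    jitter, applied to cycle peak amplitudes it gives a local shimmer.
    """
    if len(values) < 2:
        return 0.0
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / mean_value


# Hypothetical pitch periods (in ms) alternating around 10 ms.
periods_ms = [9.0, 11.0, 9.0, 11.0]
jitter = local_perturbation(periods_ms)
```

A perfectly periodic voice gives a perturbation of zero with this definition, so larger values indicate a less regular excitation.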
In this paper, we propose the use of two kinds of
features: the first is related to the hearing behavior
of the listener, whereas the second expresses the
speech production mechanism of the speaker. These
families of features are justified as follows. Both
voices are heard by human listeners, whose perceptual
properties are the same for healthy and esophageal
voices. Hence, the ear is able to differentiate
the auditory quality of the two voices. On the
other hand, the two voices are produced by two
different mechanisms. Healthy speech is the result of an
excitation filtered by the glottis, the vocal tract and
the lips, whereas esophageal voice is the result of
an excitation filtered by the esophagus extremity
and the lips. We therefore expect their production
mechanism models to differ, and some classical
features well adapted to healthy speech to
fail when used to characterize esophageal speech.
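The excitation-plus-filter view of speech production is commonly modeled by linear prediction, in which each sample is predicted from the preceding ones by an all-pole filter. The sketch below illustrates only that generic idea, using the autocorrelation method and the Levinson-Durbin recursion; it is not the LPCF feature set used in this paper, and the function names and the sign convention of the coefficients are assumptions of the sketch.

```python
def autocorrelation(frame, max_lag):
    """Raw (unnormalized) autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the linear prediction normal equations.

    Returns (coefficients, residual energy). Convention assumed here:
    x[n] is predicted as sum over k = 1..order of coeffs[k-1] * x[n-k].
    """
    a = [0.0] * (order + 1)   # a[0] is unused so that indices match lags
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:        # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err         # reflection coefficient of stage i
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k    # prediction error energy after stage i
    return a[1:], err


def lpc(frame, order):
    """Linear prediction coefficients of one frame (autocorrelation method)."""
    return levinson_durbin(autocorrelation(frame, order), order)


# Synthetic frame: impulse response of a one-pole filter with pole at 0.9.
frame = [0.9 ** n for n in range(200)]
coeffs, residual = lpc(frame, 2)
```

For this synthetic frame, the first coefficient comes out close to 0.9 and the second close to 0, which recovers the single pole of the filter that generated the signal.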
The features related to the audition mechanism
are the popular Mel Frequency Cepstral Coefficients
(MFCC), which are powerful for many speech
processing tasks such as recognition, fingerprinting,
and indexing. Features related to speech production
mechanism are Linear Prediction Coherence Func-
tion features (LPCF) which have interesting prop-
erties for voice activity detection, voiced-unvoiced-
Ben Jebara S..
Healthy/Esophageal Speech Classification using Features based on Speech Production and Audition Mechanisms.
DOI: 10.5220/0004181500990104
In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing (BIOSIGNALS-2013), pages 99-104
ISBN: 978-989-8565-36-5
Copyright © 2013 SCITEPRESS (Science and Technology Publications, Lda.)