An HMM-based classification system can be
decomposed into two classical phases: the training
phase and the testing phase. The database is
therefore split into a training database and a testing
database. Both phases rely on a prosodic analysis
step, which transforms the temporal
signal of the word ‘oui’ into a sequence of vectors
whose components are prosodic features. The
prosodic features are described in the
next section.
During the training phase, the system learns from the
occurrences of the training database. The result is a set of
HMMs that represent the classes through their prosodic
vector sequences (using the HRest command of the
HTK library).
During the testing phase, the sequence of
prosodic vectors of an unknown occurrence is
presented to the classifier (using the HVite command). The
classification decision is made by selecting, of the two
classes, the one with the highest probability.
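
The decision rule described above can be illustrated with a minimal sketch. It assumes that each class model has already produced a log-likelihood for the unknown occurrence; the function name `classify` and the numerical scores are hypothetical and are not taken from the experiments.

```python
def classify(log_likelihoods):
    """Return the class label whose model gives the highest log-likelihood.

    log_likelihoods: dict mapping a class label to the log-likelihood
    of the observed prosodic vector sequence under that class's HMM.
    """
    if not log_likelihoods:
        raise ValueError("at least one class score is required")
    return max(log_likelihoods, key=log_likelihoods.get)


# Hypothetical scores for one unknown occurrence of 'oui'
scores = {"conviction": -412.7, "lack of conviction": -405.3}
decision = classify(scores)
```

Comparing log-likelihoods rather than raw probabilities gives the same decision, since the logarithm is monotonic, and avoids numerical underflow on long sequences.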
This classification system was evaluated
on a self-made database consisting of a relatively
small number of occurrences. The validation of
such a system could therefore face the curse of dimensionality
(performance decreases as the number of
prosodic components increases) (Jain, et al.,
2000). We therefore propose a feature selection step,
which is introduced in Section 2.3.
2.2 Definition of the Prosodic Feature Vector
Typical features that characterize prosody include the
pitch f0 (Hz) and the energy E (dB). Using the
PRAAT software (Boersma & Weenink, 2014),
these parameters are computed every 10 ms on 30
ms analysis windows of the temporal signal
corresponding to an occurrence of the word ‘oui’. A
dynamic description of the static parameters f0 and
E is added by computing first- and second-order
differential parameters, ∆ and ∆∆, with the HTK library.
Thus, each occurrence of the word ‘oui’ is
represented by a sequence of vectors with 6 prosodic
components, denoted E, f0, ∆E, ∆f0, ∆∆E and ∆∆f0.
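
As an illustration of how the six components can be assembled, the sketch below computes differential parameters with the regression formula documented for HTK, d_t = Σ k (c_{t+k} − c_{t−k}) / (2 Σ k²), repeating the first and last frames at the edges. The window length of 2 is an assumption (the value used in the experiments is not stated), the function names are hypothetical, and the handling of unvoiced frames without a pitch value is not addressed.

```python
def deltas(values, window=2):
    """Regression-based differential parameters of a 1-D sequence.

    d_t = sum_{k=1..window} k * (c[t+k] - c[t-k]) / (2 * sum_{k=1..window} k^2)
    The first and last values are repeated at the sequence edges.
    window=2 is an assumed setting, not one reported in the text.
    """
    n = len(values)
    norm = 2 * sum(k * k for k in range(1, window + 1))
    result = []
    for t in range(n):
        acc = 0.0
        for k in range(1, window + 1):
            ahead = values[min(t + k, n - 1)]
            behind = values[max(t - k, 0)]
            acc += k * (ahead - behind)
        result.append(acc / norm)
    return result


def prosodic_vectors(energy, f0, window=2):
    """Build one 6-component vector [E, f0, dE, df0, ddE, ddf0] per frame."""
    if len(energy) != len(f0):
        raise ValueError("energy and f0 must have the same number of frames")
    d_e, d_f0 = deltas(energy, window), deltas(f0, window)
    dd_e, dd_f0 = deltas(d_e, window), deltas(d_f0, window)
    return [list(frame) for frame in zip(energy, f0, d_e, d_f0, dd_e, dd_f0)]
```

The second-order parameters are obtained by applying the same regression to the first-order ones, so each frame keeps its two static values and gains four dynamic ones.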
The task of the HMM-based classifier is to
assign each use of the word
‘oui’ to a predefined class, based on a sequence of
vectors composed of 6 prosodic parameters.
In our application, the HMM structures
associated with the classes ‘conviction’ and ‘lack of
conviction’ are composed of 5 states per class (plus
one additional state each for input and output), with a mixture of
3 Gaussians per state.
The quality of the classification system is
evaluated by a classification rate defined as:

$$\text{classification rate} = \frac{N - S}{N}$$

where N is the total number of occurrences presented at
the input of the classifier and S is the number of
misclassified occurrences.
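
A minimal sketch of this evaluation is given below. It assumes that the rate is the fraction of correctly classified occurrences, (N − S) / N; whether it is reported as a fraction or as a percentage is not specified here. The function name and the example labels are hypothetical.

```python
def classification_rate(true_labels, predicted_labels):
    """Fraction of correctly classified occurrences, (N - S) / N.

    N is the number of occurrences presented to the classifier,
    S is the number of misclassified occurrences.
    """
    n = len(true_labels)
    if n == 0 or n != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    s = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return (n - s) / n


# Hypothetical example: 4 occurrences, 1 of them misclassified
rate = classification_rate(
    ["conviction", "conviction", "lack", "lack"],
    ["conviction", "lack", "lack", "lack"],
)
```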
2.3 The Feature Selection Problem
Dimensionality reduction of the feature vectors can
be achieved using feature selection algorithms,
which select a subset of relevant features from an
initial set of features. These algorithms can be
grouped into classifier-dependent approaches
(‘wrapper’ methods) and classifier-independent
approaches (‘filter’ methods). Although wrapper
methods have the disadvantage of considerable
computational expense, they have a higher learning
capacity with regard to overfitting (Brown, et al.,
2012). In our work, we therefore adopt a wrapper method,
because this disadvantage is limited when the initial
set contains only 6 features. In
particular, we use a forward sequential algorithm
that selects one feature at each selection step.
Moreover, the small size of the database keeps the
algorithm computationally tractable.
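
The forward sequential procedure can be sketched as follows. This is a generic greedy wrapper, not necessarily the exact procedure used in the experiments: the stopping rule is not stated in the text, so the sketch adds features until none remain and records the score at each step. The `evaluate` callback stands for training and testing the classifier on a feature subset; `toy_score` and its gains are purely hypothetical stand-ins.

```python
def forward_selection(features, evaluate):
    """Greedy forward sequential selection, adding one feature per step.

    features: list of candidate feature names.
    evaluate: function taking a list of feature names and returning a
              score (higher is better), e.g. a classification rate.
    Returns a list of (selected_subset, score) pairs, one per step.
    """
    selected = []
    remaining = list(features)
    history = []
    while remaining:
        best_feature, best_score = None, None
        for candidate in remaining:
            score = evaluate(selected + [candidate])
            if best_score is None or score > best_score:
                best_feature, best_score = candidate, score
        selected.append(best_feature)
        remaining.remove(best_feature)
        history.append((list(selected), best_score))
    return history


# Hypothetical scoring function standing in for HMM training and testing
def toy_score(subset):
    gains = {"E": 0.10, "f0": 0.30, "dE": 0.05,
             "df0": 0.20, "ddE": 0.01, "ddf0": 0.02}
    return sum(gains[name] for name in subset)


steps = forward_selection(["E", "f0", "dE", "df0", "ddE", "ddf0"], toy_score)
```

With 6 initial features, this procedure evaluates at most 6 + 5 + 4 + 3 + 2 + 1 = 21 subsets, which is consistent with the wrapper approach remaining tractable here.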
3 EXPERIMENTS AND RESULTS
3.1 Database Elaboration
In order to test the feasibility of categorizing the
uses of a word based on prosodic features, a small oral
corpus was created, inspired by questions that
can be asked in real opinion polls. The motivation
for constructing this database was to rapidly
collect many instances of the word ‘oui’ through
questions that lead speakers to pronounce it
with an expression of ‘conviction’ or ‘lack of
conviction’. The questionnaire is composed of 4
series of 10 questions each. The series address
increasingly controversial topics (personal phone use,
sport, European Union and politics). A group of 8
women and 17 men, all native French speakers,
answered this questionnaire. They were fully
informed about the experimental procedures and all
gave their signed consent.
It was difficult to label all the occurrences of the
word ‘oui’ within the dichotomy ‘conviction’ versus ‘lack
of conviction’, either because the conviction issue
was not at stake, or because the word ‘oui’
expressed another feeling (pride, lassitude…). A