As in the speaker dependent mode, all three selection
methods select the /w/ phoneme duration first, which
highlights it as a relevant feature. The maximum rate
with manual segmentation (85.71% with the JMI strategy
and 8 features) is close to the rate previously achieved
in the dependent mode. As expected, the classification
rate decreases in the independent mode with automatic
segmentation. It nevertheless remains acceptable, given
that this configuration is close to an industrial system
(maximum of 78.57% regardless of the feature selection
method).
Nevertheless, these results remain partial, since a
real functional system requires a large database, which
was not available in our study.
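To make the filter selection step discussed above more concrete, the following is a minimal sketch of a greedy MRMR selection, in which each new feature maximises its mutual information with the class minus its mean mutual information with the features already selected. It assumes that the features have already been discretised and uses a plain plug-in (count-based) estimate of mutual information, which is not necessarily the estimator used in this study. The function names `mutual_information` and `mrmr_select` are hypothetical and chosen for illustration only.

```python
import numpy as np


def mutual_information(x, y):
    """Plug-in estimate of I(x; y) in bits for two discrete 1-D arrays."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    # Joint histogram of the two variables, normalised to a probability table.
    joint = np.zeros((x_vals.size, y_vals.size))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of x, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of y, shape (1, ny)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))


def mrmr_select(features, labels, n_select):
    """Greedy MRMR ranking (assumed discretised features, one column each).

    Returns the column indices in selection order.
    """
    features = np.asarray(features)
    n_features = features.shape[1]
    # Relevance of each feature: mutual information with the class labels.
    relevance = [mutual_information(features[:, j], labels)
                 for j in range(n_features)]
    # The first feature selected is simply the most relevant one.
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(n_select, n_features):
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Redundancy: mean mutual information with the selected features.
            redundancy = np.mean([
                mutual_information(features[:, j], features[:, s])
                for s in selected
            ])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

With this criterion, an exact duplicate of an already selected feature is penalised by its full redundancy, so a weaker but complementary feature can be ranked ahead of it.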
5 CONCLUSION
The aim of this study was to evaluate the ability of
local and global prosodic features to explain the
two clusters defined by linguistic experts and to
classify the uses of the word “oui” in a real context
of spontaneous discourse. Each use was identified as
belonging to the “convinced” or “lack of conviction”
class. The results showed that 10 features, selected
with the MRMR filter selection strategy, are sufficient
to fully explain both clusters, CV and NCV, on the 115
occurrences of a self-made corpus labeled by linguistic
experts. Most of these 10 features are local. All the
results showed that the first feature selected was the
/w/ phoneme duration. The system
was validated by building classification systems in a
speaker dependent mode and in a speaker independent
mode, with both manual and automatic phoneme
segmentation. In the speaker dependent mode with
manual phoneme segmentation, the classification rate
reached 87.72%. It reached 78.57% in the speaker
independent mode with automatic phoneme segmentation,
a system configuration close to an industrial one.
These results are partial and preliminary given the
size of the database. However, they are promising for
industrial applications such as the automatic
processing of large databases of oral opinion polls.
ACKNOWLEDGMENT
This work is funded by the région Centre Val de
Loire, France. This collaborative work involves four
laboratories of the University of Orléans (LLL,
PRISME/IRAuS, LIFO, MAPMO). We thank everyone
involved in this project for their active
participation.