in parentheses. The F-measure was chosen as the main criterion of classification performance. The columns titled MLP Baseline contain the results achieved with the baseline 384-dimensional feature set without feature selection. Similarly, the columns titled PCA MLP and IGR MLP contain the results obtained with the PCA and IGR feature selection procedures, respectively. The columns titled GA MLP show the classification performance attained with conventional genetic algorithm-based feature selection (a wrapper approach). Finally, the results of the proposed MOGA-based feature selection are shown in the SPEA MLP columns.
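As a reminder, the F-measure is the harmonic mean of precision and recall, F = 2PR/(P + R), averaged over classes. The following minimal sketch shows how such a macro-averaged score could be computed with scikit-learn; the toolchain and the example labels are our assumptions, since the paper does not state how the metric was implemented.

```python
from sklearn.metrics import f1_score

# Hypothetical gold labels and classifier predictions for a 4-class task.
y_true = [0, 1, 2, 3, 1, 2, 0, 3]
y_pred = [0, 1, 2, 1, 1, 2, 0, 3]

# Macro averaging computes the F-measure per class and then takes the
# unweighted mean, so minority classes count as much as frequent ones.
print(f1_score(y_true, y_pred, average="macro"))
```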
6 CONCLUSION AND FUTURE WORK
Applying the proposed hybrid system to select the most representative features and maximize the accuracy of a particular task can decrease the number of features and increase the accuracy of the system simultaneously. In most cases, the MOGA-based technique outperforms the baseline results. It should be noted that the number of features selected by the IGR method is quite high: in some cases it was equal to 384, i.e. the best-performing model was obtained without any feature selection at all.
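The trade-off noted above (fewer features, higher F-measure) is exactly what the multi-objective search optimizes. Below is a minimal sketch of such a MOGA-based wrapper, written with the DEAP library's SPEA2 selection operator and a scikit-learn MLP as the wrapped classifier; these tools, the toy data, and all parameter values are our illustrative assumptions, not the authors' actual implementation.

```python
import random
import numpy as np
from deap import base, creator, tools
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

N_FEATURES = 384  # size of the baseline feature set in the paper

# Two objectives: maximize the macro F-measure, minimize the subset size.
creator.create("FitnessFS", base.Fitness, weights=(1.0, -1.0))
creator.create("Individual", list, fitness=creator.FitnessFS)

def evaluate(ind, X, y):
    mask = np.asarray(ind, dtype=bool)
    if not mask.any():                      # an empty subset is useless
        return 0.0, float(N_FEATURES)
    f1 = cross_val_score(MLPClassifier(max_iter=200), X[:, mask], y,
                         scoring="f1_macro", cv=3).mean()
    return f1, float(mask.sum())

toolbox = base.Toolbox()
toolbox.register("bit", random.randint, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.bit, N_FEATURES)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutFlipBit, indpb=1.0 / N_FEATURES)
toolbox.register("select", tools.selSPEA2)  # strength-Pareto selection

# Toy data standing in for the 384-dimensional acoustic feature vectors.
X, y = make_classification(n_samples=200, n_features=N_FEATURES,
                           n_informative=20, n_classes=3, random_state=0)

pop = toolbox.population(n=20)
for ind in pop:
    ind.fitness.values = evaluate(ind, X, y)

for gen in range(5):                        # a few generations for brevity
    offspring = [toolbox.clone(ind) for ind in pop]
    for c1, c2 in zip(offspring[::2], offspring[1::2]):
        toolbox.mate(c1, c2)
    for mut in offspring:
        toolbox.mutate(mut)
        del mut.fitness.values
    for ind in offspring:
        ind.fitness.values = evaluate(ind, X, y)
    pop = toolbox.select(pop + offspring, k=20)

best = max(pop, key=lambda ind: ind.fitness.values[0])
print("F-measure:", best.fitness.values[0],
      "features:", int(best.fitness.values[1]))
```

Because SPEA2 keeps a Pareto front rather than a single optimum, the final population contains a range of accuracy/subset-size compromises from which a suitable operating point can be picked.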
While the MLP has already provided reasonable results for automated dialogue analysis, we are still examining its general appropriateness. The use of other, possibly more accurate, classifiers may improve the performance of this system. Furthermore, dialogues consist not only of speech but also of a visual component. Hence, an analysis of images or even video recordings may also improve the performance of dialogue analysis.
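As a concrete illustration of the first direction, swapping the classifier is a one-line change in a typical scikit-learn pipeline; the sketch below compares an MLP against a support vector machine under the same macro F-measure protocol. The choice of SVM, the toy data, and all parameters are our assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy stand-in for the 384-dimensional acoustic feature vectors.
X, y = make_classification(n_samples=300, n_features=384,
                           n_informative=25, n_classes=4, random_state=0)

for name, clf in [("MLP", MLPClassifier(max_iter=500, random_state=0)),
                  ("SVM", SVC(kernel="rbf", C=1.0))]:
    # Scaling matters for both classifiers on heterogeneous features.
    pipe = make_pipeline(StandardScaler(), clf)
    f1 = cross_val_score(pipe, X, y, scoring="f1_macro", cv=5).mean()
    print(f"{name}: macro F-measure = {f1:.3f}")
```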