cooperative MOGA in the sense of classification
performance.
Owing to the two-criterion optimization model, we were able to find MLP structures that were also efficient in terms of computational complexity. Not only did they have fewer neurons in the hidden layer, but their activation functions also required less computation time. Such classifiers may be especially valuable when predictions must be made in real time.
At the same time, applying MLP ensembles increased the F-score values significantly. On all of the databases, MLP ensembles, with or without the feature selection procedure, demonstrated the best results. The relative improvement in F-score over the conventional MLP was 2.56% for Emo-DB, 4.15% for SAVEE, 4.25% for LEGO, and 4.05% for UUDB.
Moreover, it is important to note that the embedded feature selection procedure allowed us to simplify the MLP structures and reduce their computational complexity significantly.
5 CONCLUSIONS
In this paper we proposed a two-criterion optimization model for designing MLP classifiers automatically for the speech-based emotion recognition problem. The main benefit of this approach is the ability to generate effective MLP structures while taking two objectives into consideration: 'classification performance' and 'computational complexity'.
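As a minimal sketch of how such a two-objective formulation can be evaluated (the bit-string encoding, the neuron-count complexity proxy, and all function names here are illustrative assumptions, not the authors' exact implementation):

```python
# Hedged sketch: scoring one MLP candidate on the two objectives named in
# the text, both treated as minimization criteria. The encoding (one bit
# per hidden neuron) and the complexity proxy are illustrative assumptions.

def evaluate(chromosome, misclassification_rate):
    """Return the two objective values for a candidate MLP structure.

    chromosome             -- binary list; a 1 activates the corresponding
                              hidden neuron (assumed encoding)
    misclassification_rate -- validation error, i.e. 1 - classification
                              performance (supplied by an external trainer)
    """
    # Objective 1: classification error (lower is better).
    f1 = misclassification_rate
    # Objective 2: computational complexity, proxied here by the number
    # of active hidden neurons (lower is better).
    f2 = sum(chromosome)
    return f1, f2

def dominates(a, b):
    """Pareto dominance for minimization: a dominates b if it is no worse
    in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))
```

Under this scheme a smaller network with the same error dominates a larger one, which is exactly the pressure that drives the search toward the simpler structures reported above.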
The experiments revealed that this technique allowed us to design MLP classifiers with simpler structures whose accuracy was comparable with (or even higher than) that of conventional MLPs containing more neurons in the hidden layer.
Within the framework of this technique it is also possible to design ensembles of classifiers, whose application leads to a substantial improvement in classification quality.
The binary representation of MLP structures allowed us to embed the feature selection procedure and to simplify the classifiers further.
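A minimal sketch of the kind of chromosome layout this describes, where a single bit string encodes both an input-feature mask and the hidden-layer structure (the split convention and helper names are assumptions for illustration):

```python
# Hedged sketch: one binary chromosome carrying both an embedded feature
# mask and the hidden-neuron switches. The layout (feature bits first,
# neuron bits after) is an assumption, not the paper's stated convention.

def split_chromosome(bits, n_features):
    """Split a chromosome into its two parts: the first n_features bits
    select input features; the remaining bits switch hidden neurons on/off."""
    return bits[:n_features], bits[n_features:]

def apply_feature_mask(sample, feature_mask):
    """Keep only the input features whose mask bit is 1, so the network
    is both trained and queried on the reduced feature set."""
    return [x for x, keep in zip(sample, feature_mask) if keep]
```

Because feature bits and neuron bits live in the same string, the same genetic operators prune inputs and hidden units simultaneously, which is what makes the feature selection "embedded" rather than a separate preprocessing stage.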
Finally, the proposed scheme is relevant to other human-machine communication tasks: it might be applied without any changes to speech-based speaker identification, as well as to speaker gender or age recognition.
ACKNOWLEDGEMENTS
This research was performed with the financial support of the Ministry of Education and Science of the Russian Federation within the federal R&D programme (project RFMEFI57414X0037).
REFERENCES
Boersma, P., 2002. Praat, a system for doing phonetics by computer. Glot International, 5(9/10), pp. 341–345.
Brester, Ch., Sidorov, M., Semenkin, E., 2014. Speech-based emotion recognition: Application of collective decision making concepts. Proceedings of the 2nd International Conference on Computer Science and Artificial Intelligence (ICCSAI2014), pp. 216–220.
Brester, Ch., Semenkin, E., 2015a. Cooperative multi-
objective genetic algorithm with parallel
implementation. ICSI-CCI 2015, Part I, LNCS 9140,
pp. 471–478.
Brester, Ch., Semenkin, E., Sidorov, M., Kovalev, I., Zelenkov, P., 2015b. Evolutionary feature selection for emotion recognition in multilingual speech analysis. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC2015), pp. 2406–2411.
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F., and Weiss, B., 2005. A database of German emotional speech. In Interspeech, pp. 1517–1520.
Deb, K., Pratap, A., Agarwal, S., Meyarivan, T., 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6 (2), pp. 182–197.
Eyben, F., Wöllmer, M., and Schuller, B., 2010.
Opensmile: the Munich versatile and fast opensource
audio feature extractor. In Proceedings of the
international conference on Multimedia, pp. 1459–
1462. ACM.
Goutte, C., Gaussier, E., 2005. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. ECIR'05 Proceedings of the 27th European Conference on Advances in Information Retrieval Research, pp. 345–359.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I. H., 2009. The WEKA Data Mining Software: An Update. SIGKDD Explorations, Volume 11, Issue 1.
Haq, S., Jackson, P., 2010. Machine Audition: Principles,
Algorithms and Systems, chapter Multimodal Emotion
Recognition. IGI Global, Hershey PA, pp. 398–423.
Mori, H., Satake, T., Nakamura, M., and Kasuya, H.,
2011. Constructing a spoken dialogue corpus for
studying paralinguistic information in expressive
conversation and analyzing its statistical/acoustic
characteristics, Speech Communication, 53.
Picard, R.W., 1995. Affective computing. Tech. Rep.
Perceptual Computing Section Technical Report No.