
cooperative MOGA in terms of classification performance.
Owing to the use of the two-criterion optimization model, we managed to find MLP structures that were also efficient in terms of computational complexity: they not only had fewer neurons in the hidden layer, but their activation functions also required less computational time. Such classifiers may be especially effective when predictions have to be made in real time.
At the same time, we increased the F-score values significantly through the application of MLP ensembles. For all of the databases, MLP ensembles, with or without the feature selection procedure, demonstrated the best results. The relative improvement of the F-score over the conventional MLP was 2.56% for Emo-DB, 4.15% for SAVEE, 4.25% for LEGO, and 4.05% for UUDB.
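For reference, the F-score combines precision and recall into a single measure (Goutte and Gaussier, 2005). The following minimal Python sketch shows the metric and the relative-improvement computation behind the figures above; the function names and the example scores are illustrative, not taken from the paper.

def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    # Weighted harmonic combination of precision and recall (F1 when beta = 1).
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

def relative_improvement(f_ensemble: float, f_baseline: float) -> float:
    # Relative F-score gain of the ensemble over the conventional MLP, in percent.
    return 100.0 * (f_ensemble - f_baseline) / f_baseline

# With made-up scores: improving a baseline F-score of 0.78 to 0.80
# yields a relative improvement of about 2.56%.
print(round(relative_improvement(0.80, 0.78), 2))  # 2.56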
Moreover, it is important to note that the embedded feature selection procedure allowed us to simplify the MLP structures and to decrease their computational complexity significantly.
5  CONCLUSIONS 
In this paper we proposed a two-criterion optimization model to design MLP classifiers automatically for the speech-based emotion recognition problem. The main benefit of this approach is the opportunity to generate effective MLP structures while taking two objectives into consideration: ‘classification performance’ and ‘computational complexity’.
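To make the two-criterion model concrete, the sketch below shows how a candidate MLP could be scored on both objectives and how Pareto dominance compares two candidates, as in NSGA-II (Deb et al., 2002). The activation-cost table and the complexity measure are our illustrative assumptions rather than the paper's exact formulation.

from dataclasses import dataclass

# Assumed relative costs of one activation evaluation; placeholder values.
ACTIVATION_COST = {"linear": 1.0, "sigmoid": 4.0, "tanh": 4.0}

@dataclass
class CandidateMLP:
    hidden_neurons: int
    activation: str
    f_score: float  # classification performance measured on validation data

def objectives(mlp: CandidateMLP):
    # Objective 1 (maximized): classification performance.
    # Objective 2 (minimized): computational complexity, approximated here
    # by the number of hidden neurons weighted by the activation cost.
    return mlp.f_score, mlp.hidden_neurons * ACTIVATION_COST[mlp.activation]

def dominates(a: CandidateMLP, b: CandidateMLP) -> bool:
    # Pareto dominance: a is no worse in both objectives and strictly
    # better in at least one, so the MOGA keeps only non-dominated MLPs.
    fa, ca = objectives(a)
    fb, cb = objectives(b)
    return fa >= fb and ca <= cb and (fa > fb or ca < cb)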
  The experiments conducted revealed that this technique allowed us to design MLP classifiers with simpler structures whose accuracy was comparable with (or even higher than) that of conventional MLPs containing more neurons in the hidden layer.
  Within the framework of this technique it is also possible to design ensembles of classifiers, whose application leads to a substantial improvement in classification quality.
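As an illustration, the collective decision of such an ensemble can be taken by majority voting over the Pareto-optimal MLPs; this is one common fusion rule rather than necessarily the exact scheme used here.

from collections import Counter

def ensemble_predict(classifiers, features):
    # Each MLP in the ensemble casts one vote for an emotion label;
    # the most frequent label becomes the collective decision.
    votes = [clf.predict(features) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]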
  The binary representation of MLP structures allowed us to embed the feature selection procedure and to simplify the classifiers even further.
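To illustrate this encoding, a binary chromosome might concatenate one mask bit per acoustic feature with one bit per potential hidden neuron, so that feature selection is optimized jointly with the network structure; the layout below is an assumption for illustration, not the exact genome used in the paper.

def decode(chromosome, n_features, max_hidden):
    # First n_features bits form the feature mask (1 = the feature is used).
    feature_mask = chromosome[:n_features]
    # The remaining bits switch individual hidden neurons on or off.
    neuron_bits = chromosome[n_features:n_features + max_hidden]
    selected_features = [i for i, bit in enumerate(feature_mask) if bit == 1]
    n_hidden = sum(neuron_bits)
    return selected_features, n_hidden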
Finally, there are some other open questions in the human-machine communication sphere: the proposed scheme might be applied without any changes to the speech-based speaker identification problem, as well as to speaker gender or age recognition.
ACKNOWLEDGEMENTS 
This research was performed with the financial support of the Ministry of Education and Science of the Russian Federation within the federal R&D programme (project RFMEFI57414X0037).
REFERENCES 
Boersma, P., 2002. Praat, a system for doing phonetics by computer. Glot International, 5(9/10), pp. 341–345.
Brester, Ch., Sidorov, M., Semenkin, E., 2014. Speech-based emotion recognition: Application of collective decision making concepts. In Proceedings of the 2nd International Conference on Computer Science and Artificial Intelligence (ICCSAI 2014), pp. 216–220.
Brester, Ch., Semenkin, E., 2015a. Cooperative multi-objective genetic algorithm with parallel implementation. In ICSI-CCI 2015, Part I, LNCS 9140, pp. 471–478.
Brester, Ch., Semenkin, E., Sidorov, M., Kovalev, I., Zelenkov, P., 2015b. Evolutionary feature selection for emotion recognition in multilingual speech analysis. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2015), pp. 2406–2411.
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F., and Weiss, B., 2005. A database of German emotional speech. In Interspeech, pp. 1517–1520.
Deb, K., Pratap, A., Agarwal, S., Meyarivan, T., 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), pp. 182–197.
Eyben, F., Wöllmer, M., and Schuller, B., 2010. openSMILE: the Munich versatile and fast open-source audio feature extractor. In Proceedings of the International Conference on Multimedia, pp. 1459–1462. ACM.
Goutte, C., Gaussier, E., 2005. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In Proceedings of the 27th European Conference on Advances in Information Retrieval Research (ECIR'05), pp. 345–359.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I. H., 2009. The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11(1), pp. 10–18.
Haq, S., Jackson, P., 2010. Multimodal emotion recognition. In Machine Audition: Principles, Algorithms and Systems, IGI Global, Hershey PA, pp. 398–423.
Mori, H., Satake, T., Nakamura, M., and Kasuya, H., 2011. Constructing a spoken dialogue corpus for studying paralinguistic information in expressive conversation and analyzing its statistical/acoustic characteristics. Speech Communication, 53.
Picard, R. W., 1995. Affective computing. Perceptual Computing Section Technical Report No. 321, MIT Media Laboratory.