Feature and Decision Level Audio-visual Data Fusion in Emotion Recognition Problem

Maxim Sidorov, Evgenii Sopov, Ilia Ivanov, Wolfgang Minker

Abstract

Speech-based emotion recognition has been investigated by many authors, and reasonable results have been achieved. This article focuses on applying an audio-visual data fusion approach to emotion recognition. Two state-of-the-art classification algorithms were applied to one audio and three visual feature datasets. Feature-level data fusion was used to build a multimodal emotion classification system, which increased emotion classification accuracy by 4% compared to the best accuracy achieved by the unimodal systems. The class precisions obtained by applying the algorithms to the unimodal and multimodal datasets revealed that different data-classifier combinations are good at recognizing particular emotions. These data-classifier combinations were then fused at the decision level using several approaches, which increased the accuracy by a further 3% compared to the best accuracy achieved by feature-level fusion.
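The two fusion levels named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which decision-level approaches were used, so the precision-weighted vote below is only one plausible choice, and all function names, system names, and numbers are hypothetical.

```python
from collections import defaultdict


def fuse_features(audio, visual):
    """Feature-level fusion: concatenate the per-sample feature vectors
    of two modalities into one multimodal vector per sample."""
    if len(audio) != len(visual):
        raise ValueError("both modalities must describe the same samples")
    return [list(a) + list(v) for a, v in zip(audio, visual)]


def fuse_decisions(predictions, class_precision):
    """Decision-level fusion by precision-weighted voting (an assumed scheme).

    predictions:     {system_name: predicted_label} for one sample
    class_precision: {system_name: {label: precision}} from validation data
    Each system votes for its predicted label, and the vote is weighted by
    that system's precision on that label.
    """
    scores = defaultdict(float)
    for system, label in predictions.items():
        scores[label] += class_precision[system].get(label, 0.0)
    # Iterate labels in sorted order so ties resolve deterministically.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical toy data: one sample, 2 audio and 3 visual features.
fused = fuse_features([[0.1, 0.2]], [[0.7, 0.8, 0.9]])

# Hypothetical data-classifier combinations and their class precisions.
predictions = {
    "audio_svm": "anger",
    "visual_svm": "happiness",
    "fused_nn": "happiness",
}
precision = {
    "audio_svm": {"anger": 0.9, "happiness": 0.4},
    "visual_svm": {"anger": 0.3, "happiness": 0.5},
    "fused_nn": {"anger": 0.5, "happiness": 0.3},
}
decision = fuse_decisions(predictions, precision)
```

In this toy case two of the three systems predict "happiness", but the weighted vote selects "anger" because the audio system is assumed to be far more precise on that class. This mirrors the observation that different data-classifier combinations are good at recognizing particular emotions.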



Paper Citation


in Harvard Style

Sidorov M., Sopov E., Ivanov I. and Minker W. (2015). Feature and Decision Level Audio-visual Data Fusion in Emotion Recognition Problem. In Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO, ISBN 978-989-758-123-6, pages 246-251. DOI: 10.5220/0005527002460251


in Bibtex Style

@conference{icinco15,
author={Maxim Sidorov and Evgenii Sopov and Ilia Ivanov and Wolfgang Minker},
title={Feature and Decision Level Audio-visual Data Fusion in Emotion Recognition Problem},
booktitle={Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO},
year={2015},
pages={246-251},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005527002460251},
isbn={978-989-758-123-6},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO
TI - Feature and Decision Level Audio-visual Data Fusion in Emotion Recognition Problem
SN - 978-989-758-123-6
AU - Sidorov M.
AU - Sopov E.
AU - Ivanov I.
AU - Minker W.
PY - 2015
SP - 246
EP - 251
DO - 10.5220/0005527002460251