Feature and Decision Level Audio-visual Data Fusion in Emotion Recognition Problem
Maxim Sidorov, Evgenii Sopov, Ilia Ivanov, Wolfgang Minker
2015
Abstract
Speech-based emotion recognition has been investigated by many authors, and reasonable results have been achieved. This article focuses on applying an audio-visual data fusion approach to emotion recognition. Two state-of-the-art classification algorithms were applied to one audio and three visual feature datasets. Feature-level data fusion was used to build a multimodal emotion classification system, which increased emotion classification accuracy by 4% compared to the best accuracy achieved by the unimodal systems. The per-class precisions obtained on the unimodal and multimodal datasets revealed that different data-classifier combinations are good at recognizing particular emotions. These data-classifier combinations were then fused at the decision level using several approaches, which increased the accuracy by a further 3% compared to the best accuracy achieved by feature-level fusion.
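The two fusion levels named in the abstract can be illustrated with a minimal sketch. It assumes feature-level fusion means concatenating the per-sample audio and visual feature vectors, and it uses a precision-weighted vote as one possible decision-level scheme. The abstract does not specify which decision-level approaches were used, so the voting rule, the function names (`fuse_features`, `class_precisions`, `fuse_decisions`) and the toy data are all illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def fuse_features(audio, visual):
    """Feature-level fusion (assumed form): concatenate per-sample vectors."""
    if len(audio) != len(visual):
        raise ValueError("both modalities must describe the same samples")
    return [list(a) + list(v) for a, v in zip(audio, visual)]


def class_precisions(y_true, y_pred):
    """Per-class precision: correct predictions of c / all predictions of c."""
    predicted = defaultdict(int)
    correct = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        predicted[guess] += 1
        if truth == guess:
            correct[guess] += 1
    return {c: correct[c] / predicted[c] for c in predicted}


def fuse_decisions(predictions, precisions):
    """Decision-level fusion (hypothetical scheme): precision-weighted vote.

    predictions: one predicted label per data-classifier combination.
    precisions:  one {label: precision} dict per combination, e.g. measured
                 on held-out data with class_precisions().
    """
    scores = defaultdict(float)
    for label, prec in zip(predictions, precisions):
        # A combination's vote counts as much as its precision on that class.
        scores[label] += prec.get(label, 0.0)
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(scores), key=scores.get)


# Toy data: one sample with 2 audio features and 1 visual feature.
fused = fuse_features([[0.1, 0.2]], [[0.3]])

# Three combinations disagree on a sample; the one that is precise on
# "anger" outweighs two that are weak on "joy".
winner = fuse_decisions(
    ["anger", "joy", "joy"],
    [{"anger": 0.9}, {"joy": 0.4}, {"joy": 0.4}],
)
```

In this toy case a plain majority vote would return "joy", whereas the precision-weighted vote returns "anger". This reflects the abstract's observation that different data-classifier combinations are good at recognizing particular emotions.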
Paper Citation
in Harvard Style
Sidorov M., Sopov E., Ivanov I. and Minker W. (2015). Feature and Decision Level Audio-visual Data Fusion in Emotion Recognition Problem. In Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO, ISBN 978-989-758-123-6, pages 246-251. DOI: 10.5220/0005527002460251
in Bibtex Style
@conference{icinco15,
author={Maxim Sidorov and Evgenii Sopov and Ilia Ivanov and Wolfgang Minker},
title={Feature and Decision Level Audio-visual Data Fusion in Emotion Recognition Problem},
booktitle={Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO},
year={2015},
pages={246-251},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005527002460251},
isbn={978-989-758-123-6},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 12th International Conference on Informatics in Control, Automation and Robotics - Volume 2: ICINCO
TI - Feature and Decision Level Audio-visual Data Fusion in Emotion Recognition Problem
SN - 978-989-758-123-6
AU - Sidorov M.
AU - Sopov E.
AU - Ivanov I.
AU - Minker W.
PY - 2015
SP - 246
EP - 251
DO - 10.5220/0005527002460251