Kane, J. and Gobl, C. (2013). Wavelet maxima dispersion for breathy to tense voice discrimination. IEEE Transactions on Audio, Speech, and Language Processing, 21(6):1170–1179.
Lee, C. M., Yildirim, S., Bulut, M., Kazemzadeh, A., Busso, C., Deng, Z., Lee, S., and Narayanan, S. S. (2004). Emotion recognition based on phoneme classes. In Proceedings of ICSLP 2004.
Luengo, I., Navas, E., and Hernáez, I. (2010). Feature analysis and evaluation for automatic emotion identification in speech. IEEE Transactions on Multimedia, 12(6):490–501.
Lugger, M. and Yang, B. (2006). Classification of different speaking groups by means of voice quality parameters. ITG-Fachbericht-Sprachkommunikation 2006.
Lugger, M. and Yang, B. (2007). The relevance of voice quality features in speaker independent emotion recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2007), volume 4, pages IV–17. IEEE.
Meng, H., Huang, D., Wang, H., Yang, H., Al-Shuraifi, M., and Wang, Y. (2013). Depression recognition based on dynamic facial and vocal expression features using partial least square regression. In Proceedings of AVEC 2013, AVEC '13, pages 21–30. ACM.
Meudt, S., Zharkov, D., Kächele, M., and Schwenker, F. (2013). Multi classifier systems and forward backward feature selection algorithms to classify emotional coloured speech. In Proceedings of the International Conference on Multimodal Interaction (ICMI 2013).
Nwe, T. L., Foo, S. W., and De Silva, L. C. (2003). Speech emotion recognition using hidden Markov models. Speech Communication, 41(4):603–623.
Ojala, T., Pietikäinen, M., and Harwood, D. (1996). A comparative study of texture measures with classification based on featured distributions. Pattern Recognition, 29(1):51–59.
Ojansivu, V. and Heikkilä, J. (2008). Blur insensitive texture classification using local phase quantization. In Elmoataz, A., Lezoray, O., Nouboud, F., and Mammass, D., editors, Image and Signal Processing, volume 5099 of LNCS, pages 236–243. Springer Berlin Heidelberg.
Palm, G. and Schwenker, F. (2009). Sensor-fusion in neural networks. In Shahbazian, E., Rogova, G., and DeWeert, M. J., editors, Harbour Protection Through Data Fusion Technologies, pages 299–306. Springer.
Russell, J. A. and Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of Research in Personality, 11(3):273–294.
Sánchez-Lozano, E., Lopez-Otero, P., Docio-Fernandez, L., Argones-Rúa, E., and Alba-Castro, J. L. (2013). Audiovisual three-level fusion for continuous estimation of Russell's emotion circumplex. In Proceedings of AVEC 2013, AVEC '13, pages 31–40. ACM.
Saragih, J. M., Lucey, S., and Cohn, J. F. (2011). Deformable model fitting by regularized landmark mean-shift. International Journal of Computer Vision, 91(2):200–215.
Scherer, K. R., Johnstone, T., and Klasmeyer, G. (2003). Vocal expression of emotion. In Handbook of Affective Sciences, chapter 23, pages 433–456. Oxford University Press.
Scherer, S., Kane, J., Gobl, C., and Schwenker, F. (2012). Investigating fuzzy-input fuzzy-output support vector machines for robust voice quality classification. Computer Speech and Language, 27(1):263–287.
Scherer, S., Schwenker, F., and Palm, G. (2008). Emotion recognition from speech using multi-classifier systems and RBF-ensembles. In Speech, Audio, Image and Biomedical Signal Processing using Neural Networks, pages 49–70. Springer Berlin Heidelberg.
Schwenker, F., Scherer, S., Schmidt, M., Schels, M., and Glodek, M. (2010). Multiple classifier systems for the recognition of human emotions. In Gayar, N. E., Kittler, J., and Roli, F., editors, Proceedings of the 9th International Workshop on Multiple Classifier Systems (MCS'10), LNCS 5997, pages 315–324. Springer.
Senechal, T., Rapp, V., Salam, H., Seguier, R., Bailly, K., and Prevost, L. (2012). Facial action recognition combining heterogeneous features via multikernel learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 42(4):993–1005.
Valstar, M., Schuller, B., Smith, K., Eyben, F., Jiang, B., Bilakhia, S., Schnieder, S., Cowie, R., and Pantic, M. (2013). AVEC 2013: The continuous audio/visual emotion and depression recognition challenge. In Proceedings of AVEC 2013, AVEC '13, pages 3–10. ACM.
Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), volume 1, pages I–511–I–518.
Wöllmer, M., Kaiser, M., Eyben, F., Schuller, B., and Rigoll, G. (2013). LSTM-modeling of continuous emotions in an audiovisual affect recognition framework. Image and Vision Computing, 31(2):153–163. Special issue: Affect Analysis in Continuous Input.
Yang, S. and Bhanu, B. (2011). Facial expression recognition using emotion avatar image. In IEEE International Conference on Automatic Face and Gesture Recognition and Workshops (FG 2011), pages 866–871.
Zeng, Z., Pantic, M., Roisman, G. I., and Huang, T. S. (2009). A survey of affect recognition methods: Audio, visual, and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(1):39–58.
ICPRAM 2014 - International Conference on Pattern Recognition Applications and Methods