Al Aghbari, Z. (2020). Simultaneous prediction of
valence/arousal and emotions on affectnet, aff-wild and
afew-va. Procedia Computer Science, 170:634–641.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
Huang, R., Pedoeem, J., and Chen, C. (2018). Yolo-lite:
a real-time object detection algorithm optimized for
non-gpu computers. In 2018 IEEE International Con-
ference on Big Data (Big Data), pages 2503–2510.
IEEE.
Kim, C., Li, F., and Rehg, J. M. (2018). Multi-object tracking
with neural gating using bilinear lstm. In The European
Conference on Computer Vision (ECCV).
Kollias, D., Schulc, A., Hajiyev, E., and Zafeiriou, S. (2020).
Analysing affective behavior in the first abaw 2020
competition. arXiv preprint arXiv:2001.11409.
Kollias, D., Tzirakis, P., Nicolaou, M. A., Papaioannou,
A., Zhao, G., Schuller, B., Kotsia, I., and Zafeiriou,
S. (2019). Deep affect prediction in-the-wild: Aff-
wild database and challenge, deep architectures, and
beyond. International Journal of Computer Vision,
127(6-7):907–929.
Kollias, D. and Zafeiriou, S. (2019). Expression, affect,
action unit recognition: Aff-wild2, multi-task learning
and arcface. arXiv preprint arXiv:1910.04855.
Kossaifi, J., Tzimiropoulos, G., Todorovic, S., and Pantic,
M. (2017). Afew-va database for valence and arousal
estimation in-the-wild. Image and Vision Computing,
65:23–36.
Kossaifi, J., Walecki, R., Panagakis, Y., Shen, J., Schmitt, M.,
Ringeval, F., Han, J., Pandit, V., Schuller, B., Star, K.,
et al. (2019). Sewa db: A rich database for audio-visual
emotion and sentiment research in the wild. arXiv
preprint arXiv:1901.02839.
Li, C., Bao, Z., Li, L., and Zhao, Z. (2020). Explor-
ing temporal representations by leveraging attention-
based bidirectional lstm-rnns for multi-modal emotion
recognition. Information Processing & Management,
57(3):102185.
Liu, C., Conn, K., Sarkar, N., and Stone, W. (2008). Online
affect detection and robot behavior adaptation for inter-
vention of children with autism. IEEE Transactions on Robotics, 24:883–896.
Luong, M.-T., Pham, H., and Manning, C. D. (2015). Ef-
fective approaches to attention-based neural machine
translation. arXiv preprint arXiv:1508.04025.
Lv, J.-J., Shao, X., Xing, J., Cheng, C., and Zhou, X.
(2017). A deep regression architecture with two-stage
re-initialization for high performance facial landmark
detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3691–3700.
Ma, J., Tang, H., Zheng, W.-L., and Lu, B.-L. (2019). Emo-
tion recognition using multimodal residual lstm net-
work. In Proceedings of the 27th ACM International
Conference on Multimedia, pages 176–183.
McKeown, G., Valstar, M. F., Cowie, R., and Pantic, M.
(2010). The semaine corpus of emotionally coloured
character interactions. In 2010 IEEE International Conference on Multimedia and Expo (ICME), pages 1079–1084. IEEE.
Mitenkova, A., Kossaifi, J., Panagakis, Y., and Pantic, M.
(2019). Valence and arousal estimation in-the-wild
with tensor methods. In 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), pages 1–7. IEEE.
Mollahosseini, A., Hasani, B., and Mahoor, M. H. (2017). Affectnet: A database for facial expression, valence, and arousal computing in the wild. IEEE Transactions on Affective Computing, 10(1):18–31.
Nicolaou, M. A., Gunes, H., and Pantic, M. (2011). Con-
tinuous prediction of spontaneous affect from multiple
cues and modalities in valence-arousal space. IEEE Transactions on Affective Computing, 2(2):92–105.
Povolny, F., Matejka, P., Hradis, M., Popková, A., Otrusina,
L., Smrz, P., Wood, I., Robin, C., and Lamel, L. (2016).
Multimodal emotion recognition for avec 2016 chal-
lenge. In Proceedings of the 6th International Work-
shop on Audio/Visual Emotion Challenge, AVEC ’16,
page 75–82, New York, NY, USA. Association for
Computing Machinery.
Ringeval, F., Sonderegger, A., Sauer, J., and Lalanne, D.
(2013). Introducing the recola multimodal corpus of
remote collaborative and affective interactions. In 2013
10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), pages 1–8.
Russell, J. A. (1980). A circumplex model of affect. Journal
of Personality and Social Psychology, 39(6):1161.
Schmitt, M., Cummins, N., and Schuller, B. (2019). Continuous emotion recognition in speech – do we need recurrence? In Proc. Interspeech 2019.
Tellamekala, M. K. and Valstar, M. (2019). Temporally
coherent visual representations for dimensional affect
recognition. In 2019 8th International Conference on
Affective Computing and Intelligent Interaction (ACII),
pages 1–7. IEEE.
Triantafyllidou, D. and Tefas, A. (2016). Face detection
based on deep convolutional neural networks exploit-
ing incremental facial part learning. In 2016 23rd In-
ternational Conference on Pattern Recognition (ICPR),
pages 3560–3565.
Xia, Y., Braun, S., Reddy, C. K. A., Dubey, H., Cutler,
R., and Tashev, I. (2020). Weighted speech distortion
losses for neural-network-based real-time speech en-
hancement. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 871–875.
Xiaohua, W., Muzi, P., Lijuan, P., Min, H., Chunhua, J.,
and Fuji, R. (2019). Two-level attention with two-
stage multi-task learning for facial emotion recogni-
tion. Journal of Visual Communication and Image
Representation, 62:217–225.
Xie, J., Girshick, R. B., and Farhadi, A. (2016). Deep3d:
Fully automatic 2d-to-3d video conversion with deep
convolutional neural networks. In ECCV 2016, pages
842–857.
Ye, H., Li, G. Y., Juang, B.-H. F., and Sivanesan, K. (2018).
Channel agnostic end-to-end learning based commu-
nication systems with conditional gan. In 2018 IEEE
Globecom Workshops (GC Wkshps), pages 1–5. IEEE.
Zafeiriou, S., Kollias, D., Nicolaou, M. A., Papaioannou, A.,
Zhao, G., and Kotsia, I. (2017). Aff-wild: Valence and
arousal ‘in-the-wild’ challenge. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1980–1987. IEEE.