S. S. (2008). Iemocap: Interactive emotional dyadic
motion capture database. Language resources and
evaluation, 42(4):335.
Costantini, G., Iadarola, I., Paoloni, A., and Todisco, M.
(2014). Emovo corpus: an italian emotional speech
database.
Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis,
G., Kollias, S., Fellenz, W., and Taylor, J. G. (2001).
Emotion recognition in human-computer interaction.
IEEE Signal processing magazine, 18(1):32–80.
El Ayadi, M., Kamel, M. S., and Karray, F. (2011). Sur-
vey on speech emotion recognition: Features, classi-
fication schemes, and databases. Pattern Recognition,
44(3):572–587.
Giannakopoulos, T. and Pikrakis, A. (2014). Introduction
to audio analysis: a MATLAB® approach. Academic
Press.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term
memory. Neural computation, 9(8):1735–1780.
Jackson, P. and ul Haq, S. (2011). Surrey audio-visual expressed
emotion (savee) database.
Jin, Q., Li, C., Chen, S., and Wu, H. (2015). Speech emo-
tion recognition with acoustic and lexical features.
In 2015 IEEE international conference on acoustics,
speech and signal processing (ICASSP), pages 4749–
4753. IEEE.
Lim, W., Jang, D., and Lee, T. (2016). Speech emotion
recognition using convolutional and recurrent neural
networks. In 2016 Asia-Pacific Signal and Informa-
tion Processing Association Annual Summit and Con-
ference (APSIPA), pages 1–4. IEEE.
Lowe, D. G. (2004). Distinctive image features from scale-
invariant keypoints. International journal of computer
vision, 60(2):91–110.
Mao, Q., Dong, M., Huang, Z., and Zhan, Y. (2014). Learn-
ing salient features for speech emotion recognition us-
ing convolutional neural networks. IEEE transactions
on multimedia, 16(8):2203–2213.
Mehrabian, A. (1995). Framework for a comprehensive de-
scription and measurement of emotional states. Ge-
netic, social, and general psychology monographs.
Nogueiras, A., Moreno, A., Bonafonte, A., and Mariño,
J. B. (2001). Speech emotion recognition using hid-
den markov models. In Seventh European Conference
on Speech Communication and Technology.
Plutchik, R. (1980). A general psychoevolutionary theory of
emotion. In Theories of emotion, pages 3–33. Elsevier.
Poria, S., Chaturvedi, I., Cambria, E., and Hussain, A.
(2016). Convolutional mkl based multimodal emo-
tion recognition and sentiment analysis. In 2016
IEEE 16th international conference on data mining
(ICDM), pages 439–448. IEEE.
Rozgić, V., Ananthakrishnan, S., Saleem, S., Kumar, R.,
Vembu, A. N., and Prasad, R. (2012). Emotion recog-
nition using acoustic and lexical features. In Thir-
teenth Annual Conference of the International Speech
Communication Association.
Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. R.
(2011). Orb: An efficient alternative to sift or surf. In
ICCV, volume 11, page 2. Citeseer.
Sak, H., Senior, A., and Beaufays, F. (2014). Long short-
term memory recurrent neural network architectures
for large scale acoustic modeling. In Fifteenth annual
conference of the international speech communication
association.
Sculley, D. (2010). Web-scale k-means clustering. In
Proceedings of the 19th international conference on
World wide web, pages 1177–1178. ACM.
Spyrou, E., Nikopoulou, R., Vernikos, I., and Mylonas, P.
(2019). Emotion recognition from speech using the
bag-of-visual words on audio segment spectrograms.
Technologies, 7(1):20.
Theodoridis, S. and Koutroumbas, K. D. (1999). Pattern
recognition. IEEE Trans. Neural Networks, 19:376.
Trentin, E., Scherer, S., and Schwenker, F. (2015). Emo-
tion recognition from speech signals via a probabilis-
tic echo-state network. Pattern Recognition Letters,
66:4–12.
Wang, Y. and Guan, L. (2008). Recognizing human emo-
tional state from audiovisual signals. IEEE transac-
tions on multimedia, 10(5):936–946.
Wöllmer, M., Metallinou, A., Eyben, F., Schuller, B., and
Narayanan, S. (2010). Context-sensitive multimodal
emotion recognition from speech and facial expres-
sion using bidirectional lstm modeling. In Proc. IN-
TERSPEECH 2010, Makuhari, Japan, pages 2362–
2365.
Zeng, E., Mare, S., and Roesner, F. (2017). End user se-
curity and privacy concerns with smart homes. In
Thirteenth Symposium on Usable Privacy and Security
(SOUPS 2017), pages 65–80.
Sentiment Analysis from Sound Spectrograms via Soft BoVW and Temporal Structure Modelling