Extracting Emotions and Communication Styles from Vocal Signals

Licia Sbattella, Luca Colombo, Carlo Rinaldi, Roberto Tedesco, Matteo Matteucci, Alessandro Trivilini

Abstract

Many psychological and social studies have highlighted the two distinct channels we use to exchange information with each other: an explicit, linguistic channel, and an implicit, paralinguistic channel. The latter carries information about the emotional state of the speaker, providing clues about the implicit meaning of the message. In particular, the paralinguistic channel can improve applications requiring human-machine interaction (for example, Automatic Speech Recognition systems or Conversational Agents), as well as support the analysis of human-human interaction (think, for example, of clinical or forensic applications). In this work we present PrEmA, a tool able to recognize and classify both the emotions and the communication style of the speaker, relying on prosodic features. In particular, communication-style recognition is, to our knowledge, new, and could be used to infer interesting clues about the state of the interaction. We selected two sets of prosodic features and trained two classifiers based on Linear Discriminant Analysis. The experiments we conducted with Italian speakers provided encouraging results (Ac=71% for classification of emotions, Ac=86% for classification of communication styles), showing that the models were able to discriminate among emotions and communication styles, associating phrases with the correct labels.
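The classification pipeline the abstract describes (prosodic feature vectors fed to a Linear Discriminant Analysis classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names, class labels, and synthetic data are placeholders assumed for the example, using scikit-learn's `LinearDiscriminantAnalysis`.

```python
# Hedged sketch: LDA classification of utterances from prosodic features.
# The features ([mean F0, F0 range, energy, speech rate]) and the two
# classes are illustrative assumptions, not the paper's actual feature set.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic prosodic feature vectors for two well-separated toy classes.
X_neutral = rng.normal(loc=[120, 40, 0.5, 4.0],
                       scale=[10, 5, 0.05, 0.3], size=(50, 4))
X_angry = rng.normal(loc=[180, 90, 0.8, 5.5],
                     scale=[10, 5, 0.05, 0.3], size=(50, 4))
X = np.vstack([X_neutral, X_angry])
y = np.array(["neutral"] * 50 + ["angry"] * 50)

# Fit the LDA classifier and score it on the training data
# (for illustration only; the paper reports cross-speaker accuracy).
clf = LinearDiscriminantAnalysis()
clf.fit(X, y)
acc = clf.score(X, y)
print(f"training accuracy: {acc:.2f}")
```

In practice the feature vectors would be extracted from audio (e.g., with Praat, which the authors cite), and accuracy would be measured on held-out utterances rather than the training set.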



Paper Citation


in Harvard Style

Sbattella L., Colombo L., Rinaldi C., Tedesco R., Matteucci M. and Trivilini A. (2014). Extracting Emotions and Communication Styles from Vocal Signals. In Proceedings of the International Conference on Physiological Computing Systems - Volume 1: PhyCS, ISBN 978-989-758-006-2, pages 183-195. DOI: 10.5220/0004699301830195


in Bibtex Style

@conference{phycs14,
author={Licia Sbattella and Luca Colombo and Carlo Rinaldi and Roberto Tedesco and Matteo Matteucci and Alessandro Trivilini},
title={Extracting Emotions and Communication Styles from Vocal Signals},
booktitle={Proceedings of the International Conference on Physiological Computing Systems - Volume 1: PhyCS},
year={2014},
pages={183-195},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0004699301830195},
isbn={978-989-758-006-2},
}


in EndNote Style

TY - CONF
JO - Proceedings of the International Conference on Physiological Computing Systems - Volume 1: PhyCS
TI - Extracting Emotions and Communication Styles from Vocal Signals
SN - 978-989-758-006-2
AU - Sbattella L.
AU - Colombo L.
AU - Rinaldi C.
AU - Tedesco R.
AU - Matteucci M.
AU - Trivilini A.
PY - 2014
SP - 183
EP - 195
DO - 10.5220/0004699301830195