Seven Principles to Mine Flexible Behavior from Physiological Signals for Effective Emotion Recognition and Description in Affective Interactions

Rui Henriques

and Ana Paiva

KDBIO, Inesc-ID, Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal

GAIPS, Inesc-ID, Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal

Keywords:

Mining Physiological Signals, Measuring Affective Interactions, Emotion Recognition and Description.

Abstract:

Measuring affective interactions using physiological signals has become a critical step to understand engage-

ments with human and artiﬁcial agents. However, traditional methods for signal analysis are not yet able

to effectively deal with the differences of responses across individuals and with ﬂexible sequential behav-

ior. In this work, we rely on empirical results to deﬁne seven principles for a robust mining of physiological

signals to recognize and characterize affective states. The majority of these principles are novel and derived from advanced pre-processing techniques and temporal data mining methods. A methodology that integrates

these principles is proposed and validated using electrodermal signals collected during human-to-human and

human-to-robot affective interactions.

1 INTRODUCTION

Monitoring physiological signals is increasingly nec-

essary to derive accurate analysis from affective in-

teractions or to dynamically adapt these interactions.

Although many methods have been proposed for an

emotion-centered analysis of physiological signals

(Jerritta et al., 2011; Wagner et al., 2005), an integrative view of existing contributions is still lacking.

Additionally, existing methods suffer from three ma-

jor drawbacks. First, there is no agreement on how to

deal with individual differences and with spontaneous

variations of the signals. Second, they generally rely

on feature-driven models and, therefore, discard ﬂex-

ible sequential behavior of physiological responses.

Finally, experimental conditions and psychophysio-

logical data from users have not been adopted to shape

the classiﬁcation models.

In this paper, we propose seven principles to guide

the mining of physiological signals for an effective

emotion recognition and characterization. These prin-

ciples were derived from an experimental comparison

of advanced techniques from machine learning and

signal processing using physiological signals, such as

skin activity and temperature, collected during affec-

tive interactions. These principles can be used to ad-

dress the three introduced drawbacks. They provide

an integrated and up-to-date view on how to disclose

and describe affective states from physiological sig-

nals. A methodology that relies on these principles is,

additionally, proposed.

This paper is structured as follows. In Section 2,

relevant work on the mining of sensor-based data in

emotion-centered studies is covered. Section 3 de-

ﬁnes the seven principles and the target methodol-

ogy. Section 4 provides the supporting quantitative

evidence for the introduced principles using signals

collected under different experimental settings. Fi-

nally, the main implications are synthesized.

2 BACKGROUND

Physiological signals are increasingly adopted to

monitor and shape affective interactions since they are

hardly prone to masking and can track subtle but sig-

niﬁcant cognitive-sensitive emotional changes. How-

ever, their complex, variable and subjective expres-

sion within and among individuals pose key chal-

lenges for an adequate modeling of emotions.

Consider a set of annotated signals $D=(x_1,..,x_m)$, where each instance is a tuple $x_i=(\vec{y},a_1,..,a_n,c)$, where $\vec{y}$ is the signal, $a_j$ is an annotation related with the subject or experimental setting, and $c$ is the labeled emotion or stimulus. Given $D$, the emotion recognition task aims to learn a model $M$ to label a new unlabeled instance $(\vec{y},a_1,..,a_n)$. The emotion description task aims to learn a model $M$ that characterizes the major properties of the signal $\vec{y}$ for each emotion $c$.

Henriques R. and Paiva A.

Seven Principles to Mine Flexible Behavior from Physiological Signals for Effective Emotion Recognition and Description in Affective Interactions.

DOI: 10.5220/0004666400750082

In Proceedings of the International Conference on Physiological Computing Systems (PhyCS-2014), pages 75-82

ISBN: 978-989-758-006-2

 2014 SCITEPRESS (Science and Technology Publications, Lda.)

The goal of emotion recognition and description

is to (dynamically) assess someone’s feelings from

(streaming) signals. Emotion recognition from phys-

iological signals has been applied in the context of

human-robot interaction (Kulic and Croft, 2007; Leite

et al., 2013), human-computer interaction (Picard

et al., 2001), social interaction (Wagner et al., 2005),

sophisticated virtual adaptive scenarios (Rani et al.,

2006), among others (Jerritta et al., 2011). Multiple

physiological modalities have been adopted depend-

ing on the goal of the task. For instance, electroder-

mal activity has been used to identify engagement and

excitement states, respiratory volume and rate to recognize negative-valenced emotions, and heart contractile activity to separate positive-valenced emotions (Wu et al., 2011). Additionally, the experimental settings of existing studies also vary, namely the properties of the selected stimuli (discrete vs. continuous)

and general factors related with user dependency (sin-

gle vs. multiple subjects), subjectivity of the stimuli

(high-agreement vs. self-report) and the analysis time

of the signal (static vs. dynamic).

A ﬁrst drawback of existing emotion-centered

studies is the absence of learned principles to mine

the signals. Although multiple models are compared

using accuracy levels, there is no in-depth analysis of

the underlying behavior of these models and no guar-

antees regarding their statistical signiﬁcance. Addi-

tionally, there is no assessment on how their perfor-

mance varies for alternative experimental settings.

A second drawback is related with the fact

that these studies rely on simple pre-processing

techniques and feature-driven models. First, pre-

processing steps are centered on the removal of con-

taminations and on simplistic normalization proce-

dures. These techniques are insufﬁcient to deal with

differences on responses among subjects and with the

isolation of spontaneous variations of the signal.

Second, even in the presence of expressive fea-

tures, models are not able to effectively accommodate

ﬂexible sequential behavior. For instance, a rising or

recovering behavior may be described by specific motifs sensitive to sub-peaks or displaying a logarithmic decay. This weak differentiation among responses leads to rigid models of emotions.

The task of this work is to identify a set of consis-

tent principles to address these drawbacks, thus im-

proving emotion recognition rates.

3 SOLUTION

Relying on experimental evidence, seven principles

were deﬁned to surpass the limitations of traditional

models for emotion recognition from physiological

signals. The impact of adopting these principles was validated over electrodermal activity, facial expression and skin temperature signals. Nevertheless, these

principles can be tested for any other physiological

signal after the neutralization of cyclic behavior (e.g.

respiratory and cardiac signals) and/or the application

of smoothing and low-pass ﬁlters.

3.1 The Seven Principles

#1: Adopt Representations able to Handle Individ-

ual Differences of Responses

Problem: The differences of physiological re-

sponses for a single emotion are often related with ex-

perimental conditions, such as the placement of sen-

sors or unregulated environment, and with speciﬁc

psychophysiological properties of the subjects, such

as lability and current mood. These undesirable differences affect both: i) the amplitude axis (varying baseline levels and peak-variations of responses), and ii) the temporal axis (varying latency, rising and recovery time of responses).

On one hand, recognition rates degrade as a result

of an increased modeling complexity due to these dif-

ferences. On the other hand, when normalizing sig-

nals along the amplitude-time axes, we are discard-

ing absolute behavior that is often critical to distin-

guish emotions. Additionally, common normalization

procedures are not adequate since the signal baseline

and response amplitude may not be correlated (e.g.

high baseline does not mean heightened elicited re-

sponses).

Solution: A new representation of the signal that

minimizes individual differences should be adopted,

and combined with the original signal for the learning

of the target model.

While many representations for time series exist (Lin et al., 2003b), most either scale poorly, because the cardinality of the signal is not reduced, or require prior access to the whole signal, which prevents a dynamic analysis. Symbolic Aggregate approXimation (SAX) avoids both limitations and offers a lower-bounding guarantee. SAX behavior can be synthesized in two steps. First, the signal is transformed into a Piecewise Aggregate Approximation (PAA) representation. Second, the PAA signal is symbolized into a discrete string. A Gaussian distribution is used to produce equiprobable symbols from statistical breakpoints (Lin et al., 2003a). Unlike other representations, the Gaussian distribution for amplitude control mitigates the problem of subjects whose baseline levels and response variations are not correlated.
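The two SAX steps can be sketched as follows. This is a minimal illustration and not the Java implementation used in this work: the function names `paa` and `sax`, the use of integer symbols starting at 0, and the optional `mu` and `sigma` arguments (the reference mean and standard deviation used for amplitude normalization) are assumptions made for the example.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def paa(signal, n_segments):
    """Piecewise Aggregate Approximation: mean of each of n_segments chunks."""
    n = len(signal)
    if not 0 < n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(signal)")
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [mean(signal[a:b]) for a, b in zip(bounds, bounds[1:])]


def sax(signal, n_segments, alphabet_size, mu=None, sigma=None):
    """Map a signal to n_segments integer symbols in [0, alphabet_size)."""
    # Assumption: when no reference statistics are given, normalize against
    # the signal itself; a subject- or stimulus-level reference can be passed.
    mu = mean(signal) if mu is None else mu
    sigma = pstdev(signal) if sigma is None else sigma
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in signal]
    # Breakpoints that split a standard Gaussian into equiprobable regions.
    cuts = [NormalDist().inv_cdf(k / alphabet_size)
            for k in range(1, alphabet_size)]
    return [bisect_right(cuts, v) for v in paa(z, n_segments)]
```

Passing `mu` and `sigma` computed over a chosen reference set (for instance, all responses of one subject) is one possible way to select the scope of the amplitude correction.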

Amplitude differences can be corrected with respect to all stimuli, to a target stimulus, to all subjects, or to a specific subject. To treat temporal differences, two strategies can be adopted. First, signals can be used as-is (with their different numerosity) and given as input to sequential learners, which are able to deal with this aspect. Note, for instance, the robustness of hidden Markov models on detecting handwritten text with different sizes in (Bishop, 2006). Second, piecewise aggregation, such as that provided by SAX, can be used to normalize numerosity differences.

#2: Account for Relevant Signal Variations

Problem: Motifs and features sensitive to sub-

peaks are critical for emotion recognition (e.g. elec-

trodermal variations hold the potential to separate

anger from fear responses (Andreassi, 2007)). However, traditional methods rely on fixed amplitude thresholds to detect informative signal variations, which become easily corrupted due to individual differences among subjects. Additionally, when cardinality is reduced, relevant sub-peaks disappear.

Solution: Two strategies can be adopted. First, a representation that enhances local variations, referred to as local-angle, can be used. The signal is divided into thin time partitions, and the angle associated with the signal variation in each partition is computed and translated into a symbol using breakpoints derived from the chosen number of symbols. As in SAX, the angle breakpoints are defined assuming a Gaussian distribution. When adopting a 6-symbol alphabet, the following illustrative SAX-based univariate signal, <17,13,15,14,18,19,16,14,13,12,16,16>, would be translated into the following local-angle representation: <0,4,1,5,5,0,1,1,1,5,4>.
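A possible reading of the local-angle representation is sketched below. The exact partition width and breakpoint computation are not specified here, so the sketch makes explicit assumptions: one partition per pair of consecutive samples, a hypothetical time step `dt`, and Gaussian breakpoints applied to the standardized angles. It is therefore not expected to reproduce the illustrative symbols above exactly.

```python
import math
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def local_angle(signal, alphabet_size, dt=1.0):
    """Symbolize the slope angle between consecutive samples."""
    # Angle of the segment joining each pair of consecutive points.
    angles = [math.atan2(b - a, dt) for a, b in zip(signal, signal[1:])]
    mu, sigma = mean(angles), pstdev(angles)
    # Assumption: angles are standardized before applying the same
    # equiprobable Gaussian breakpoints used by SAX.
    z = [(a - mu) / sigma if sigma > 0 else 0.0 for a in angles]
    cuts = [NormalDist().inv_cdf(k / alphabet_size)
            for k in range(1, alphabet_size)]
    return [bisect_right(cuts, v) for v in z]
```

A signal with n samples yields n-1 symbols, with steeper falls mapped to lower symbols and steeper rises to higher ones.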

Second, multiple SAX representations can be

adopted using different cardinalities. While mapping

the raw signals into low-cardinal signals is useful

to capture smoothed behavior (e.g. alphabet size

less than 8), a map into high-cardinal signals is able

to capture more delineated behavior (e.g. alphabet

size above 10). One model can be learned for

each representation, with the joint probability being

computed to label a response.
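The joint-probability labeling across representations can be sketched as follows. The sketch assumes that each per-representation model outputs a log-likelihood per emotion and that the representations are treated as independent, so the log-likelihoods are simply summed; the function name and the input layout are hypothetical.

```python
def label_by_joint_log_likelihood(scores_by_representation):
    """Pick the emotion with the highest summed log-likelihood.

    scores_by_representation maps a representation name to a dict
    {emotion: log P(signal | emotion model for that representation)}.
    """
    representations = list(scores_by_representation.values())
    # Only emotions scored by every representation can be compared.
    emotions = sorted(set.intersection(*(set(s) for s in representations)))
    joint = {e: sum(s[e] for s in representations) for e in emotions}
    return max(emotions, key=joint.get), joint
```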

#3: Include Flexible Sequential Behavior

Problem: Although sequential learning is the nat-

ural option for audio-and-visual signals, the existing

models for emotion recognition mainly rely on ex-

tracted features. Feature-extraction methods are not

able to capture ﬂexible behavior (e.g. motifs under-

lying complex rising and decaying responses) and are

strongly dependent on directive thresholds (e.g. peak

amplitude to compute frequency measures).

Solution: Generative models learned from se-

quential data, such as recurrent neural networks or dy-

namic Bayesian networks, can be adopted to satisfy

this principle (Bishop, 2006). In particular, hidden

Markov models (HMMs) are an attractive option due

to their stability, simplicity and ﬂexible parameter-

control (Murphy, 2002). The core task is to learn the

generation and transition probabilities of a hidden automaton for each emotion. Given an unlabeled signal, we can assess the probability of it being generated by each learned model. Additionally, the lattices learned per emotion can be explored to retrieve emerging patterns, which can then be used as emotion descriptors.
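Scoring an unlabeled symbolic signal against one HMM per emotion can be illustrated with the scaled forward algorithm. This is a generic textbook sketch, not the HMM-WEKA code used in the experiments; the function names and the plain-list layout of the parameters (`start`, `trans`, `emit`) are assumptions, and the parameters are taken as already learned.

```python
import math


def forward_log_likelihood(symbols, start, trans, emit):
    """Return log P(symbols | HMM) for a non-empty symbol sequence.

    start[s]    : probability of starting in hidden state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][symbols[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(symbols)):
        if t > 0:
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states))
                * emit[s][symbols[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def recognize(symbols, models):
    """models maps emotion -> (start, trans, emit); returns best label."""
    scores = {emotion: forward_log_likelihood(symbols, *params)
              for emotion, params in models.items()}
    return max(scores, key=scores.get), scores
```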

The parameterization of HMMs must be based on

the signal properties (e.g. high dimensionality leads

to an increased number of hidden states). Alternative

architectures, such as fully-interconnected or left-to-

right architectures, can be considered.

From the conducted experiments, an analysis

of the learned emissions from the main path of

left-to-right HMM architectures revealed emerging

rising and recovering responses following sequential

patterns with ﬂexible displays (e.g. exponential and

"stairs"-like behavior).

#4: Integrate Sequential and Feature-driven Models

Problem: Since sequential learners capture the

overall behavior of physiological responses, they are

not able to highlight speciﬁc discriminative properties

of the signal. Often such discriminative properties are

adequately described by simple features.

Solution: Feature-driven and sequential models

should be integrated as they provide different but

complementary views. One option is to rely on a post-

voting stage. A second option is to use one model to

discard the least probable emotions, and to apply such constraints to the remaining model.
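The second integration option can be sketched as below, assuming the sequential model is the one used to rule out emotions and the feature-driven model decides among the remaining candidates. The function name, the `keep` parameter and the score dictionaries are hypothetical.

```python
def integrate_by_pruning(sequential_scores, feature_probs, keep=2):
    """Shortlist emotions with one model, then decide with the other.

    sequential_scores : {emotion: log-likelihood from the sequence learner}
    feature_probs     : {emotion: probability from the feature-based model}
    keep              : number of emotions kept after pruning (assumption)
    """
    shortlist = sorted(sequential_scores, key=sequential_scores.get,
                       reverse=True)[:keep]
    label = max(shortlist, key=lambda e: feature_probs.get(e, 0.0))
    return label, shortlist
```

With five candidate emotions, `keep=2` corresponds to discarding the three least probable ones before the final decision.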

Feature-driven models have been widely re-

searched and are centered on three major steps:

feature extraction, feature selection and feature-based

learning (Lessard, 2006; Jerritta et al., 2011). Expres-

sive features include statistical, temporal, frequency

and, more interestingly, temporal-frequency metrics

(from geometric analysis, multiscale sample entropy,

sub-band spectra). Feature extraction methods in-

clude tonic-phasic windows; moving-sliding features;

transformations (Fourier, wavelet, Hilbert); compo-

nent analysis; projection pursuit; auto-associative

nets; and self-organizing maps. Methods to remove

features without signiﬁcant correlation with the

emotion under assessment include sequential selec-

tion, branch-and-bound search, Fisher projection,

Davies-Bouldin index, analysis of variance and some

classiﬁers. Finally, a wide-variety of deterministic

and probabilistic learners have been adopted to perform emotion recognition based on relevant features.

The most successful learners are k-nearest neighbors,

regression trees, random forests, Bayesian networks,

support vector machines, canonical correlation and

linear discriminant analysis, neural networks, and

Marquardt-back propagation.

#5: Use Subject’s Traits to Shape the Model

Problem: Subjects with different psychophysio-

logical proﬁles tend to have different physiological re-

sponses for the same stimuli. Modeling responses for

emotions without this prior knowledge hampers the

learning task since the models have to deﬁne multiple

paths or generalize responses in order to accommo-

date such alternative expressions of an emotion due

to proﬁle differences.

Solution: Make the learning sensitive to the psychophysiological traits of the subject under assessment, when available. The inclusion of the relative score for the four Myers-Briggs types was found to increase the accuracy of learning models.

For lazy learners, the simple inclusion of these

traits as features is sufﬁcient. We observed an in-

creased accuracy in k-nearest neighbors, which tends

to select responses from subjects with related proﬁle.

A simple strategy for non-lazy learners is to par-

tition data by traits, and to learn one model for each

trait. Emotion recognition is done by integrating the

results of the models with the proﬁle of the testing

subject. This integration can resort to a weighted voting scheme, where weights essentially depend on the score obtained for each assessed trait.
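One possible form of such a weighted voting scheme is sketched below. The weighting rule is an assumption made for illustration: each trait has one model per pole, and the subject's relative score for the trait (a value in [0, 1] towards the first pole) weights the two models' outputs. The function name and data layout are hypothetical.

```python
def trait_weighted_vote(model_outputs, trait_scores):
    """Combine trait-specific models using the subject's trait scores.

    model_outputs : {trait: (probs_pole_a, probs_pole_b)}, where each
                    probs_* is {emotion: probability} from the model
                    learned on subjects of that pole
    trait_scores  : {trait: score in [0, 1] towards pole a}
    """
    totals = {}
    for trait, (probs_a, probs_b) in model_outputs.items():
        weight = trait_scores[trait]
        for emotion in set(probs_a) | set(probs_b):
            totals[emotion] = (
                totals.get(emotion, 0.0)
                + weight * probs_a.get(emotion, 0.0)
                + (1.0 - weight) * probs_b.get(emotion, 0.0)
            )
    # Iterate in sorted order so that ties are broken deterministically.
    return max(sorted(totals), key=totals.get), totals
```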

A more robust strategy is to learn a tree structure

with classification models in the leaves, where a

branching decision is associated with trait values that

are correlated with heightened response differences

for a speciﬁc emotion.

#6: Reﬁne the Learning Models based on the Com-

plexity of Emotion Expression

Problem: A single emotion-evocative stimulus

can elicit small-to-large groups of signiﬁcantly differ-

ent physiological responses. A simple generalization

of each set of responses leads to poor models.

Solution: Create multiple sub-models for emo-

tions with varying physiological expressions. Both

rule-based models, such as random forests, and lazy

learners implicitly accommodate this behavior.

Generative models need to be further refined when the emission probabilities of the underlying lattices for a specific emotion do not converge strongly. When HMMs are adopted, it is crucial to change the architecture to add an alternative path with a new hidden automaton.

Footnote (Myers-Briggs types): http://www.myersbriggs.org/

For non-generative models, it is crucial to under-

stand when the model needs to be further reﬁned. This

can be done by analyzing the variances of features per

emotion or by clustering responses per emotion with

a non-ﬁxed number of clusters.
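The variance-based check can be sketched as follows for a single feature. The threshold is a hypothetical, feature-specific parameter that would need to be chosen for the data at hand, and the function name is an assumption.

```python
from statistics import pvariance


def emotions_needing_refinement(feature_values, threshold):
    """Flag emotions whose responses disagree strongly on one feature.

    feature_values : {emotion: [value of the feature for each response]}
    threshold      : variance above which the emotion is flagged
                     (hypothetical, to be tuned per feature)
    """
    return sorted(
        emotion
        for emotion, values in feature_values.items()
        if len(values) > 1 and pvariance(values) > threshold
    )
```

A flagged emotion is a candidate for sub-models, for instance by clustering its responses before learning.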

These strategies can improve not only the emotion recognition rates, but also the characterization of physiological responses per emotion. Consider the

case where the learned HMMs are used as a pattern

descriptor. Without further separation of different ex-

pressions for each emotion, the generative models per

emotion would be more prone to error and only reveal

generic behavior.

#7: Adapt the Models to the Conditions of the Experimental Setting

Problem: The properties of the emotion recognition task vary with different settings, such as discrete vs. prolonged stimuli, user-dependent vs. user-independent studies, and univariate vs. multivariate signals.

Solution: The selection and parameterization of

classification models should be guided by the experimental conditions. Below we introduce three examples derived from our analysis. First, the influence of

sub-peak analysis (principle #2) for emotion recogni-

tion should have a higher weight for prolonged stim-

uli. Second, user-dependent studies are particularly

well described by ﬂexible sequential behavior (prin-

ciple #3). Third, multivariate analysis should be per-

formed in an integrated fashion whenever possible.

Common generative models, such as HMMs, are able

to model multivariate signals.

Additionally, we found that the inclusion of both other experimental properties (such as interaction annotations) and the perception of the subject regarding the interaction (assessed through post-surveys) can guide the learning of the target emotion recognition models.

3.2 Methodology

Relying on the introduced seven principles, we propose a novel methodology for emotion recognition and description from physiological signals (software available at web.ist.utl.pt/rmch/research/software/eda). Fig.1 illustrates its main steps. Emotion recognition combines the traditional feature-based classification with the results provided by sequence learners and is centered on two expressive representations: i) SAX, to normalize individual differences while still preserving the overall response pattern, and ii) local angles, to enhance the local sub-peaks of a response. Additionally, emotion characterization is accomplished using both feature-based descriptors (mean and variance of the most discriminative features) and the transition lattices generated by sequence learners.

Figure 1: Proposed methodology for emotion recognition and description from physiological signals.

In the presence of background knowledge, that is, when each instance $(\vec{y},a_1,..,a_n,c)$ has $n \geq 1$, prior decisions can be made. For example, in the presence of psychophysiological traits correlated with varying expression of a specific emotion, the target model can be further decomposed to reduce the complexity of the task. Complementarily, iterative refinements over the learned model can be made when feature-based models rely on features with high variances or when the generative models do not show strong convergence for a specific emotion.

4 RESULTS

The proposed principles and methodology resulted from an evaluation of advanced data mining and signal processing concepts using a tightly-controlled lab study (details, data, scripts and statistical sheets are available at http://web.ist.utl.pt/rmch/research/software/eda). More than 200 signals were collected for each physiological modality from both human-to-human and human-to-robot affective interactions. Electrodermal activity (EDA), skin temperature, and facial expression modalities were monitored using Affectiva technology. Although the conveyed results are centered on electrodermal activity and temperature, previous work from the institute on the use of facial expression to recognize emotion during affective games adds supporting evidence to the relevance of the listed principles (Leite et al., 2013).

The study involved 30 participants, with ages between 19 and 24 (average of 21 years old), who were randomly divided into two groups, R and H. Subjects from group R interacted with the NAO robot (www.aldebaran-robotics.com) using a Wizard-of-Oz setting. Participants from group H interacted with a human agent, an actor with a structured and flexible script.

Eight different stimuli, comprising 5 emotion-centered stimuli and 3 others (captured during periods of strong physical effort, concentration and resting), were presented to each subject. A survey was used to categorize the profile of the participants according to the Myers-Briggs type indicator.

Statistical and geometric features were extracted

from the raw, SAX and local-angle representations.

Feature selection was performed using statistical

analysis of variance (ANOVA). The selected feature-

based classiﬁers were adopted from WEKA software

(Hall et al., 2009), and the HMMs from HMM-

WEKA extension (codiﬁed according to Bishop

(2006)). SAX and local angle representations were

implemented using Java (JVM version 1.6.0-24) and

the following results were computed using an Intel

Core i5 2.80GHz with 6GB of RAM.

Principles #1 and #2. To assess the impact of

dealing with individual differences and informative

subtle variations of the signals, we evaluate emotion

recognition scores under SAX and local-angle representations using feature-driven models. The score is accuracy, the ability to correctly label an unlabeled signal (i.e. to identify the underlying emotion from 5 emotions). Accuracy was computed using 10-fold cross-validation over the ∼200 collected electrodermal signals. Fig.2 synthesizes the results.

The isolated use of electrodermal features from

the raw signal (tonic and phasic skin conductivity,

maximum amplitude, rising and recovering time) and

of statistical features extracted from SAX and local-

angle representations leads to an accuracy near 50%

(against 20% when using a random model). The inte-

gration of these features results in an improvement of

10pp to near 60%. Additionally, accuracy improves

when features from skin temperature are included.

Logistic learners, which use regressions on the real-valued features to estimate the probability score of each emotion, were the best feature-based models for this experiment. When no feature selection method is applied, Bayesian nets are an attractive alternative. Despite the differences between human-to-human and human-to-robot settings, classifiers are still able to recognize emotions when mixing the cases. For instance, kNN tends to select the features from a sole scenario when k<4, while C4.5 trees have dedicated branches for each scenario.

Figure 2: Emotion recognition accuracy (out of 5 emotions) using feature-driven models.

The five emotion-centered stimuli were: empathy (following common practices in speech tone and body approach), expectation (possibility of gaining an additional reward), positive-surprise (unexpected attribution of a significant incremental reward), stress (impossible riddle to solve in a short time to maintain the incremental reward) and frustration (self-responsible loss of the initial and incremental rewards).

The stimuli were presented in the same order in every experiment, and 6-8 minutes were provided between two stimuli to neutralize the subject's emotional state and remove the stress related with the experimental expectations.

Note, additionally, that these accuracy levels also

reveal the adequacy of emotion description models,

which can simply rely on centroid and dispersion met-

rics over the most discriminative features.

Additionally, to understand the relevance of fea-

tures extracted from SAX and local-angle representa-

tions to differentiate emotions under assessment, one-

way ANOVA tests were applied with the Tukey post-

hoc analysis. A signiﬁcance of 5% was considered for

the Levene’s test of variance homogeneity, ANOVA

and Tukey tests. Features derived from the raw, SAX and local-angle electrodermal signals were all considered. A representative set of electrodermal features able to separate emotions is synthesized in Table 1.

Gradient plus centroid metrics from SAX signals

can be adopted to separate negative emotions. Disper-

sion metrics from local-angle representations differ-

entiate positive emotions. Rise time and response am-

plitude can be used to isolate speciﬁc emotions, and

statistical features, such as median and distortion, to

predict the affective valence. Kurtosis, which reveals

the ﬂatness of the response’s major peak, and features

derived from the temperature signal were also able to

differentiate emotions with signiﬁcance using the pro-

posed representations.

Table 1: Features with potential to discriminate emotions.

| Features (with strongest statistical significance to differentiate emotions' sets) | Separated emotions |
|---|---|
| Accentuated dispersion metrics (such as the root mean square error) from the SAX and local-angle representations | Positive (empathy, expectation, surprise) |
| Median (relevant to quantify the sustenance of peaks), distortion and recovery time from SAX signals | Positive from negative from neutral emotions |
| Gradient (revealing long-term sympathetic activation by measuring the change in the EDA baseline) and centroid metrics from SAX signals | Fear from frustration |
| Rise time | Empathy from others |
| Response amplitude | Surprise from others |

Principles #3 and #4. In our experimental setting, the inclusion of sequential behavior leads to an increase in accuracy levels of nearly 10pp. The output of HMMs was, additionally, combined with the output of probabilistic feature-based classifiers (logistic learners were the choice). Table 2 discloses the results when adopting HMMs with alternative architectures for approximately 30 signals per emotion (empathy, expectation, surprise, stress, frustration).

Table 2: Accuracy of sequence learners to recognize an emotion (out of 5 emotions) and to correctly discard the 3 least probable emotions.

| Model | Measure | Setting | SAX signal | Inc. local-angle | Inc. temperature | Inc. features |
|---|---|---|---|---|---|---|
| HMM (fully connected architecture) | Recognition accuracy | All | 0.40 | 0.42 | 0.46 | 0.67 |
| | | Robot | 0.39 | 0.41 | 0.44 | 0.66 |
| | | Human | 0.39 | 0.42 | 0.45 | 0.67 |
| | Discrimination accuracy | All | 0.86 | 0.88 | 0.89 | – |
| | | Robot | 0.87 | 0.88 | 0.91 | – |
| | | Human | 0.86 | 0.88 | 0.90 | – |
| HMM (left-to-right architecture) (Murphy, 2002) | Recognition accuracy | All | 0.43 | 0.44 | 0.48 | 0.71 |
| | | Robot | 0.42 | 0.43 | 0.47 | 0.71 |
| | | Human | 0.41 | 0.44 | 0.47 | 0.69 |
| | Discrimination accuracy | All | 0.87 | 0.88 | 0.90 | – |
| | | Robot | 0.87 | 0.89 | 0.90 | – |
| | | Human | 0.87 | 0.88 | 0.89 | – |

Interestingly, the learned HMMs are highly effective at discarding 3 emotion labels that do not fit the learned behavior. Left-to-right HMM architectures are particularly well-suited to mine SAX-based signals. Note, additionally, that left-to-right architectures are a good emotion descriptor, since the most probable emissions along the main path give a highly interpretable view of the most probable behavior of the signal. Similar architectures can be implemented by controlling the initial transition and emission probabilities.
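Controlling the architecture through the initial probabilities can be illustrated for the left-to-right case. Under Baum-Welch re-estimation, a transition initialized to zero stays at zero, so initializing only self-loops and forward steps preserves the left-to-right structure during learning. The helper names and the `stay` parameter below are assumptions made for the sketch.

```python
def left_to_right_transitions(n_states, stay=0.5):
    """Initial transition matrix allowing only self-loops and one step forward."""
    rows = []
    for s in range(n_states):
        row = [0.0] * n_states
        if s == n_states - 1:
            row[s] = 1.0            # last state absorbs the remaining signal
        else:
            row[s] = stay           # remain in the current state
            row[s + 1] = 1.0 - stay  # advance along the main path
        rows.append(row)
    return rows


def left_to_right_start(n_states):
    """Initial state distribution that always enters at the first state."""
    return [1.0] + [0.0] * (n_states - 1)
```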

Although the local-angle representation is not as critical as SAX for sequential learning, its weighted use for emotion recognition and discrimination has a positive impact on the accuracy levels.

Table 3: Influence of subjects' profile on EDA responses.

| Myers-Briggs type | Correlated features ([+] positive correlation; [–] negative correlation) |
|---|---|
| Extrovert-introvert | [+] Dispersion metrics of SAX signal; [–] Centroid metrics of SAX signal; [–] Response amplitude |
| Sensing-intuition | [–] Dispersion metrics of raw and SAX signal; [–] Dispersion metrics of local-angles; [–] Rise time |
| Feeling-thinking | [+] Median and dispersion metrics of SAX signal; [–] Slope and centroid metrics of local-angles; [–] Rise time |
| Judging-perceiving | [–] Centroid metrics of raw signal; [–] Dispersion metrics of SAX signal; [+] Response amplitude |

The success of HMMs for emotion recognition resides in their ability to: i) detect flexible behavior, such as peak-sustaining values and fluctuations (hardly measured by features); ii) cope with individual differences (with the SAX scaling strategy being applied with respect to all stimuli, to the target stimulus, to all subjects or to subject-specific responses); iii) cope with subtle variations when the local-angle representation is used as the input signal; and iv) deal with lengthy responses (by increasing the number of hidden states). Additionally, HMMs can easily capture either a smoothed behavior or a more delineated behavior by controlling the signal cardinality using SAX.

Principle #5. Pearson correlations were computed to relate the physiological expression with the subjects' profiles. This analysis, illustrated in Table 3, shows that the inclusion of profile information can be a critical input to guide the learning task. A positive (negative) correlation means that higher (lower) values for the assessed feature are related with a polarization towards the extrovert, sensing, feeling or perceiving type. We can observe, for instance, that responses from sensing and feeling types are quicker, while extroverts have a more unstable signal (higher dispersion), although less intense (lower amplitude).

The insertion of the relative score for the four Myers-Briggs types was found to increase the accuracy of IBk, which tends to select responses from subjects with a related profile. Also, for non-lazy probabilistic learners, four data partitions were created, with the first separating extroverts from introverts, and so on. One model was learned for each profile. Recognition for a test instance now relies on the equally weighted combined output of each model, which resulted in an increased accuracy of 2-3pp. Although the improvement seems subtle, note that the split of instances hampered the learning of the type-oriented models, since we are relying on a small-to-medium number of collected signals.
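A minimal sketch of the equally weighted combination is given below. It assumes that each profile-specific model returns a probability per emotion and that the outputs being combined are those of the models matching the test subject's types; both points, as well as the function names and the example probabilities, are assumptions made for illustration.

```python
def combine_equally(model_outputs):
    """Average per-emotion probabilities from several models with equal weights."""
    if not model_outputs:
        raise ValueError("at least one model output is required")
    emotions = model_outputs[0].keys()
    n = len(model_outputs)
    return {e: sum(out[e] for out in model_outputs) / n for e in emotions}


def recognize(model_outputs):
    """Return the emotion with the highest combined probability."""
    combined = combine_equally(model_outputs)
    return max(combined, key=combined.get)


# Hypothetical outputs of the four type-oriented models for one test response.
outputs = [
    {"joy": 0.6, "fear": 0.4},  # extrovert-introvert partition
    {"joy": 0.5, "fear": 0.5},  # sensing-intuition partition
    {"joy": 0.3, "fear": 0.7},  # feeling-thinking partition
    {"joy": 0.7, "fear": 0.3},  # judging-perceiving partition
]
combined = combine_equally(outputs)
label = recognize(outputs)
```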

Principle #6. The analysis of the variance of key features and of the learned generative models per emotion provides critical insights for further adaptations of the learning task. For instance, the variance of rise time across subjects for positive-surprise was observed to be high because some subjects tend to experience a short period of distrust. The inclusion of similar features in logistic model trees, where a feature can be tested multiple times using different values, revealed that they tend to be often selected and, therefore, should not be removed due to their high variance.

Another illustrative observation was the weak convergence of the Markov model for empathy due to its idiosyncratic expression. Given this knowledge, we adapted the left-to-right architecture to include three main paths. After learning this new model, we verified a heightened convergence of the model for each one of the empathy paths, revealing three distinct forms of physiological expression and, consequently, an improved recognition rate.
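The exact topology of the adapted architecture is not detailed here, so the following is only a sketch of one plausible reading: three parallel left-to-right chains with no transitions between them, initialized uniformly over the allowed transitions. The function name and the number of states per path are assumptions. Since transitions initialized to zero remain zero under Baum-Welch re-estimation, such an initialization is enough to constrain the learned model to the chosen structure.

```python
def multipath_left_to_right(n_paths, states_per_path):
    """Initial distribution and transition matrix for parallel left-to-right chains."""
    n = n_paths * states_per_path
    start = [0.0] * n
    trans = [[0.0] * n for _ in range(n)]
    for p in range(n_paths):
        first = p * states_per_path
        last = first + states_per_path - 1
        start[first] = 1.0 / n_paths      # a response may enter any of the paths
        for s in range(first, last + 1):
            if s < last:
                trans[s][s] = 0.5         # stay in the current state
                trans[s][s + 1] = 0.5     # or advance along the same path
            else:
                trans[s][s] = 1.0         # the final state of a path is absorbing
    return start, trans


# Three paths, as in the adapted empathy model; four states per path is assumed.
start, trans = multipath_left_to_right(n_paths=3, states_per_path=4)
```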

Principle #7. We performed additional tests to understand the impact of the experimental conditions on the physiological expression of emotions. First, we performed a t-test to assess the influence of the adopted type of interaction (human-to-human vs. human-to-robot) on features derived from the signal collected during the whole affective interaction (without partitions by stimulus). Results over the SAX representation show that human-to-human interactions (in comparison to human-to-robot) have significantly: i) a higher median (revealing an increased ability to sustain peaks), and ii) higher values of dispersion and kurtosis (revealing heightened emotional response).
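The comparison between the two interaction types can be sketched with a two-sample t statistic. The variant of the t-test that was applied is not stated, so Welch's unequal-variance form is used here as an assumption; the function name and the per-subject values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    # Squared standard error of each sample mean (uses the sample variance).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical SAX medians, one per subject and interaction type.
human_human = [3.1, 3.4, 2.9, 3.6, 3.3]
human_robot = [2.4, 2.7, 2.2, 2.6, 2.5]
t_stat, dof = welch_t(human_human, human_robot)
```

A positive statistic indicates a higher mean for the first group; deciding on significance additionally requires comparing the statistic against Student's t distribution with the computed degrees of freedom, which is left out of this sketch.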

Second, we studied the impact of the subjects' perception of the experiment by correlating signal features with the answers to a survey made at the end of the interaction. Bivariate Pearson correlation between a set of scored variables assessed in the final survey and physiological features was performed at a 5% significance level. Table 4 synthesizes the most significant correlations found. They include positive correlation of local-angle dispersion (revealing changes in the gradient) with intensity, felt influence and perceived intention; positive correlation of SAX dispersion (revealing heightened variations from the baseline) with the perceived empathy, confidence and trust; quicker rise time for heightened perceived optimism; and higher amplitude of responses for heightened felt influence and low levels of pleasure.

Table 4: Influence of subject perception on the physiological expression of emotions.

Origin: Correlations with higher statistical significance

Local-angle features

[+] Dispersion metrics with the felt intensity, the understanding of the agent's intention, and the agent's level of influence on felt emotions.

SAX-based features

[+] Dispersion metrics with the perceived empathy, trust and confidence of the agent.

Computed metrics

[+/–] Amplitude positively correlated with the perceived agent influence and negatively correlated with the felt pleasure.

[–] Rise time with the perceived positivism of the agent's attitude.

These two observations motivate the need to make the learning models sensitive to additional information related to the experimental conditions and to the subject's perception and expectations. The inclusion of this information as new features in feature-based learners resulted in a generalized accuracy improvement (3-5pp).

5 CONCLUSIONS

This work provides seven important principles on

how to recognize and describe emotions during af-

fective interactions from physiological signals. These

principles aim to overcome the limitations of exist-

ing emotion-centered methods to mine signals. We

propose the use of expressive signal representations

to correct individual differences and to account for

subtle variations, and the integration of sequential and

feature-based models. Additionally, we demonstrate

the relevance of using the traits of the participant, in-

formation regarding the experimental conditions, and

speciﬁc properties of the learned models to improve

the learning task.

We presented initial empirical evidence that supports the utility of each one of the enumerated principles. In particular, we observed that the adoption of techniques to incorporate the seven principles can improve emotion recognition rates by 20pp. Finally, a new methodology was proposed to guide the inclusion of these principles in the learning task.

ACKNOWLEDGEMENTS

This work was supported by Fundação para a Ciência e a Tecnologia under the project PEst-OE/EEI/LA0021/2013 and PhD grant SFRH/BD/75924/2011, and by the project EMOTE from the EU 7th Framework Program (FP7/2007-2013).

REFERENCES

Andreassi, J. (2007). Psychophysiology: Human Behavior

And Physiological Response. Lawrence Erlbaum.

Bishop, C. M. (2006). Pattern Recognition and Machine

Learning (Inf. Science and Stat.). Springer-Verlag

New York, Inc., Secaucus, NJ, USA.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009). The WEKA data mining software: an update. SIGKDD Explor. Newsl., 11(1):10–18.

Jerritta, S., Murugappan, M., Nagarajan, R., and Wan, K. (2011). Physiological signals based human emotion recognition: a review. In CSPA, 2011 IEEE 7th International Colloquium on, pages 410–415.

Kulic, D. and Croft, E. A. (2007). Affective state estimation

for human-robot interaction. Trans. Rob., 23(5):991–

1000.

Leite, I., Henriques, R., Martinho, C., and Paiva, A. (2013).

Sensors in the wild: Exploring electrodermal activ-

ity in child-robot interaction. In HRI, pages 41–48.

ACM/IEEE.

Lessard, C. S. (2006). Signal Processing of Random Physiological Signals. Synthesis Lectures on Biomedical Engineering. Morgan and Claypool Publishers.

Lin, J., Keogh, E., Lonardi, S., and Chiu, B. (2003a). A

symbolic representation of time series, with implica-

tions for streaming algorithms. In ACM SIGMOD

workshop on DMKD, pages 2–11, NY, USA. ACM.

Lin, J., Keogh, E. J., Lonardi, S., and chi Chiu, B. Y.

(2003b). A symbolic representation of time series,

with implications for streaming algorithms. In Zaki,

M. J. and Aggarwal, C. C., editors, DMKD, pages 2–

11. ACM.

Murphy, K. (2002). Dynamic Bayesian Networks: Repre-

sentation, Inference and Learning. PhD thesis, UC

Berkeley, CS Division.

Picard, R. W., Vyzas, E., and Healey, J. (2001). Toward

machine emotional intelligence: Analysis of affective

physiological state. IEEE Trans. Pattern Anal. Mach.

Intell., 23(10):1175–1191.

Rani, P., Liu, C., Sarkar, N., and Vanman, E. (2006). An em-

pirical study of machine learning techniques for affect

recognition in human-robot interaction. Pattern Anal.

Appl., 9(1):58–69.

Wagner, J., Kim, J., and André, E. (2005). From physiological signals to emotions: Implementing and comparing selected methods for feature extraction and classification. In ICME, pages 940–943. IEEE.

Wu, C.-K., Chung, P.-C., and Wang, C.-J. (2011). Extract-

ing coherent emotion elicited segments from physio-

logical signals. In WACI, pages 1–6. IEEE.

PhyCS2014-InternationalConferenceonPhysiologicalComputingSystems