spect to all stimuli, to a target stimulus, to all subjects,
or to a specific subject. Two strategies can be adopted
to treat temporal differences. First, signals can be
used as-is (with their differing numerosity) and given
as input to sequential learners, which are able to deal
with this aspect; note, for instance, the robustness of
hidden Markov models in detecting handwritten text
of different sizes (Bishop, 2006). Second, piecewise
aggregate approximation, as provided by SAX, can be
used to normalize numerosity differences.
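As a sketch of the second strategy, the aggregation step can be reduced to averaging a signal over a fixed number of segments; the function below is a minimal pure-Python illustration (not SAX's full pipeline, which also standardizes and discretizes the signal):

```python
# Piecewise Aggregate Approximation (PAA), the aggregation step used by SAX.
# Reducing every signal to the same number of segments normalizes
# numerosity (length) differences before symbolic conversion.
def paa(signal, n_segments):
    n = len(signal)
    # Each segment averages roughly n / n_segments consecutive samples.
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    return [
        sum(signal[bounds[i]:bounds[i + 1]]) / (bounds[i + 1] - bounds[i])
        for i in range(n_segments)
    ]

short = [1.0, 2.0, 3.0, 4.0]
long_ = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
print(paa(short, 4))  # [1.0, 2.0, 3.0, 4.0]
print(paa(long_, 4))  # [1.0, 2.0, 3.0, 4.0]
```

Both signals, despite their different lengths, are reduced to the same four-segment representation and become directly comparable.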
#2: Account for Relevant Signal Variations
Problem: Motifs and features sensitive to sub-
peaks are critical for emotion recognition (e.g. elec-
trodermal variations hold the potential to separate
anger from fear responses (Andreassi, 2007)). How-
ever, traditional methods rely on fixed amplitude
thresholds to detect informative signal variations,
which are easily corrupted by individual subject
differences. Additionally, when cardinality is
reduced, relevant sub-peaks disappear.
Solution: Two strategies can be adopted. First,
a representation that enhances local variations, re-
ferred to as local-angle. The signal is partitioned
into thin time partitions, and the angle associated
with the signal variation in each partition is computed
and translated into symbols based on break-points
derived from the input number of symbols. Similarly
to SAX, the angle break-points are defined assuming
a Gaussian distribution. Adopting a 6-symbol alpha-
bet, the following illustrative SAX-based univariate
signal, <17,13,15,14,18,19,16,14,13,12,16,16>, would
be translated into the local-angle representation
<0,4,1,5,5,0,1,1,1,5,4>.
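A minimal sketch of the local-angle conversion, assuming unit-width partitions (each angle is the arctangent of the sample-to-sample difference) and SAX-style Gaussian break-points; the standardization step and symbol ordering are assumptions of this sketch, so the resulting symbols need not match the illustrative sequence above:

```python
from bisect import bisect_left
from math import atan
from statistics import NormalDist

def gaussian_breakpoints(alphabet_size):
    # SAX-style break-points: quantiles of N(0, 1) splitting the real
    # line into `alphabet_size` equiprobable regions.
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]

def local_angle(signal, alphabet_size=6):
    # Angle of the signal variation in each thin (unit-width) partition,
    # standardized before mapping onto the Gaussian break-points.
    angles = [atan(b - a) for a, b in zip(signal, signal[1:])]
    mean = sum(angles) / len(angles)
    std = (sum((a - mean) ** 2 for a in angles) / len(angles)) ** 0.5 or 1.0
    cuts = gaussian_breakpoints(alphabet_size)
    return [bisect_left(cuts, (a - mean) / std) for a in angles]

sax_signal = [17, 13, 15, 14, 18, 19, 16, 14, 13, 12, 16, 16]
print(local_angle(sax_signal))  # 11 symbols in 0..5, one per partition
```

Note that the 12-sample signal yields 11 symbols, one per consecutive pair of samples, mirroring the length of the illustrative representation above.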
Second, multiple SAX representations with dif-
ferent cardinalities can be adopted. While mapping
the raw signals into low-cardinality representations
captures smoothed behavior (e.g. alphabet size
below 8), mapping into high-cardinality representa-
tions captures more delineated behavior (e.g. alphabet
size above 10). One model can be learned per
representation, with the joint probability being
computed to label a response.
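The joint labeling step can be sketched as follows; the model objects and their `log_prob` method are hypothetical placeholders for any per-emotion sequential learner trained on one cardinality:

```python
from math import log

def joint_label(models_by_cardinality, signals_by_cardinality):
    """Label a response by the joint (product) probability across the
    models learned on each SAX cardinality.

    `models_by_cardinality` maps cardinality -> {emotion: model}, where
    a model is any object exposing a hypothetical `log_prob(symbols)`.
    """
    totals = {}
    for card, models in models_by_cardinality.items():
        symbols = signals_by_cardinality[card]
        for emotion, model in models.items():
            # Summing log-probabilities gives the log of the joint
            # probability, treating the representations as independent.
            totals[emotion] = totals.get(emotion, 0.0) + model.log_prob(symbols)
    return max(totals, key=totals.get)

class Dummy:
    # Toy stand-in for a learned sequential model.
    def __init__(self, p):
        self.p = p
    def log_prob(self, symbols):
        return len(symbols) * log(self.p)

models = {4: {"anger": Dummy(0.6), "fear": Dummy(0.4)},
          12: {"anger": Dummy(0.3), "fear": Dummy(0.7)}}
signals = {4: [0, 1, 2], 12: [3, 7, 9, 11]}
print(joint_label(models, signals))  # → "fear"
```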
#3: Include Flexible Sequential Behavior
Problem: Although sequential learning is the nat-
ural option for audio and visual signals, the existing
models for emotion recognition mainly rely on ex-
tracted features. Feature-extraction methods are not
able to capture flexible behavior (e.g. motifs under-
lying complex rising and decaying responses) and are
strongly dependent on prescribed thresholds (e.g. peak
amplitude to compute frequency measures).
Solution: Generative models learned from se-
quential data, such as recurrent neural networks or dy-
namic Bayesian networks, can be adopted to satisfy
this principle (Bishop, 2006). In particular, hidden
Markov models (HMMs) are an attractive option due
to their stability, simplicity and flexible parameter
control (Murphy, 2002). The core task is to learn the
emission and transition probabilities of a hidden au-
tomaton for each emotion. Given a non-labeled sig-
nal, we can then assess the probability of it being gen-
erated by each learned model. Additionally, the lat-
tices per emotion can be exploited to retrieve emerg-
ing patterns, which can serve as emotion descriptors.
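The assessment step can be illustrated with the standard forward algorithm for discrete HMMs, labeling a signal with the emotion whose model is its most likely generator; the two-state parameterizations below are purely illustrative, not values learned from physiological data:

```python
def forward_likelihood(obs, pi, A, B):
    """P(obs | model) for a discrete HMM via the forward algorithm.
    pi: initial state probs, A[i][j]: transition, B[i][k]: emission."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def classify(obs, hmms):
    # One HMM per emotion; label with the most likely generator.
    return max(hmms, key=lambda e: forward_likelihood(obs, *hmms[e]))

# Illustrative two-state models over a 2-symbol alphabet (0 = low, 1 = high).
hmms = {
    "anger": ([0.8, 0.2], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]]),
    "fear":  ([0.2, 0.8], [[0.6, 0.4], [0.3, 0.7]], [[0.1, 0.9], [0.8, 0.2]]),
}
print(classify([0, 0, 0, 0], hmms))  # → "anger"
```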
The parameterization of HMMs must be based on
the signal properties (e.g. high dimensionality calls
for an increased number of hidden states). Alternative
architectures, such as fully interconnected or left-to-
right architectures, can be considered.
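A left-to-right topology can be imposed by initializing the transition matrix with zeros outside the main path, which Baum-Welch re-estimation then preserves; a minimal sketch (the `self_loop` parameter is an illustrative assumption):

```python
def left_to_right_transitions(n_states, self_loop=0.5):
    # Left-to-right topology: each state may only stay put or advance
    # to the next state; zero entries remain zero under Baum-Welch
    # re-estimation, so the topology survives training.
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i], A[i][i + 1] = self_loop, 1.0 - self_loop
    A[-1][-1] = 1.0  # absorbing final state
    return A

print(left_to_right_transitions(3))
```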
In the conducted experiments, an analysis of
the learned emissions along the main path of
left-to-right HMM architectures revealed emerging
rising and recovering responses following sequential
patterns with flexible displays (e.g. exponential and
"stairs"-like behavior).
#4: Integrate Sequential and Feature-driven Models
Problem: Since sequential learners capture the
overall behavior of physiological responses, they are
not able to highlight specific discriminative properties
of the signal. Often such discriminative properties are
adequately described by simple features.
Solution: Feature-driven and sequential models
should be integrated, as they provide different but
complementary views. One option is to rely on a post-
voting stage. A second option is to use one model to
discriminate the less probable emotions, and to im-
pose the resulting constraints on the remaining model.
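The second integration option can be sketched as follows; the score dictionaries and the `keep` parameter are illustrative assumptions:

```python
def fuse(feature_scores, sequential_scores, keep=2):
    """Constrained fusion: the feature-driven model first discards the
    least probable emotions; the sequential model then decides among
    the remaining shortlist."""
    shortlist = sorted(feature_scores, key=feature_scores.get,
                       reverse=True)[:keep]
    return max(shortlist, key=lambda e: sequential_scores[e])

feature_scores = {"anger": 0.5, "fear": 0.3, "joy": 0.2}     # illustrative
sequential_scores = {"anger": 0.2, "fear": 0.6, "joy": 0.9}  # illustrative
print(fuse(feature_scores, sequential_scores))  # → "fear"
```

Here "joy" is ruled out by the feature-driven model despite its high sequential score, and the sequential model settles the remaining tie in favor of "fear".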
Feature-driven models have been widely re-
searched and are centered on three major steps:
feature extraction, feature selection and feature-based
learning (Lessard, 2006; Jerritta et al., 2011). Expres-
sive features include statistical, temporal, frequency
and, more interestingly, temporal-frequency metrics
(from geometric analysis, multiscale sample entropy,
sub-band spectra). Feature extraction methods in-
clude tonic-phasic windows; moving-sliding features;
transformations (Fourier, wavelet, Hilbert); compo-
nent analysis; projection pursuit; auto-associative
nets; and self-organizing maps. Methods to remove
features without significant correlation with the
emotion under assessment include sequential selec-
tion, branch-and-bound search, Fisher projection,
the Davies-Bouldin index, analysis of variance and
some classifiers. Finally, a wide variety of determin-
istic and probabilistic learners have been adopted to per-
Seven Principles to Mine Flexible Behavior from Physiological Signals for Effective Emotion Recognition and Description in Affective Interactions