Profiling Arousal in Response to Complex Stimuli using Biosignals

Felix Putze, Dominic Heger, Markus Müller, Christian Waldkirch, Yves Chassein, Ivana Kajic and Tanja Schultz

Institute of Anthropomatics, Cognitive Systems Lab, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

Keywords:

Affective Computing, Biosignal-based Arousal Profiling, Person-independent Classification.

Abstract:

We investigate the use of biosignals (blood volume pressure and electrodermal activity) for person-independent profiling of arousal responses to complex, long-term stimuli. We report the design of a user study with 14 subjects to elicit affective responses with films of different genres. We present a detailed analysis of the recorded signals and show that it is possible to extract information on the differences between films and within each film from biosignals. We use this information to automatically discriminate four film classes in a person-independent fashion with an accuracy of up to 97.8%.

1 INTRODUCTION

In human interaction, we sense a large variety of signals from the persons we are interacting with and try to infer information about their cognitive and affective mental state. The term affective computing was coined by Rosalind Picard (Picard, 2000) to describe computer systems which can estimate and react to the user's affective state. This state can, for example, be assessed based on biophysiological signals continuously emitted by the human body. Today, researchers widely agree that the perception of a human state in a particular situation is of central interest for intelligent systems in order to enhance system performance and achieve a better user experience. Since the beginning of affective computing, a large number of studies and systems have been published on the estimation of affect from biosignals (Picard et al., 2001; Lichtenstein et al., 2008; Soleymani et al., 2008). Arousal is one of the most important aspects of affect. It is related to information evaluation and task performance. This paper contributes to this area a detailed analysis of arousal profiles (i.e. the degree of arousal or its correlates over time) as a response to longer-lasting, complex stimuli in the form of complete films of different genres. In HCI, this is highly relevant as most interaction sessions, especially in the entertainment domain, consist of a long sequence of interacting affective stimuli. We investigate the possibility of extracting information on the arousal profile from biosignals and using this information for automatic discrimination of films and film segments.

2 EXPERIMENTAL SETUP

To collect data from multiple persons covering different long-lasting, dynamic affective states, we designed an experiment using full short films for affect elicitation. During the presentation, we recorded physiological responses to those films. For our study, we selected three films of different genres: The first one is a zombie horror film with a socially critical undertone (AKUMI). It starts with a slow exposition, then introduces horror elements and culminates in a showdown battle. The second film is a slow, silent stop-motion arthouse film with a constantly low suspense curve (FLOWERS). The third one is a humorous animation film about an alien driving school, riddled with slapstick jokes (LIFTED). Before each film, we included a relaxation phase (RELAX) of approximately two minutes to bring participants back to a neutral, calm state. A relaxation phase consisted of a sequence of nature stills with meditative music. To counter ordering effects, we randomly assigned participants to two different permutations of the films. This ensures that for every pair of films, there are recordings for both possible orderings. After each film, participants filled out a film-related questionnaire to classify its genre and to indicate their emotional response to it using a Self-Assessment Manikin (SAM; Bradley and Lang, 1994). During each film or relaxation phase, we recorded electrodermal activity (EDA) and blood volume pressure using plethysmography (PPG). We showed the films in a dimmed, windowless and empty room to avoid as much distraction as possible. Participants sat in a fixed position in front of a large projection screen on which we showed the films to create an intense impression. Biosignals were recorded using a wireless biosignal monitor by PLUX with a sampling rate of 1000 Hz. In total, we recorded about 23 minutes of biosignals for each participant. Fourteen participants took part in our study. All of them were students or visitors at the KIT, aged between 14 and 30 years. Four participants were female. By analyzing content-related questionnaires, we ensured that participants perceived the three presented films as sufficiently different.

Putze F., Heger D., Müller M., Waldkirch C., Chassein Y., Kajic I. and Schultz T.

Profiling Arousal in Response to Complex Stimuli using Biosignals.

DOI: 10.5220/0004249203470350

In Proceedings of the International Conference on Bio-inspired Systems and Signal Processing (BIOSIGNALS-2013), pages 347-350

ISBN: 978-989-8565-36-5

© 2013 SCITEPRESS (Science and Technology Publications, Lda.)

For self-assessment of emotions experienced during the film, a subscale of the SAM technique is used to measure arousal. Arousal is comparable for the engaging films AKUMI and LIFTED (3.29 and 3.23) and significantly lower for the slow-paced FLOWERS (2.53). As we expected people to experience a variety of emotional states during the course of each film, we combined the SAM technique with a time axis: After watching each film, participants stated their arousal continuously over the course of the whole film by drawing a curve from the beginning of the film to the end. To give orientation to the participants, we presented stills of the film from every minute as reminders of the film flow. For analysis, we sample each curve at every full minute to derive a self-assessed arousal profile. The average range of arousal values over the course of a film is as high as 1.72, which is higher than the differences between the averages for different films (at most 3.29 - 2.53 = 0.76). This indicates that even a short film with a clear affective tendency (e.g. a horror film) has a dramaturgy that results in both low and high arousal values, which should also be reflected in the biosignal recordings. It also indicates that we cannot simply use an average arousal score to label the complete corresponding biosignal stream but have to take a more detailed look at the dramaturgic structure.
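The comparison between the within-film range and the between-film differences can be illustrated with a minimal sketch. The film names and ratings below are hypothetical placeholders, not data from the study, and the function names are chosen for illustration only.

```python
from statistics import mean

def arousal_range(profile):
    """Range (max - min) of one minute-sampled self-assessed arousal profile."""
    return max(profile) - min(profile)

def average_range(profiles):
    """Mean of the per-participant ranges for one film."""
    return mean(arousal_range(p) for p in profiles)

# Hypothetical minute-sampled ratings: two participants per film.
profiles_by_film = {
    "FILM_A": [[1.0, 2.0, 4.0, 3.0], [2.0, 2.5, 4.5, 4.0]],
    "FILM_B": [[2.0, 1.5, 1.0, 2.5], [3.0, 2.0, 2.0, 3.5]],
}

# Average arousal per film (mean over participants of the profile mean).
film_means = {film: mean(mean(p) for p in profiles)
              for film, profiles in profiles_by_film.items()}

# Largest difference between film averages versus average within-film range.
between_film_gap = max(film_means.values()) - min(film_means.values())
within_film_range = mean(average_range(p) for p in profiles_by_film.values())

print(f"between films: {between_film_gap:.2f}, within film: {within_film_range:.2f}")
```

If the within-film range exceeds the gap between film averages, as reported above for the recorded data, a single average score per film hides most of the variation.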

3 SIGNAL ANALYSIS

In the next step, we analyze the variation of the recorded biosignals in relation to the dramaturgic structure of a film. Starting with an overall comparison of the different films, we see a significant difference (p < 0.05) in mean EDA signal amplitude between FLOWERS and AKUMI and between FLOWERS and LIFTED. However, as we already saw that the arousal profile of a film varies strongly over time, we now look at temporal patterns in the recorded biosignals. In order to trace the affect changes that were elicited through different types of films and within each film, we look at the EDA signal because a change in skin conductivity occurs quickly as a response to an increased level of arousal and can be interpreted in the time domain.

Footnote (PLUX biosignal monitor, Section 2): http://www.plux.info

As each individual EDA curve contains many session-specific effects which cannot easily be attributed to events in the film, we generate an averaged EDA curve for each film from the data of all participants. For each participant, the EDA signal is normalized and low-pass filtered at 0.5 Hz. Afterwards, we calculate the averaged EDA signal. It is correspondingly scaled and illustrated in Figure 1 for the films AKUMI and FLOWERS. When we compare both curves, we notice a number of differences: The averaged EDA signal for AKUMI shows strong temporal variation while the signal for FLOWERS is relatively smooth. This is caused by a larger number of sharp rises of the EDA signal for AKUMI as a response to unexpected, surprising or exciting events (Ekman et al., 1985) in the individual signals. We call these rises startles. The lack of startles for FLOWERS also causes the monotonically decreasing trend of the curve. Taking the connection between arousal and EDA activity into account, we can conclude that certain events in the horror film AKUMI in general cause higher arousal than the slowly paced arthouse film FLOWERS. We also see that the averaged EDA signals roughly match the trend of the averaged discretized arousal curves we extracted from the SAM (with a time delay of ca. 1 minute).
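The averaging procedure (normalize per participant, smooth, then average across participants) can be sketched as follows. This is a minimal illustration under stated assumptions: a centered moving average stands in for the 0.5 Hz low-pass filter, whose actual design is not specified in the text, and all function names are hypothetical.

```python
from statistics import mean, pstdev

def z_normalize(signal):
    """Subtract the mean and divide by the standard deviation of one recording."""
    mu, sigma = mean(signal), pstdev(signal)
    if sigma == 0:
        return [0.0] * len(signal)
    return [(x - mu) / sigma for x in signal]

def moving_average(signal, width):
    """Centered moving average; an assumed stand-in for a low-pass filter."""
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(mean(signal[lo:hi]))
    return smoothed

def averaged_curve(signals, width):
    """Average the normalized, smoothed curves of all participants sample by sample."""
    n = min(len(s) for s in signals)  # truncate to the shortest recording
    processed = [moving_average(z_normalize(s[:n]), width) for s in signals]
    return [mean(column) for column in zip(*processed)]

# Two toy recordings with different baselines and scales but the same shape.
curve = averaged_curve([[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]], width=3)
print([round(v, 2) for v in curve])
```

Because each recording is z-normalized before averaging, differences in individual baseline and amplitude do not dominate the averaged curve, and the result is unit-free.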

Figure 1: Top: Averaged EDA curve for the film AKUMI with indices for marked scenes; Bottom: Averaged EDA curve for the film FLOWERS. The plots are generated by averaging individually z-normalized EDA curves and are therefore unit-free.

Still, the EDA signal for AKUMI is not stationary across the whole film. Can we determine which effects cause the observed peaks and level changes? Comparing the EDA signal to the arousal curve, we only see a very rough match. However, when reviewing the film material directly, we get a more precise picture: We can identify certain scenes which drastically change the dramaturgy of the film. These scenes are marked by vertical lines in Figure 1. The strong EDA rises at the second and third marker correspond to climactic scenes of the film, while the valley between seconds 60 and 240 corresponds to a slow-paced and monolog-driven part of the film. We can quantify this effect by automatically measuring the tempo of the film as the number of scene changes in a sliding time window. The resulting graph in Figure 2 (top) shows that a rising tempo around second 240 corresponds very well to the lasting rise of the EDA curve and indicates a strong general affective response to this change in dramaturgy. This type of analysis is not possible for FLOWERS because of the employed stop-motion technique. To explain the course of the EDA curve for this film, we instead compare it to one generated from the corresponding RELAX sections (see Figure 2 (bottom)) and see a very similar monotonic trend in the signal after a short rise at the start of each phase. Only the beginning of the final credits (which are missing for RELAX) marks the onset of a slow rise. This indicates that the arousal profile for the slow-paced FLOWERS is comparable to the one for RELAX. In summary, we see significant differences in the EDA-derived arousal profiles in response to films of different genre and style and also to different sections within one film.
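The tempo measure, i.e. the number of scene changes in a sliding time window, can be sketched as below. The sketch assumes that the time stamps of the scene changes are already available (how they are detected is not covered here); the window length, the step size and the function name are illustrative assumptions, not values from the study.

```python
from bisect import bisect_left

def tempo_curve(cut_times, film_length, window=30.0, step=1.0):
    """Count scene changes inside a sliding window.

    cut_times: time stamps (seconds) of scene changes, assumed to be given.
    Returns a list of (window_center, number_of_cuts) pairs, where each
    window covers the half-open interval [start, start + window).
    """
    cuts = sorted(cut_times)
    curve = []
    start = 0.0
    while start + window <= film_length:
        first = bisect_left(cuts, start)
        last = bisect_left(cuts, start + window)
        curve.append((start + window / 2, last - first))
        start += step
    return curve

# Toy example: one early cut, then a fast-paced final section.
example = tempo_curve([5, 50, 52, 54, 56, 58], film_length=60,
                      window=10.0, step=10.0)
for center, count in example:
    print(f"t = {center:5.1f} s: {count} scene changes")
```

A curve of this kind can then be plotted against the averaged EDA curve to check whether sections with many scene changes coincide with lasting EDA rises.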

Figure 2: Top: Comparison of the EDA curve with the number of scene changes for AKUMI; Bottom: Averaged EDA curve for a RELAX phase.

4 MOVIE CLASSIFICATION

We will now use these characteristic, biosignal-based arousal profiles as features for automatic classification of films and film segments. We start by investigating the classification of the recorded biosignals into four classes: the three different films as one class each and all relaxation phases combined as another one. We design a person-independent setup to investigate the ability to differentiate films based on their arousal profiles.

The first step of the classification process is a z-normalization of the raw data for each subject to level different signal baselines. We segment the data into windows of 20 seconds length with an overlap of 10 seconds. On each window, we extract tentative features: For the PPG signal, we extract the mean heart rate and its variation by applying peak detection to the bandpass-filtered signal. For the EDA signal, the number of startles is determined automatically using peak detection after low-pass filtering. We then extract the mean EDA value and the number of startles. We now derive the final features which describe the complete arousal profile of one film based on the whole signal stream. Those features consist of statistics on the window-based features for each film: mean, maximum and minimum value, standard deviation, relative peak position, peak width and relative centroid. Final features are independent of the film length to avoid leakage of this information to the classification process. This results in a feature vector of 28 dimensions (four window-based features times seven statistics). After feature extraction, we train and evaluate a Naive Bayes classifier using leave-one-person-out cross-validation. Within each iteration of the cross-validation, we use Sequential Forward Feature Selection (SFFS) on the training set of the respective fold to select the best features. To evaluate features during this selection process, an inner cross-validation within each fold is performed using stratified sampling. Using this setup, we achieve a very high average accuracy of 97.8% with a minimum precision of 88.24% and a minimum recall of 93.3% over all classes. The standard deviation between folds is 8.3%, which indicates relative stability in the face of small changes of test and training data.
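The two-stage feature extraction (overlapping windows, then length-independent statistics over the sequence of window values) can be sketched as follows. The exact definitions of relative peak position, peak width and relative centroid are not given in the text, so the definitions used here are assumptions chosen to be independent of film length; the window mean serves as an example window-based feature, and all names are hypothetical.

```python
from statistics import mean, pstdev

def sliding_windows(signal, rate_hz, length_s=20, overlap_s=10):
    """Split a signal into windows of length_s seconds overlapping by overlap_s."""
    size = int(length_s * rate_hz)
    hop = int((length_s - overlap_s) * rate_hz)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def profile_statistics(values):
    """Seven statistics over one window-based feature across a whole film.

    Assumed definitions: peak position and centroid are expressed relative to
    the number of windows; peak width is the fraction of windows at or above
    the level halfway between minimum and maximum.
    """
    n = len(values)
    lo, hi = min(values), max(values)
    scale = max(n - 1, 1)
    half_level = lo + (hi - lo) / 2
    shifted = [v - lo for v in values]  # non-negative weights for the centroid
    total = sum(shifted)
    centroid = (sum(i * v for i, v in enumerate(shifted)) / total
                if total else (n - 1) / 2)
    return {
        "mean": mean(values),
        "max": hi,
        "min": lo,
        "sd": pstdev(values),
        "rel_peak_pos": values.index(hi) / scale,
        "peak_width": sum(v >= half_level for v in values) / n,
        "rel_centroid": centroid / scale,
    }

# Toy signal sampled at 1 Hz for 60 s with a steadily rising level.
windows = sliding_windows(list(range(60)), rate_hz=1)
window_means = [mean(w) for w in windows]  # one window-based feature
stats = profile_statistics(window_means)
print(len(windows), stats)
```

Applying such statistics to each of the four window-based features yields one fixed-length vector per film and participant, regardless of how many windows the film contains.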

To investigate the generalization abilities and the feature stability, we calculated a histogram of the selected features. On average, the feature selection resulted in a feature set of size 2.45 (median: 2). The left part of Table 1 gives an overview of the most frequently selected features over all cross-validation folds. It indicates that there are some stable features which are regularly picked over others: only 8 of 28 features are ever selected. This result indicates that those features generalize well across persons. When training a model using the features from Table 1 instead of using SFFS for each fold, we still achieve an average recognition rate of 95.56%, indicating that the original recognition accuracy was not the result of over-specialization. To investigate the predictive power of both modalities separately, we restrict the feature set to only EDA-based features (97.8%) or only PPG-based features (96.67%) and see similar recognition rates. We conclude that both modalities carry information on the arousal profile and that, depending on the application, it may be possible to reduce the number of required sensors.

Table 1: Left: Most frequently selected features over all cross-validation folds. The table shows the selection frequency in percent for features derived from mean heart rate (MHR), variance of heart rate (VHR), mean EDA (EDA) and number of startles (STA): Peak Width (PW), Relative Peak Index (RPI), Average (Avg), Standard Deviation (SD), Maximum (Max), Minimum (Min). Right: Same information for feature selection without peak width, peak position and centroid.

Full feature set                  Restricted feature set
Signal   Feat.   Freq. (%)        Signal   Feat.   Freq. (%)
MHR      PW       53.3            EDA      SD      100.0
EDA      PW       46.7            MHR      Avg      73.3
MHR      SD       33.3            MHR      SD       73.3
MHR      Avg      13.3            MSC      Avg      66.7
MHR      Min      13.3            VHR      Min      40.0
STA      PW       13.3            EDA      Min      33.3
STA      SD        6.7            STA      Avg      33.3
VHR      RPI       6.7            MHR      Max      26.7
                                  MSC      Max      26.7
                                  VHR      Avg      26.7

Table 2: Average accuracies in percent for pairwise classification of windows from different minutes (1 = 0s to 60s, 2 = 60s to 120s, ...) of the film AKUMI.

Min.    2    3    4    5    6    7    8
1      61   54   67   64   75   70   56
2           52   58   59   72   70   67
3                64   63   77   73   63
4                     58   72   70   66
5                          70   67   61
6                               59   67
7                                    58
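The person-independent evaluation with per-fold feature selection, and the selection-frequency histogram summarized in Table 1, can be sketched as follows. This is a simplified illustration, not the original implementation: it uses a hand-written Gaussian Naive Bayes, a greedy forward selection, and an inner leave-one-person-out loop in place of the stratified inner cross-validation described above. The toy data and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def fit_gnb(X, y):
    """Estimate class priors and per-feature means and variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A small variance floor keeps the log-likelihood finite.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-6
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model

def predict_gnb(model, row):
    """Return the class with the highest Gaussian log-posterior."""
    def score(params):
        prior, means, variances = params
        return prior - 0.5 * sum(math.log(2 * math.pi * s) + (x - m) ** 2 / s
                                 for x, m, s in zip(row, means, variances))
    return max(model, key=lambda label: score(model[label]))

def lopo_accuracy(X, y, persons, features):
    """Leave-one-person-out accuracy using only the given feature indices."""
    correct = 0
    for held_out in sorted(set(persons)):
        train = [i for i, p in enumerate(persons) if p != held_out]
        test = [i for i, p in enumerate(persons) if p == held_out]
        model = fit_gnb([[X[i][f] for f in features] for i in train],
                        [y[i] for i in train])
        correct += sum(predict_gnb(model, [X[i][f] for f in features]) == y[i]
                       for i in test)
    return correct / len(X)

def forward_selection(X, y, persons):
    """Greedily add the feature that most improves the inner accuracy."""
    selected, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        # Ties are broken in favor of the lowest feature index.
        acc, neg_f = max((lopo_accuracy(X, y, persons, selected + [f]), -f)
                         for f in remaining)
        if acc <= best:
            break
        selected.append(-neg_f)
        remaining.remove(-neg_f)
        best = acc
    return selected

def nested_lopo(X, y, persons):
    """Outer leave-one-person-out loop with feature selection per fold."""
    counts, correct = Counter(), 0
    for held_out in sorted(set(persons)):
        train = [i for i, p in enumerate(persons) if p != held_out]
        test = [i for i, p in enumerate(persons) if p == held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        feats = forward_selection(X_tr, y_tr, [persons[i] for i in train])
        counts.update(feats)  # histogram of selected features over folds
        model = fit_gnb([[row[f] for f in feats] for row in X_tr], y_tr)
        correct += sum(predict_gnb(model, [X[i][f] for f in feats]) == y[i]
                       for i in test)
    return correct / len(X), counts

# Toy data: feature 0 separates the classes, features 1 and 2 only encode the person.
X = [[0.1, 1.0, 3.0], [5.2, 1.0, 3.0], [0.3, 2.0, 1.0], [4.9, 2.0, 1.0],
     [-0.2, 3.0, 2.0], [5.1, 3.0, 2.0], [0.0, 4.0, 4.0], [4.8, 4.0, 4.0]]
y = ["A", "B"] * 4
persons = [0, 0, 1, 1, 2, 2, 3, 3]
accuracy, selection_counts = nested_lopo(X, y, persons)
print(accuracy, dict(selection_counts))
```

Selecting features only on the training persons of each fold keeps the held-out person unseen during selection; dividing the counts by the number of folds gives selection frequencies of the kind reported in Table 1.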

Note that some of the employed features (e.g. relative centroid position) encode information specific to the dramaturgy of the films. Therefore, the trained model will not be applicable to different films without loss of recognition accuracy (although films of similar dramaturgical structure could work). We therefore repeat the evaluation after removing the features encoding relative peak position, relative centroid position and peak width. As expected, the accuracy drops significantly to 73.3%. The merit of this model is that it still provides reasonable recognition accuracy using much more generic features which promise generalizability to different films. The selected features are given in the right part of Table 1. Again, we identify a number of features which are repeatedly selected across folds.

As documented in Section 3, significant differences can be noted not only between different films but also during the course of one film. To investigate the possibility of identifying different parts of the film based on biosignals, we classify the window-based features extracted in the process described above for the film AKUMI. To each window, we assign a label based on its position within the film, using one label for each full minute. Classification is performed pairwise for each combination of two labels to investigate similarity effects. In this setup, we do not expect high classification accuracy for each pair of segments. Instead, we can interpret the recognition accuracy as a measure of distance between two segments based on the arousal profile. Table 2 presents the results of leave-one-person-out cross-validation. As expected, performance reaches levels of up to 77% for sections which are dramaturgically very different. For sections which are similar in this regard (e.g. both from the fast-paced ending), accuracy drops. This result is in strong accordance with the observations on Figure 1 and shows that, even given the difficulty induced by fuzzy class transitions, automatic affective profiling of a film is possible.
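Using pairwise accuracy as a distance between film segments can be sketched as below. To keep the example short, a nearest-class-mean classifier is swapped in for the Naive Bayes classifier used in the study; the data layout, the toy values and the function names are assumptions made for illustration.

```python
from itertools import combinations

def nearest_mean_predict(train_rows, train_labels, row):
    """Assign the label whose training mean is closest (squared Euclidean)."""
    sums, counts = {}, {}
    for r, label in zip(train_rows, train_labels):
        acc = sums.setdefault(label, [0.0] * len(r))
        for j, v in enumerate(r):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1

    def distance(label):
        return sum((s / counts[label] - v) ** 2
                   for s, v in zip(sums[label], row))

    return min(sorted(sums), key=distance)

def pairwise_minute_accuracy(windows):
    """windows: list of (person, minute, feature_vector) tuples.

    Returns {(minute_a, minute_b): leave-one-person-out accuracy}; a higher
    accuracy means the two minutes are easier to tell apart.
    """
    minutes = sorted({m for _, m, _ in windows})
    persons = sorted({p for p, _, _ in windows})
    result = {}
    for a, b in combinations(minutes, 2):
        pair = [(p, m, x) for p, m, x in windows if m in (a, b)]
        correct = 0
        for held_out in persons:
            train = [(x, m) for p, m, x in pair if p != held_out]
            test = [(x, m) for p, m, x in pair if p == held_out]
            rows = [x for x, _ in train]
            labels = [m for _, m in train]
            correct += sum(nearest_mean_predict(rows, labels, x) == m
                           for x, m in test)
        result[(a, b)] = correct / len(pair)
    return result

# Toy data: minutes 1 and 2 look alike, minute 3 differs clearly.
windows = [
    (0, 1, [0.0]), (1, 1, [0.2]), (2, 1, [0.4]),
    (0, 2, [0.3]), (1, 2, [0.1]), (2, 2, [0.2]),
    (0, 3, [5.0]), (1, 3, [5.2]), (2, 3, [4.9]),
]
acc = pairwise_minute_accuracy(windows)
for pair, value in sorted(acc.items()):
    print(pair, round(value, 2))
```

Arranging the resulting values in an upper-triangular matrix gives a layout like Table 2, where low values mark segment pairs with similar arousal profiles.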

REFERENCES

Bradley, M. and Lang, P. (1994). Measuring emotion: the self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry, 25(1):49–59.

Ekman, P., Friesen, W., and Simons, R. (1985). Is the startle reaction an emotion? Journal of Personality and Social Psychology, 49(5):1416.

Lichtenstein, A., Oehme, A., Kupschick, S., and Jürgensohn, T. (2008). Comparing two emotion models for deriving affective states from physiological data. In Peter, C. and Beale, R., editors, Affect and Emotion in Human-Computer Interaction, volume 4868 of LNCS, pages 35–50. Springer Berlin / Heidelberg.

Picard, R. (2000). Affective computing. The MIT Press.

Picard, R., Vyzas, E., and Healey, J. (2001). Toward machine emotional intelligence: Analysis of affective physiological state. Transactions on Pattern Analysis and Machine Intelligence, pages 1175–1191.

Soleymani, M., Chanel, G., Kierkels, J. J. M., and Pun, T. (2008). Affective characterization of movie scenes based on multimedia content analysis and user's physiological emotional responses. Multimedia, International Symposium on, 0:228–235.

BIOSIGNALS2013-InternationalConferenceonBio-inspiredSystemsandSignalProcessing

350