2 RELATED WORK
Face-based estimates are commonly used in inference of emotional experiences. For example, (Busso et al., 2004) explored multimodal emotion recognition using speech and facial expressions: an actor read sentences while being recorded, and face markers were used to interpret facial muscle movement. Similarly, (Ioannou et al., 2005) extracted face features and explored the understanding of users' emotional states with a neurofuzzy method and facial animation parameters. As another example, (Tarnowski et al., 2017) used a Microsoft Kinect to record a 3D model of subjects' faces with numerous facial points, recognizing seven emotions from facial expressions with a k-NN classifier and a neural network.
Work in affective computing has also focused on reactions to specific emotions. For instance, (Shea et al., 2018) studied intuitively extracted reactions to surprise, spanning multiple modalities; estimated facial expressions proved particularly important for identifying naturally occurring surprise reactions.
More specifically, affective computing methods have been considered promising for video recommendation and classification. (Zhao et al., 2013) presented a framework for recognizing human facial expressions to create a classifier that identified, from viewers' facial expressions, what genre they watched. However, because they drew on acted facial data, the reactions tended toward exaggeration rather than the less direct, more intuitive and natural expressions seen in practice, making the resulting models unsuitable for actual use.
The use of facial data for video recommendation was explored by (Rajenderan, 2014), who complemented facial expressions with analysis of the pulse modality, calculated with a photoplethysmography method developed at MIT by (Poh et al., 2011). The work was continued by (Diaz et al., 2018), with a focus on estimating and visualizing viewers' dominant emotions over the course of a video. The photoplethysmography method uses fluctuations in skin color, related to blood volume and the proportion of reflected light, to estimate the viewer's pulse. For this case study, we also apply photoplethysmography for non-invasive pulse estimation (see Figure 1), with recalibration occurring between each video viewed by the subject.
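To make the underlying signal-processing steps concrete, the following Python sketch estimates a pulse rate from a sequence of per-frame mean green-channel values over a facial region of interest. It is a minimal illustration rather than the implementation of (Poh et al., 2011); the function name, input variables, and the 0.7-4.0 Hz pass band are our own illustrative assumptions.

```python
# Minimal sketch of pulse estimation from skin-color fluctuations
# (remote photoplethysmography). Assumes `green_means` is a 1-D array
# of per-frame mean green-channel values from a facial region of
# interest, sampled at `fps` frames per second. Names and the pass
# band are illustrative, not from the cited implementations.
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_bpm(green_means: np.ndarray, fps: float) -> float:
    # Remove slow drift (e.g., lighting changes, head motion).
    signal = green_means - np.mean(green_means)

    # Band-pass to a plausible heart-rate range, 0.7-4.0 Hz
    # (42-240 beats per minute).
    nyquist = fps / 2.0
    b, a = butter(3, [0.7 / nyquist, 4.0 / nyquist], btype="band")
    filtered = filtfilt(b, a, signal)

    # Take the dominant spectral peak within the pass band as the
    # pulse frequency.
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    pulse_hz = freqs[band][np.argmax(spectrum[band])]
    return pulse_hz * 60.0  # convert Hz to beats per minute
```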
3 METHODS
Conducting an experiment focused entirely on face-based capture provides an opportunity to reflect on challenges that arise when working with face-based data estimates. We build on this experience, presenting examples that illustrate methodological considerations and summarizing lessons learned, as a springboard for formulating a set of recommendations for continued work towards affective video recommendation. This section describes how we collected face data and processed the face-based estimates for use in predictive modeling, a step towards video recommendation.
3.1 Data Collection
Equipment. The equipment used for data collection included two standard webcams operating in real time: the Logitech C922 Pro Stream Webcam and the Logitech Pro 9000 Webcam. One webcam captured the subject's facial expressions, while the other was used to estimate the subject's pulse via the aforementioned photoplethysmography method. Additional hardware included a desktop computer with a 24" monitor, external loudspeakers, a keyboard, and a mouse.
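For illustration, a minimal sketch of how two webcams could be read simultaneously with OpenCV is given below; the device indices, window names, and loop structure are assumptions, and the actual capture software used in the study may differ.

```python
# Minimal sketch of simultaneous capture from two webcams using
# OpenCV. Device indices 0 and 1 are illustrative assumptions.
import cv2

face_cam = cv2.VideoCapture(0)   # facial-expression camera
pulse_cam = cv2.VideoCapture(1)  # photoplethysmography camera

while True:
    ok_face, face_frame = face_cam.read()
    ok_pulse, pulse_frame = pulse_cam.read()
    if not (ok_face and ok_pulse):
        break
    # Frames would be passed to the expression estimator and the
    # pulse estimator here; this sketch simply displays them.
    cv2.imshow("face", face_frame)
    cv2.imshow("pulse", pulse_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

face_cam.release()
pulse_cam.release()
cv2.destroyAllWindows()
```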
Stimuli. The experiment included carefully selected short video clips with content from movies, TV programs, and other videos. The clips were intended to elicit reactions corresponding to five major emotions: happiness, sadness, anger, fear, and surprise. The emotional impact of the videos was assessed jointly by the authors. Three clips were included per emotion category, for a total of 15 video clips. We avoided content that might cause strong discomfort. Table 1 provides an example from each emotion category. Subjects consented to participating in the IRB-approved study.
Procedure. The data collection process is illustrated in Figure 2. Each subject was given oral instructions outlining what the experiment would look like. After completing the consent form and receiving a walk-through of the experiment, participants filled out a demographic survey. They did not know which clips they would be watching prior to viewing the videos.
(Diaz et al., 2018) noted that the presence of an experimenter could potentially affect a viewer's emotional expressions. Accordingly, the subject was alone in the room during the experiment to mitigate any such effect. For each video:
1. the subject was shown an image with instructions
for the pulse calibration
2. a 50-second countdown video was shown to calibrate the pulse estimation