Can Pupillary Responses while Listening to Short Sentences Containing
Emotion Induction Words Explain the Effects on Sentence Memory?
Shunsuke Moriya$^{1}$, Katsuko T. Nakahira$^{1,a}$, Munenori Harada$^{1}$, Motoki Shino$^{2}$ and Muneo Kitajima$^{1,b}$
$^{1}$Nagaoka University of Technology, Nagaoka, Niigata, Japan
$^{2}$The University of Tokyo, Kashiwa, Chiba, Japan
mkitajima@kjs.nagaokaut.ac.jp, motoki@k.u-tokyo.ac.jp
Keywords:
Emotion Induction Word, Pupil Dilation Response, Memory, Contents Design.
Abstract:
In content viewing activities, such as movies and paintings, it is important to retain and utilize the viewing
experience in memory. We have been studying the effect of the content of visual and auditory information pro-
vided during viewing activities and presentation timing on content memory. We have clarified the appropriate
timing of presenting visual information that should be supplemented by auditory information. We have also
found that the inclusion of emotion induction words in the auditory information is effective in forming content
memory. In this study, we present a framework for examining the effects of emotion-evoking characteristics
of short sentences while taking into account individual differences in memory. Subjects were presented with a
short sentence with an emotion-inducing word at the beginning of the sentence, in which the impression of the
entire short sentence would appear at the end of the sentence. We designed an experimental system to clarify
the relationship between subject-specific pupillary responses to the emotion induction words and memory for
short sentences. Our findings indicate a scheme that relates the pupillary response to short sentence memory.
1 INTRODUCTION
The development of digital technology has enabled
the dispensing of knowledge by combining various
types of digital content. There are two types of knowl-
edge: explicit knowledge, which can be explicitly
expressed by symbols, and latent knowledge, which
is understood by reading between the lines. Digital
technology is suitable for expressing explicit knowl-
edge because of its high affinity to symbolic represen-
tation. On the other hand, to express latent knowledge
through digital content, it is necessary to understand
how humans, the recipients of the information, pro-
cess digital information.
We have focused on viewing behavior as an ex-
ample of latent knowledge transfer, including experi-
ence, and have studied the effects of the content and
timing of presenting visual and auditory information
during viewing behavior on the memory of contents
in terms of both quantity and quality of information
provided. With regard to the quantity of information, according to
Hirabayashi et al., when two tracks of information sources (e.g., visual
and auditory), $I_1$ and $I_2$, are to be related, the information can be
retained without information overload by presenting $I_1$ and $I_2$ with a
certain time interval between them (Hirabayashi et al., 2020).
a https://orcid.org/0000-0001-9370-8443
b https://orcid.org/0000-0002-0310-2796
Regarding the quality of information, Murakami et al. conducted an
experiment on memory with a particular focus on auditory information.
They created explanatory text that included emotion induction words,
played auditory stimuli in which the explanatory text was read aloud
along with the video, and had the participants watch the video. The
results of the impression evaluation and replay test of the video content
showed that the participants' memory of the video was affected by the
stimuli (Murakami et al., 2021).
The results of these studies suggest that the depth of the learner's
memory, i.e., the acquisition of latent knowledge, is strongly connected
with the learner's emotions and information about the surrounding
environment. Therefore, it is important to design learning contents for
the transfer of latent knowledge by generating emotion and including the
surrounding environment.
Moriya, S., Nakahira, K., Harada, M., Shino, M. and Kitajima, M.
Can Pupillary Responses while Listening to Short Sentences Containing Emotion Induction Words Explain the Effects on Sentence Memory?.
DOI: 10.5220/0011698200003417
In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2023) - Volume 2: HUCAPP, pages
213-220
ISBN: 978-989-758-634-7; ISSN: 2184-4321
Copyright © 2023 by SCITEPRESS Science and Technology Publications, Lda. Under CC license (CC BY-NC-ND 4.0)
This paper focuses on narration, a component of content, to establish a
content design method for more effective latent knowledge transfer. When
a narration comprises very short sentences and includes some elements
that induce emotion, we assume that emotion is generated when a person
hears it. Based on the relationships between (1) pupillary response and
emotion, and (2) emotion and memory, which have been studied in recent
years, we analyzed the possibility of capturing the characteristics of
individual learners' acquisition of latent knowledge through pupillary
response, which is a measurable quantity. The results could contribute to
the development of a content design method that is useful for
transferring latent knowledge. Section 2 describes the model used to
conduct this study. Section 3 describes the experimental system based on
the model. Section 4 presents the experimental results. In Section 5,
based on the experimental results, we discuss the relationship between
emotion, memory, and pupillary response.
2 MODEL
2.1 Emotion and Memory
Murakami et al. present a cognitive model for memory formation that was
introduced in previous research (Murakami et al., 2021). Perceived
information that has passed through the sensory memory and the sensory
information filter reaches long-term memory. Various types of chunks are
stored in the long-term memory, and the chunks related to the input
perceptual information are integrated in the working memory to produce
reactions and make decisions. Each chunk is then enhanced by associating
it with the perceptual information at the time the chunk was invoked.
One candidate for association would be the emotion that was generated
when the chunk was invoked. The emotion that is generated may be a factor
that improves memory performance. In general, it is difficult to control
one's emotions. However, as a way to provide emotional stimuli, it is
possible to add words that easily induce emotions to sentences, or to
make it easier to induce emotions from the atmosphere of the sentences.
Based on this, we hypothesized that it is possible to investigate the
relationship between the ease of remembering given information and
emotion based on the biological responses of people when they perceive
auditory stimuli that include emotion induction words.
[Figure 1 (diagram): per-trial timeline showing the events (trial start,
narration start, emotion induction word start, emotion induction word
finish, narration finish), the time markers ($t_{trial}$, $t_{ns}$,
$t_{vs}$, $t_a$, $t_a + \Delta t$), the visual and audio stimuli, and the
pupil dilation $r_{pd}$. The average $r_{pd}$ is defined as the baseline.
Near reactions caused by gaze fixation and by the animation are marked, and
the mydriasis or miosis reaction caused by the emotion induction word is
the analysis target for pupil diameter change.]
Figure 1: Behavioral model for pupil response.
2.2 Emotional Stimuli and Pupillary
Response
The relationship between emotion induction words presented as auditory
stimuli and the physiological responses to hearing them is as follows.
First, an emotion induction word is a word that is related to emotion by
its meaning. Emotion has been proposed to be expressible on two axes,
pleasure/misery and arousal/sleepiness (Russell, 1980). The emotions
associated with stimuli such as emotion induction words are measured
using the self-assessment manikin (SAM) (Bradley and Lang, 1994). This
suggests that emotion can be represented on two axes, valence and arousal.
When using stimuli with emotional information, the Affective Norms for
English Words (ANEW) are used (Bradley and Lang, 1999). For each word in
ANEW, valence, arousal, and dominance are measured as quantities
indicating the degree of emotion. In Japanese, valence and arousal have
been measured for Japanese translations of words appearing in ANEW
(Honma, 2014). The words in this list were defined as Emotion Induction
Words (EIWs). Both valence and arousal are psychometric quantities, as
they are measured by the SAM. A person's psychological state is thought
to affect his or her physiological state, including the pupil. Therefore,
we considered that a person's physiological state is affected by the
valence and arousal values of the emotion induction word, and that it is
appropriate to derive a relationship between the pupillary response and
the previously established valence/arousal values of the EIW.
In this paper, we considered a person’s pupil-
lary response to auditory stimuli containing EIWs as
follows. Many studies have investigated pupillary
responses and visual/auditory stimuli, emotion, and
arousal (Zekveld et al., 2018). The pupillary response
during word processing becomes larger when difficult
or unknown words or words that convey darkness are
read. We consider the relationship between EIWs and
pupillary response in the following framework.
HUCAPP 2023 - 7th International Conference on Human Computer Interaction Theory and Applications
2.3 Quantifying Pupillary Responses to
Emotional Stimuli
Figure 1 shows the model of stimulus, event timeline, and pupil dilation
per trial. The event starts at $t_{trial}$, but a sufficient time interval
is required between the start of the event and the time $t_{ns}$ at which
the narration (the auditory stimulus) starts, so that the near reaction of
the pupil does not interfere with the analysis of the pupillary response.
The narration contains an EIW, which is uttered at $t_{vs}$, after a
certain time has elapsed from $t_{ns}$. When a person perceives an
auditory EIW, a biological reaction is induced according to its meaning,
valence, and arousal. The generation of a response requires some time
after the EIW is perceived. In this paper, we assume that this occurs
around $t_a$ and focus on the amount of change in pupil diameter,
$\Delta r_{pd}(t)$, among the pupillary responses that occur between $t_a$
and $t_a + \Delta t$.
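The pupil diameter change is calculated relative to a baseline
$\tilde{r}_{pd}$, the mean pupil diameter in the range
$(t_{ns} - \tau, t_{ns})$, where $\tau$ [ms] denotes the length of the
baseline window:

$$\tilde{r}_{pd} = \frac{1}{\tau} \int_{t_{ns}-\tau}^{t_{ns}} r_{pd}(t)\,dt$$

In this study, $\tau$ was set to 500 ms. In this case, the pupil diameter
change $\Delta r_{pd}$ can be expressed by the following formula:

$$\Delta r_{pd}(t) = r_{pd}(t) - \tilde{r}_{pd}$$

The variation of $\Delta r_{pd}$ from time $t$ to $t + \delta t$ can be
expressed as:

$$\delta r(t) = \Delta r_{pd}(t + \delta t) - \Delta r_{pd}(t)$$

A $\delta r(t) > 0$ indicates mydriasis and $\delta r(t) < 0$ indicates
miosis. The total change in mydriasis ($r_{myd}$) and the total change in
miosis ($r_{mio}$) can be expressed by the following equation, in which
the integral collects the contributions with $\delta r(t) > 0$ for
$r_{myd}$ and those with $\delta r(t) < 0$ for $r_{mio}$:

$$r_{myd} \ \text{or} \ r_{mio} = \int_{t_a}^{t_a+\Delta t} \delta r(t)\,dt$$

The total change in pupil diameter, $r_{all}$, can be calculated using the
following formula:

$$r_{all} = \mathrm{abs}(r_{myd}) + \mathrm{abs}(r_{mio})$$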
We analyze the relationship among the pupillary responses ($r_{myd}$,
$r_{mio}$, and $r_{all}$), the subjective evaluation of sentences
containing EIWs (the auditory stimuli), and the memory of the heard
sentences, and obtain knowledge about the design of easily remembered
sentences.
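The quantities defined in this section can be illustrated with a minimal Python sketch for a sampled pupil trace. It is an illustration under stated assumptions, not the analysis code used in the study: the function name `pupil_metrics` and its parameters are hypothetical, the trace is assumed to be given as parallel lists of sample times and diameters, and the integrals are approximated by sums over sample-to-sample differences.

```python
from statistics import mean


def pupil_metrics(times, diameters, t_ns, t_a, dt, tau=500):
    """Return (baseline, r_myd, r_mio, r_all) for one trial.

    times and diameters are equal-length lists of sample times and pupil
    diameters. All times share one unit (milliseconds here), and tau is the
    baseline window length in that unit (500 ms in the paper).
    """
    # Baseline: mean diameter over the window [t_ns - tau, t_ns).
    base = [d for t, d in zip(times, diameters) if t_ns - tau <= t < t_ns]
    if not base:
        raise ValueError("no samples in the baseline window")
    baseline = mean(base)

    # Baseline-corrected change inside the analysis window [t_a, t_a + dt].
    window = [d - baseline for t, d in zip(times, diameters)
              if t_a <= t <= t_a + dt]

    # Discrete variation between consecutive samples (assumed stand-in for
    # delta r): positive steps count as mydriasis, negative steps as miosis.
    steps = [b - a for a, b in zip(window, window[1:])]
    r_myd = sum(s for s in steps if s > 0)
    r_mio = sum(s for s in steps if s < 0)
    r_all = abs(r_myd) + abs(r_mio)
    return baseline, r_myd, r_mio, r_all
```

For a trace that is flat during the baseline window and then alternates between dilation and constriction, `r_myd` collects only the upward steps and `r_mio` only the downward steps, so `r_all` measures the total amount of movement regardless of direction.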
3 EXPERIMENT
In the experiment, short sentences containing EIWs
were presented as auditory information, and partici-
pants were required to evaluate their subjective im-
pressions. After all sentences were presented, partic-
ipants were required to perform a recall test for audi-
tory information after a short break. In this series of
procedures, the experimental environment was con-
strained to avoid affecting the participant’s subjective
impression of auditory information and the change
in pupil diameter. This is further explained in Section 3.1. To more
accurately estimate the relationship between the two features of the EIW
(valence and arousal) and pupil response, it is important to carefully
design the short sentences used as auditory stimuli. The method is
explained in Section 3.2. The experimental procedure is explained in
Section 3.3.
3.1 Method and Participants
The experiment was conducted in a soundproofed room with low ambient
noise. As the lighting condition in this room was kept constant, the
changes in participants' pupil diameter were not affected by ambient
light. Participants received auditory stimuli through headphones connected
to a computer. A Dell S2440L 24-inch display was used to present the
experimental instructions and tasks. The participants were seated 0.8 m
from the display and asked to look at it during the experiment. Twenty-one
participants in their 20s were included. Pupil diameter was measured using
a Tobii Pro Nano, a noninvasive eye tracker based on corneal reflection.
3.2 Design of Short Sentences
The auditory stimuli were designed based on the following four
considerations: (1) whether the sentence is comprehensible after one
hearing, (2) classification of EIWs, (3) control of $t_{ns}$, $t_{vs}$,
$t_a$, and $t_a + \Delta t$ in Figure 1, and (4) control over the mood of
the sentence.
Consideration (1) is important from the perspective of semantic
comprehension and memory. Consideration (2) is important for explaining
the relationship between the pupillary response, an observable biological
quantity, and EIWs, which have two variables, valence and arousal.
Consideration (3) is important to show that the pupillary response is
caused by the EIW. Consideration (4) is important for clarifying whether
the subjective evaluation differs between sentences that have different
atmospheres but include the same EIW.
[Figure 2 (diagram): flow of one trial, repeated for 40 trials and followed
by the recall test: gaze fixation on a displayed fixed point (2 s), silent
interval (4 s), beep (2 s; a chime is played to signal the start of the
narration), narration (6 s), silent interval (3 s), and assessment of the
narration impression. An animation is displayed so that participants can
easily gaze at the monitor. Impressions are assessed on a 7-point scale
plus "unable to assess"; an item is highlighted when the participant hovers
over it.]
Figure 2: The experimental design.

First, the policy for designing short sentences to facilitate text
comprehension by participants was as follows. One consideration for
facilitating auditory text comprehension is the length of each auditory
stimulus. Preliminary experiments showed that the length of auditory
stimulus that participants could easily understand without any resistance
was around 6 seconds. Therefore, we set the length of the sentence, an
auditory stimulus, to around 6 seconds. This implies that an auditory
stimulus must include approximately 30 ± 10 kana characters, which
correspond to the number of mora (mora are counted in kana characters for
Japanese). This figure assumes a speech rate of 300 mora per minute;
moreover, most professional announcers in Japan are trained to speak at
this rate. Based on these facts, an auditory stimulus of around 6 seconds,
i.e., 1/10 of a minute, should include 1/10 of 300 mora.
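The length calculation above can be written out as a small Python sketch. The rate of 300 mora per minute is read from the "1/10 of 300 mora" statement and is treated here as an assumption; the function names are hypothetical.

```python
def target_mora(duration_s, rate_mora_per_min=300):
    """Number of mora spoken in duration_s seconds at the assumed rate."""
    return duration_s * rate_mora_per_min / 60


def within_tolerance(mora_count, duration_s=6, tolerance=10):
    """True if a sentence is within +/- tolerance mora of the target length."""
    return abs(mora_count - target_mora(duration_s)) <= tolerance
```

Under this assumption a 6-second stimulus corresponds to 30 mora and a 1-second segment to 5 mora, so `within_tolerance(25)` accepts a 25-mora sentence while `within_tolerance(45)` rejects a 45-mora one.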
The classification of EIWs is as follows. According to Honma's study
(Honma, 2014), both valence and arousal for words that can be EIWs are
classified in 9 levels. The set of EIWs is $W$, where $N_{EIW}$ represents
the number of elements in $W$:

$$W = \{W_i \mid i = 1, \cdots, N_{EIW}\}.$$

Each $W_i$ has an average and a standard deviation of the valence $V_i$ and
the arousal $Ar_i$. Then, $W_i$ is classified according to the following
levels using valence $V_i$ and arousal $Ar_i$.
$V_i$ was classified as follows:
  Positive valence ($V^{+}$)   7.00–9.00
  Neutral valence ($V^{N}$)    4.00–6.99
  Negative valence ($V^{-}$)   1.00–3.99
$V^{+}$, $V^{N}$, and $V^{-}$ were further classified as follows:
$V^{++}$ (7.70–9.00), $V^{NN}$ (5.00–5.99), and $V^{--}$ (1.00–1.99).
Similarly, $Ar_i$ was classified as follows:
  High arousal ($Ar_H$)     7.00–9.00
  Medium arousal ($Ar_M$)   4.00–6.99
  Low arousal ($Ar_L$)      1.00–3.99
$Ar_H$ and $Ar_L$ were adopted. For $Ar_H$ or $Ar_L$, when multiple words
satisfied the criteria, we used words with a standard deviation of 2.00 or
less to suppress the variation in participants' impression of the stimuli.
In addition, only one EIW was included in one sentence.
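The level boundaries above can be expressed as a short Python sketch. The function names and class labels are hypothetical, and the sketch assumes that the standard-deviation criterion applies to the arousal rating, which the text does not state explicitly.

```python
def valence_class(v):
    """Map a mean valence rating in [1, 9] to 'V+', 'VN', or 'V-'."""
    if not 1.0 <= v <= 9.0:
        raise ValueError("valence must be in [1, 9]")
    if v >= 7.0:
        return "V+"
    if v >= 4.0:
        return "VN"
    return "V-"


def arousal_class(a):
    """Map a mean arousal rating in [1, 9] to 'ArH', 'ArM', or 'ArL'."""
    if not 1.0 <= a <= 9.0:
        raise ValueError("arousal must be in [1, 9]")
    if a >= 7.0:
        return "ArH"
    if a >= 4.0:
        return "ArM"
    return "ArL"


def is_candidate_eiw(arousal_mean, arousal_sd, max_sd=2.0):
    """Keep only high- or low-arousal words whose rating SD is at most max_sd.

    Applying the SD limit to the arousal rating is an assumption.
    """
    return arousal_class(arousal_mean) in ("ArH", "ArL") and arousal_sd <= max_sd
```

For instance, a word with mean arousal 8.0 and standard deviation 1.5 passes the filter, whereas a medium-arousal word (mean 5.0) or a low-arousal word with standard deviation 2.5 does not.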
Next, the design policy for controlling $t_{vs}$ and $t_{vs} + \Delta t$ as
the timing of EIW appearance was as follows.
Table 1: Overview of designed auditory stimuli.

Valence level of EIW   Arousal level of EIW   Atmosphere of sentence   Number of sentences   Notation for the sentence
POSITIVE (++)          HIGH                   POSITIVE                 3                     ($V^{++}$, $Ar_H$, $At^{+}$)
POSITIVE (++)          HIGH                   NEGATIVE                 3                     ($V^{++}$, $Ar_H$, $At^{-}$)
POSITIVE (++)          LOW                    POSITIVE                 3                     ($V^{++}$, $Ar_L$, $At^{+}$)
POSITIVE (++)          LOW                    NEGATIVE                 3                     ($V^{++}$, $Ar_L$, $At^{-}$)
                       total number of sentences                       12
NEUTRAL (N)            HIGH                   POSITIVE                 2                     ($V^{N}$, $Ar_H$, $At^{+}$)
NEUTRAL (N)            HIGH                   NEUTRAL                  4                     ($V^{N}$, $Ar_H$, $At^{N}$)
NEUTRAL (N)            HIGH                   NEGATIVE                 2                     ($V^{N}$, $Ar_H$, $At^{-}$)
NEUTRAL (N)            LOW                    POSITIVE                 2                     ($V^{N}$, $Ar_L$, $At^{+}$)
NEUTRAL (N)            LOW                    NEUTRAL                  4                     ($V^{N}$, $Ar_L$, $At^{N}$)
NEUTRAL (N)            LOW                    NEGATIVE                 2                     ($V^{N}$, $Ar_L$, $At^{-}$)
                       total number of sentences                       16
NEGATIVE (--)          HIGH                   POSITIVE                 3                     ($V^{--}$, $Ar_H$, $At^{+}$)
NEGATIVE (--)          HIGH                   NEGATIVE                 3                     ($V^{--}$, $Ar_H$, $At^{-}$)
NEGATIVE (--)          LOW                    POSITIVE                 3                     ($V^{--}$, $Ar_L$, $At^{+}$)
NEGATIVE (--)          LOW                    NEGATIVE                 3                     ($V^{--}$, $Ar_L$, $At^{-}$)
                       total number of sentences                       12
We set the interval from $t_{ns}$ to $t_{vs}$ to 1 second in order to
easily capture the pupillary response. Therefore, about 5 mora of speech,
such as a word, were placed before the EIW. To prevent $\Delta t$ from
being too short, the number of mora in the EIW was also set to around 5.
In addition, $t_a - t_{vs}$ was set to 1 [s].
Finally, the design policies for the atmospheres of the sentences were as
follows. The atmospheres of the sentences were classified as Positive,
Neutral, and Negative, and represented by $At^{+}$, $At^{N}$, and $At^{-}$,
respectively. The atmospheres of the whole sentences were generated based
on the following policies.
• For $At^{+}$, EIWs with $V^{++}$ were used, and positive vocabulary or
  double negation was used at the end of sentences. When an EIW with
  $V^{--}$ was used, the word was not used as the subject but to modify
  other vocabulary.
• For $At^{N}$, objective facts were described without using emotive words.
• For $At^{-}$, negative vocabulary was used at the end of the sentence in
  addition to the EIW; positive vocabulary was either not used in the
  sentence or was used in the sentence and negated at the end.
Based on the above, we generated 40 auditory stimuli with the combinations
shown in Table 1.
[Figure 3 (plots): six histogram panels in two rows and three columns, for
$V^{++}$, $V^{N}$, and $V^{--}$ from left to right. Upper row: miosis on
the horizontal axis against frequency (arb. unit). Lower row: impression
score on the horizontal axis against frequency (arb. unit).]
Figure 3: Histograms of the miosis distribution (upper row) and the
impression evaluation value (lower row) for all participants, categorized
by valence and arousal. From left to right, the panels are for $V^{++}$,
$V^{N}$, and $V^{--}$. Red histograms are for high arousal, and blue
histograms are for low arousal.
Table 2: Categorization of participants' types for the combinations of
$V^{++}$/$V^{N}$/$V^{--}$, $Ar_H$/$Ar_L$, and $At^{+}$/$At^{N}$/$At^{-}$.
Rows give the atmosphere of the sentence (+, N, -).

$Ar_H$   $V^{++}$                         $V^{N}$                          $V^{--}$
At       assess    recall    closed      assess    recall    closed      assess    recall    closed
+        47(74.6)  0         17(27.0)    40(95.2)  0         24(57.1)    46(73.0)  8(62.5)   28(44.4)
N        N/A       N/A       N/A         36(42.9)  1(100.0)  43(51.2)    N/A       N/A       N/A
-        53(84.1)  0         35(55.6)    30(71.4)  0         23(54.8)    40(63.5)  7(71.4)   25(39.7)

$Ar_L$   $V^{++}$                         $V^{N}$                          $V^{--}$
At       assess    recall    closed      assess    recall    closed      assess    recall    closed
+        54(86.0)  0         32(50.8)    31(74.0)  0         14(33.3)    43(68.3)  1(0.0)    33(52.4)
N        N/A       N/A       N/A         28(84.0)  2(0.0)    42(50.0)    N/A       N/A       N/A
-        49(77.8)  5(100)    34(54)      32(76.2)  1(0.0)    21(50.0)    50(79.4)  0         31(49.2)
3.3 Procedure
The experimental flow of each trial is described based on Figure 2. First,
gaze fixation was performed without auditory stimuli for 2 [s] at the
beginning of the trial to stabilize the pupil diameter measurement. Then, a
soundless interval of 4 [s] was inserted so that the near reaction of the
pupil would not influence the pupil diameter measurement for the stimuli
presented afterward.
Next, a chime sound was presented for 1 [s] to indicate the start of the
narration, followed by a 1 [s] soundless interval to prevent any change in
pupil diameter caused by the biological response to the chime sound. Then,
the narration, which is the stimulus of the experiment, was played. At this
time, the animation shown in Figure 2 continued to be displayed. The reason
for presenting the animation at the same time was to make it easier for the
participant to fixate his/her gaze. We confirmed via preliminary
experiments that the animation has little effect on the mydriasis or miosis
of the pupil. A soundless interval of 3 seconds was set after the narration
was complete to stabilize the pupil diameter fluctuation, so that the pupil
diameter measurement would not be affected during the narration impression
evaluation.
Finally, the participants' subjective impression of the narration was
assessed on a 7-point Likert scale shown on the computer screen. The
subjective evaluation was performed by selecting a rating using radio
buttons, as shown in Figure 2. When the selection was complete, the "Next"
button was clicked to start the next trial. All visual and auditory stimuli
before the subjective impression evaluation screen were presented
automatically. After repeating the above for 40 trials, a recall test was
conducted by asking the participant to recite as many of the 40 sentences
as possible from memory.
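The trial flow described above can be turned into a timeline with a brief Python sketch. The phase names are hypothetical labels, the narration is given its nominal length of 6 s, and the offsets of 1 s from narration start to EIW start and from EIW start to $t_a$ follow the design policy in Section 3.2.

```python
# Nominal phase durations in seconds, in presentation order.
PHASES = [
    ("gaze_fixation", 2.0),
    ("silent_interval", 4.0),
    ("chime", 1.0),
    ("post_chime_silence", 1.0),
    ("narration", 6.0),
    ("post_narration_silence", 3.0),
]


def phase_onsets(phases):
    """Return the onset time of each phase, measured from trial start."""
    onsets = {}
    t = 0.0
    for name, duration in phases:
        onsets[name] = t
        t += duration
    # The impression assessment begins once all timed phases are over.
    onsets["impression_assessment"] = t
    return onsets


onsets = phase_onsets(PHASES)
t_ns = onsets["narration"]  # narration start
t_vs = t_ns + 1.0           # EIW start, 1 s after narration start
t_a = t_vs + 1.0            # assumed response time, 1 s after EIW start
```

With these nominal values the narration starts 8 s after trial start, the EIW is uttered at 9 s, the response is assumed at 10 s, and the impression assessment begins at 17 s.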
4 RESULT
The experimental results are summarized in Table 2 and Figure 3. Table 2
shows the responses of participants classified by
($V^{++}$/$V^{N}$/$V^{--}$) and ($Ar_H$/$Ar_L$), which are characteristics
of the EIWs included in the sentence stimuli, and by
($At^{+}$/$At^{N}$/$At^{-}$), which are the characteristics of the whole
sentence. The column "assess" shows the subjective impressions of the
sentence stimuli, which were answered in the same range as
$At^{+}$/$At^{N}$/$At^{-}$. The value indicates the number of respondents,
and the number in parentheses indicates the percentage of all trials. The
column "recall" shows the results of the recall test. The value in the
"recall" column indicates the number of people who remembered the
corresponding characteristic. The value in parentheses indicates the
percentage of those who remembered the content of the short sentences
whose subjective impression rating matched the characteristic assigned to
the sentence stimuli. The column "closed" shows the results of those whose
pupillary response tended toward miosis. The value in the "closed" column
indicates the number of subjects whose pupillary response showed a miosis
tendency, and the number in parentheses indicates the percentage of the
total trials.
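As a worked example of the percentages in the "assess" and "closed" columns, the following Python sketch reproduces several table values. The denominator is not stated in the text; the sketch assumes it is the number of participants (21) multiplied by the number of sentences in the cell (Table 1), an assumption that matches values such as 47 (74.6) and 36 (42.9). The function name is hypothetical.

```python
N_PARTICIPANTS = 21


def percent_of_trials(count, n_sentences, n_participants=N_PARTICIPANTS):
    """Percentage of trials in a cell, assuming participants x sentences trials."""
    return round(100 * count / (n_sentences * n_participants), 1)
```

For the ($V^{++}$, $Ar_H$, $At^{+}$) cell with 3 sentences, 47 matching assessments out of 63 assumed trials give 74.6%, and for the ($V^{N}$, $Ar_H$, $At^{N}$) cell with 4 sentences, 36 out of 84 give 42.9%.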
Figure 3 shows the frequency distribution of the pupil response and
subjective evaluation of the stimuli. These were generated using data from
658 trials, excluding data for pupillary response with a pupil diameter
acquisition rate of less than 70% per trial and an outlier rate of 30% or
more. The histograms for miosis are shown in the upper row of Figure 3, and
the histograms for impression assessment are shown in the lower row. The
columns of the graph are divided by the EIW characteristics included in the
sentence stimuli and are plotted for $V^{++}$, $V^{N}$, and $V^{--}$ from
left to right. The histograms in red and blue in the figure represent
$Ar_H$ and $Ar_L$, respectively.
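The trial exclusion rule can be sketched in Python as follows. The function and parameter names are hypothetical, and the sketch assumes that the outlier rate is computed relative to the number of valid samples, which the text does not specify.

```python
def keep_trial(n_expected, n_valid, n_outliers,
               min_acquisition=0.70, max_outlier=0.30):
    """True if a trial passes both inclusion criteria.

    n_expected: samples the eye tracker should have produced in the trial
    n_valid:    samples with a pupil diameter actually recorded
    n_outliers: valid samples flagged as outliers (denominator assumed)
    """
    if n_expected <= 0 or n_valid <= 0:
        return False
    acquisition_rate = n_valid / n_expected
    outlier_rate = n_outliers / n_valid
    # Exclude trials with acquisition below 70% or outliers at 30% or more.
    return acquisition_rate >= min_acquisition and outlier_rate < max_outlier
```

A trial with 80 valid samples out of 100 and 10 outliers would be kept, while a trial with only 60 valid samples, or one in which half of the valid samples are outliers, would be excluded.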
Table 3: p-values for $r_{mio}$ (miosis), $r_{myd}$ (mydriasis), and
$r_{all}$ (total change). The column names 0.05, 0.1, and effective
represent the categories for sentence sampling. Each row compares the two
parameter combinations on the left and right sides of the comma.

Compared combinations                    miosis                        mydriasis   total change
                                         0.05     0.1     effective    0.05        0.05     effective
$V^{++}$, $V^{--}$                       0.0030   -       0.0189       -           0.0526   -
$V^{++}$, $V^{N}$                        0.0057   -       -            0.0282      -        -
$V^{--}$, $V^{N}$                        -        0.097   -            -           -        -
$V^{++}$ $Ar_H$, $V^{N}$ $Ar_H$          0.0038   -       0.0044       -           -        -
$V^{++}$ $Ar_L$, $V^{N}$ $Ar_H$          0.0315   -       0.0626       0.0369      -        -
$V^{++}$ $Ar_L$, $V^{N}$ $Ar_L$          -        -       -            0.0792      0.0173   -
$V^{--}$ $Ar_H$, $V^{N}$ $Ar_H$          0.0240   -       0.0055       0.0553      -        -
$V^{--}$ $Ar_H$, $V^{N}$ $Ar_L$          0.0201   -       0.0105       -           -        -
$V^{--}$ $Ar_L$, $V^{N}$ $Ar_L$          -        -       -            -           -        0.0565
$V^{N}$ $Ar_H$, $V^{N}$ $Ar_L$           -        -       0.0937       -           0.0068   -
4.1 Preliminary Analysis
First, we explain the results shown in Table 2. The
recall test was conducted orally, and the participants
were asked to report what they remembered in the
short sentences presented as stimuli, using as many
words and phrases as possible. If the reported con-
tent approximated the presented sentence, the sen-
tence was judged to have been memorized. Six par-
ticipants reported one sentence, 10 reported two, two
reported four, and none reported three sentences.
The relationship between $V_i$, $Ar_i$, $At_j$ and recall was as follows.
The recall rate was highest for short sentences containing EIWs
characterized by $V^{--}$ $Ar_H$, regardless of the atmosphere of the short
sentence. Short sentences characterized by $V^{++}$ $Ar_L$ $At^{-}$ also
had reasonably high recall rates. These findings support the results of a
previous study (Murakami et al., 2021).
The relationship between pupillary response and recall was as follows.
Regarding the relationship between the valence of EIWs and mydriasis, the
mydriasis rate was 40–54% for $V^{++}$ and $V^{--}$ and 28–40% for
$V^{N}$. This result indicates that mydriasis is more likely to occur in
the case of $V^{++}$ and $V^{--}$. In terms of miosis, $V^{++}$ in Table 2
shows a strong trend toward miosis in $V^{++}$ $Ar_H$ $At^{+}$. Of the 15
participants who were able to recall short sentences with EIWs of $V^{--}$
$Ar_H$ in the recall test, a total of 7 participants had a pupillary
response with a trend toward miosis.
Next, we explain the results shown in Figure 3. Looking at the density
distribution for miosis (upper row of the figure), the values ranged from
−0.3 to −0.2. However, there is a slight difference in the distribution of
pupil response between $Ar_H$ and $Ar_L$. Especially for $V^{++}$, the
overall miosis value was larger than −0.5. For $V^{--}$, in contrast,
$Ar_L$ shows a slight advantage over $Ar_H$ in the area where the miosis
takes values less than −0.5. Furthermore, the density distribution of the
subjective impression assessment for short sentences (lower row of the
figure) shows a large difference between $Ar_H$ $At^{-}$ and $Ar_L$
$At^{-}$, especially for $V^{--}$. Compared to $Ar_L$, the subjective
impression assessment of $Ar_H$ is split into positive and negative,
whereas a negative impression would be expected. The evaluation of
sentences with EIWs for $V^{N}$ generally shows a positive value.

Table 4: Averages of $r_{mio}$ (miosis), $r_{myd}$ (mydriasis), and
$r_{all}$ (total change). The column names 0.05, 0.1, and effective
represent the categories for sentence sampling. The parameter
representation is the same as in Table 3.

Pupil response      miosis                          mydriasis   total change
                    0.05      0.1       effective   0.05        0.05     effective
$V^{++}$            -0.2869   -         -0.3236     0.4012      -        -
$V^{N}$             -0.3937   -0.3673   -0.3773     0.3302      0.6826   -
$V^{--}$            -0.3932   -0.3176   -           -           0.7768   -
$V^{++}$ $Ar_H$     -0.279    -         -0.311      -           -        -
$V^{++}$ $Ar_L$     -0.305    -         -0.340      0.413       0.762    -
$V^{--}$ $Ar_H$     -0.298    -         -0.310      0.400       -        -
$V^{--}$ $Ar_L$     -0.422    -         -0.400      -           0.777    0.734
$V^{N}$ $Ar_H$      -0.438    -         -0.412      0.295       0.748    -
$V^{N}$ $Ar_L$      -         -         -0.354      0.341       0.608    0.658
4.2 Valence–Arousal of Sentences and
Pupil Diameter Change
We computed $r_{mio}$, $r_{myd}$, and $r_{all}$ for each participant and
each sentence using the method described in Section 2 and analyzed their
behavior toward emotional stimuli. The sensitivity of the sentences to
$r_{mio}$, $r_{myd}$, and $r_{all}$ is unknown. We performed a t-test
across all 40 sentences for each pupillary response, and analyzed the
emotional stimuli with $p < 0.1$. The numbers of sentences sampled for
$r_{mio}$, $r_{myd}$, and $r_{all}$ were $N_{s,mio} = 24$,
$N_{s,myd} = 27$, and $N_{s,all} = 24$, respectively. We represent the
sampled sentence groups by $G_{s,mio}$, $G_{s,myd}$, and $G_{s,all}$,
respectively. Each group comprised sentences extracted at three levels:
$p < 0.05$, $0.05 < p < 0.1$, and $p < 0.1$. We represent the attribute of
each group by $G_{s,mio/myd/all,pvalue}$. For example, $G_{s,mio,0.05}$
denotes the analysis of $r_{mio}$ using those of the $N_{s,mio} = 24$
sentences that satisfied $p < 0.05$.
Each emotional stimulus contained an EIW with specific $V_i$, $Ar_i$.
Therefore, if $r_{mio}$, $r_{myd}$, and $r_{all}$ have some relationship
with EIWs, they should show some characteristics with respect to valence
or arousal. We therefore extracted $r_{mio}$ for $G_{s,mio}$, $r_{myd}$ for
$G_{s,myd}$, and $r_{all}$ for $G_{s,all}$, grouped the values by the
($V^{++}$/$V^{N}$/$V^{--}$) and ($Ar_H$/$Ar_L$) of the sentences, and
performed t-tests between the combinations.
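A group comparison of this kind can be sketched in Python with the standard library. The text does not state which variant of the t-test was used; the sketch below uses Welch's unequal-variance form as an assumption and returns only the test statistic and the approximate degrees of freedom. A p-value would then be obtained from the Student t distribution with that many degrees of freedom, which is outside this sketch. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a and b are lists of pupil-response values (e.g., r_mio) for two
    groups of sentences; each list needs at least two values.
    """
    # Squared standard errors of the two group means.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For two groups of five values each with equal spread and means differing by one unit, such as `[1, 2, 3, 4, 5]` and `[2, 3, 4, 5, 6]`, the statistic is negative because the first group has the smaller mean, and the degrees of freedom reduce to the pooled value of 8.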
Table 3 shows the results of the tests for the combined EIW characteristics
that were significantly different or tended to be significant. Several
trends were observed among the valences. For $r_{mio}$, significant
differences or significant trends were found for all combinations of
$V^{++}$/$V^{--}$, $V^{++}$/$V^{N}$, and $V^{--}$/$V^{N}$. On the other
hand, $r_{myd}$ showed significant differences only between $V^{++}$ and
$V^{N}$, while $r_{all}$ showed no significant differences or trends. This
indicates that miosis or mydriasis is suitable for valence state
identification. No significant difference or trend was found for arousal.
The same is true for the results of the t-test for the valence/arousal
combination in the table.

[Figure 4 (diagram): panels for the combinations of valence and atmosphere,
each relating arousal (H, L) to the assessment of impression.]
Figure 4: Valence, arousal, atmosphere, assessment of impression.
$V^{+}$/$V^{-}$ represent high/low valence, $At^{+}$/$At^{-}$ represent
positive/negative atmosphere.
Table 4 shows the averages of $r_{mio}$, $r_{myd}$, and $r_{all}$ for the
conditions with significant differences or significant trends in Table 3.
The values of $r_{mio}$ against valence were generally negative, with a
large magnitude for $V^{N}$ and a small magnitude for $V^{++}$/$V^{--}$.
This indicates that the response of miosis is weak for $V^{++}$/$V^{--}$.
On the other hand, $V^{++}$ shows a larger positive value for mydriasis.
The trend of miosis for the valence/arousal combinations $V^{++}$ $Ar_H$
and $V^{--}$ $Ar_H$ shows small negative values. This indicates that the
response of miosis is weak for $Ar_H$ ($V^{++}$, $V^{--}$). For $V^{--}$
$Ar_H$ and $V^{++}$ $Ar_L$, mydriasis showed large positive values. This
indicates that the response of mydriasis is strong for $V^{--}$ $Ar_H$ or
$V^{++}$ $Ar_L$. In summary, the common trend was a miosis/mydriasis
response to $V^{--}$ $Ar_H$.
5 DISCUSSION
As mentioned in Section 2, many studies have indicated a certain
relationship between pupillary response and emotion. Regarding memory and
emotion, Lavoie and O'Connor suggest that emotional events are associated
with better recall than non-emotional events, and that ERPs reflecting
temporal conscious recall and post-retrieval monitoring are clearly
affected by both valence and arousal (Lavoie and O'Connor, 2013). Gomes et
al. used a dual process model to investigate the effects of valence and
arousal on memory, suggesting that valence and arousal affect memory in
relation to each other but not in isolation (Gomes et al., 2012).
Megalakaki et al. conducted a remember/know test on text comprehension and
memorization in which valence and emotional intensity were individually
manipulated. The results suggest that emotion plays an important role in
memorization, as positive/negative content is more easily recalled than
neutral content, and that positive words are more easily recalled
(Megalakaki et al., 2019).
Based on the results of the experiment, the pupillary response and its effect on sentence memory when a short auditory stimulus containing an EIW is perceived may be attributable to the following:
• The relationship between valence, arousal, pupillary response, and memory, i.e., the features of the EIW
• The relationship between the EIW feature values, atmosphere, and the subjective impression assessment of the auditory stimulus
5.1 Relation Between Memory and Pupil Response for EIWs
The results presented in Table 3 suggest that $V_{--}Ar_{H}$ and $V_{++}Ar_{L}$ can be separated from $V_{N}$. Table 4 shows that $\bar{r}_{mio}$ for $V_{--}Ar_{H}$ and $V_{++}Ar_{L}$ is 0.279 and 0.298, respectively, showing a lower miosis tendency than in other states. The $\bar{r}_{mio}$ for $V_{++}Ar_{L}$ was 0.305, again showing a low miosis tendency. Conversely, $\bar{r}_{myd}$ for $V_{--}Ar_{H}$ and $V_{++}Ar_{L}$ is 0.413 and 0.400, respectively, which is higher than in other states. This corresponds well with the high memory performance for $V_{--}Ar_{H}$ and $V_{++}Ar_{L}$ sentences in Table 2. These results suggest that valence and arousal influence human memory, and that sentences containing EIWs with emotions expressed by $V_{--}Ar_{H}$ and $V_{++}Ar_{L}$ are easily remembered. They also suggest that measuring the pupillary response of people listening to auditory stimuli can be used to estimate the emotion they are likely to feel on perceiving the stimuli, and thus the degree to which they are likely to remember them.
5.2 EIW Features, Atmospheres, and Subjective Impression Assessment
This section discusses which of the valence, arousal, and atmosphere included in the auditory stimulus is most likely to influence the subjective impression evaluation. Figure 4 shows heatmaps of the two-dimensional frequency distribution of the arousal derived from the EIWs in the short sentences and the score of the subjective impression assessment of the short sentences as auditory stimuli. Each short sentence including an EIW is characterized by $V_{i}$, $Ar_{i}$, and $At_{j}$. The horizontal axis of each graph represents arousal, which changes from low to high from left to right. The vertical axis represents the assessment of impression, which changes from positive to negative
from the bottom to the top of the axis. The graphs are read as follows: the lower right region of each heatmap corresponds to $Ar_{H}$ and a positive assessment of impression, and the upper right region corresponds to $Ar_{H}$ and a negative assessment of impression. As shown in Table 1, the numbers of auditory stimuli classified by ($V_{i}$, $Ar_{i}$, $At_{j}$) are almost equal. Therefore, the anticipated distribution of subjective impression results in Figure 4 comprises approximately equivalent peaks
for each feature of the auditory stimulus. For example, ($V_{++}$, $At_{+}$) has two strong distributions around ($Ar_{i}$, $S_{ai}$) = ($Ar_{L}$ and $Ar_{H}$, 1 to 2) and is expected to have more strongly positive (here, $S_{ai}$ close to 1) results for $Ar_{H}$ than for $Ar_{L}$.
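The two-dimensional frequency distribution behind each heatmap panel can be sketched as a simple count over (arousal level, impression score) pairs. This is a minimal sketch under stated assumptions: the 1 to 5 score scale with 1 as the most positive value is hypothetical (the text only indicates that scores near 1 are positive), and `responses`, `AROUSAL_LEVELS`, `SCORES`, and `frequency_grid` are illustrative names with placeholder data, not material from the study.

```python
from collections import Counter

AROUSAL_LEVELS = ["L", "N", "H"]  # left to right: low to high arousal
SCORES = [1, 2, 3, 4, 5]          # assumed scale; 1 = most positive

# Hypothetical (arousal level, impression score) responses for one
# valence/atmosphere panel; placeholders, not data from the study.
responses = [
    ("L", 2), ("L", 2), ("L", 1), ("N", 3), ("N", 2),
    ("H", 1), ("H", 1), ("H", 1), ("H", 2), ("H", 4),
]


def frequency_grid(pairs):
    """Count responses per (score, arousal) cell.

    Rows are ordered from the most negative score (top) to the most
    positive score (bottom), matching the axis orientation described
    for Figure 4; columns run from low to high arousal.
    """
    counts = Counter(pairs)
    return [
        [counts[(arousal, score)] for arousal in AROUSAL_LEVELS]
        for score in reversed(SCORES)
    ]


grid = frequency_grid(responses)
for score, row in zip(reversed(SCORES), grid):
    print(score, row)
```

With this layout, the bottom-right cell of `grid` holds the count for high arousal with the most positive score, which is the region the text describes as the lower right of the heatmap.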
Overall, these results suggest that the impression expected from the combination ($V_{i}$, $Ar_{i}$, $At_{j}$) and the participants’ subjective impression assessment are generally consistent for short sentences. However, short sentences with the ($V_{++}$, $Ar_{H}$, $At_{+}$) feature require further study. This indicates that, at least for EIWs, it is sufficient to check $V_{i}$ and $Ar_{i}$ and incorporate them into short sentences. We will investigate the relationship between $At_{j}$ and pupillary response to understand the relationship between the subjective impression assessment of the short sentence and pupillary response. This, in turn, suggests that the relationship with memory can be addressed in future work.
6 CONCLUSION
The purpose of this study was to detect the relationship between individual participants’ responses to EIWs characterized by valence and arousal and their memory of short sentences. We designed short sentences containing an EIW as auditory stimuli to facilitate measurement of the pupil dilation response, based on the idea that an EIW can be characterized by valence and arousal and related to emotion. Participants were then presented with the auditory stimuli, and their impressions of the stimuli, pupil dilation responses, and remembered sentences were measured. The results suggest that measuring $r_{myd}$/$r_{mio}$ during auditory stimuli, such as narration, can indicate how easily the content of the stimuli is memorized. Based on this finding, it will be possible in the future to create and present narration-like commentaries tailored to individual characteristics.
ACKNOWLEDGEMENTS
This work was partly supported by JSPS KAKENHI Grant Numbers 19K12232, 19K12246, 20H04290, and 22K12284. MH also thanks the Nagai N·S Promotion Foundation for Science of Perception for their financial support.
REFERENCES
Bradley, M. M. and Lang, P. J. (1994). Measuring emotion:
The self-assessment manikin and the semantic differ-
ential. Journal of Behavior Therapy and Experimental
Psychiatry, 25(1):49–59.
Bradley, M. M. and Lang, P. J. (1999). Affective Norms
for English Words (ANEW): Instruction Manual and
Affective Ratings.
Gomes, C., Brainerd, C., and Stein, L. (2012). Effects
of emotional valence and arousal on recollective and
nonrecollective recall. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 39:663–677.
Hirabayashi, R., Shino, M., Nakahira, K. T., and Kita-
jima, M. (2020). How auditory information presen-
tation timings affect memory when watching omnidi-
rectional movie with audio guide. In Proceedings of
the 15th International Joint Conference on Computer
Vision, Imaging and Computer Graphics Theory and
Applications - Volume 2: HUCAPP, pages 162–169.
INSTICC, SciTePress.
Honma, Y. (2014). Drawing up of the Japanese word stimulus
based on the emotional valence and arousal of the
word. Bulletin of Aichi Institute of Technology, 49:13–24.
Lavoie, M. and O’Connor, K. (2013). Effect of emotional
valence on episodic memory stages as indexed
by event-related potentials. World Journal of Neuroscience,
3:250–262.
Megalakaki, O., Ballenghein, U., and Baccino, T. (2019).
Effects of valence and emotional intensity on the com-
prehension and memorization of texts. Frontiers in
Psychology, 10.
Murakami, M., Shino, M., Nakahira, K., and Kitajima, M.
(2021). Effects of emotion-induction words on mem-
ory of viewing visual stimuli with audio guide. In Pro-
ceedings of the 16th International Joint Conference
on Computer Vision, Imaging and Computer Graph-
ics Theory and Applications - Volume 1: HUCAPP,
pages 89–100. INSTICC, SciTePress.
Russell, J. (1980). A circumplex model of affect. Journal
of Personality and Social Psychology, 39:1161–1178.
Zekveld, A. A., Koelewijn, T., and Kramer, S. E. (2018).
The pupil dilation response to auditory stimuli:
Current state of knowledge. Trends in Hearing,
22:2331216518777174. PMID: 30249172.