In the context of lifelong learning, the knowledge acquired while appreciating exhibits during a museum visit has to be remembered. Hirabayashi et al. took an omnidirectional movie as an example of an exhibition that utilizes modern advanced technologies and examined the effect of the timing of auditory information presentation on memory (Hirabayashi et al., 2020). Their idea was that a stronger memory would result if auditory information about a particular object, projected on the surface of the dome, were provided after the viewer had attended to it. To direct the viewer's gaze to the object to be explained, they announced the appearance of the object as part of the audio guide (appearance information). The information explaining the object (contents information), which is not directly accessible from the object's appearance, was provided as an audio guide after the appearance information at a variety of intervals. They found experimentally that an interval of 2–3 seconds was most effective for creating a memory of the object. They argued that, in the best presentation-interval condition, the visual and auditory processes carried out for comprehending the object jointly activated parts of long-term memory to generate the most richly connected network.
This paper extends Hirabayashi et al.'s study by shifting the focus of the research from the richness of the network connections to the contents of the richly connected network. The idea is to strengthen the constituent nodes by manipulating the words used in the audio guide, while maintaining the topology of the network generated by processing visual and auditory information, by adopting the best interval of 2–3 seconds between the provision of appearance information and contents information. Namely, on the assumption that the generation of a richly connected memory network is assured by the best-interval condition, this paper investigates the possibility of making the network stronger, in terms of the total amount of activation the network holds, by manipulating the concrete words used in the contents information.
For this purpose, this paper utilizes the finding that emotion enhances episodic memory by strengthening the constituent nodes. Talmi et al. (Talmi et al., 2019) proposed eCMR, an extension of the Context Maintenance and Retrieval (CMR) model (Polyn et al., 2009), to explain how people may represent and process emotional information. The eCMR model assumes that a word associated with emotion, e.g., "spider," is encoded together with its emotional state in working memory (which they call the "context layer"), and that a presented emotional word establishes stronger links than neutral words do. This paper realizes the same effect operationally by attaching a larger strength to emotional nodes in the network, and examines the effect of emotion-induction words used in the audio guide on memory of the movie-viewing experience. It is likely that viewers' reactions to emotion-induction words reflect their personal experience or knowledge. Therefore, this paper also utilizes the finding that pupil dilation reflects the time course of emotion recognition (Oliva and Anikin, 2018; Henderson et al., 2018; Partala and Surakka, 2003) to gather evidence that the manipulation of emotion induction is successful.
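The operational idea above can be made concrete with a minimal sketch. This is an illustration only, not the authors' or eCMR's actual implementation: the node names, activation values, and the boost factor are hypothetical. The point is that the topology (the set of links) stays fixed while emotional nodes receive a larger encoding strength, which raises the total activation the network holds:

```python
def total_activation(nodes, edges, boost=1.5):
    """Sum activation over a network whose topology (edges) is fixed.

    nodes: dict mapping node name -> (base_activation, is_emotional)
    edges: list of (node_a, node_b) pairs; each link contributes the
           mean strength of its endpoints
    boost: multiplier applied only to emotional nodes (hypothetical value)
    """
    # Strengthen emotional nodes; neutral nodes keep their base activation.
    strength = {
        name: act * (boost if emotional else 1.0)
        for name, (act, emotional) in nodes.items()
    }
    node_total = sum(strength.values())
    # Link strength rises with its endpoints, but no links are added or removed.
    link_total = sum((strength[a] + strength[b]) / 2 for a, b in edges)
    return node_total + link_total

# Same topology, two wordings of the contents information:
neutral = {"dome": (1.0, False), "object": (1.0, False), "guide": (1.0, False)}
emotional = {"dome": (1.0, False), "object": (1.0, True), "guide": (1.0, False)}
edges = [("dome", "object"), ("object", "guide")]

print(total_activation(neutral, edges))    # → 5.0
print(total_activation(emotional, edges))  # → 6.0
```

Replacing one neutral word in the contents information with an emotion-induction word raises the total activation without changing which nodes are connected, which is exactly the manipulation investigated in the experiments below.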
This paper is organized as follows. Section 2 describes a cognitive framework that explains the effects of the timing and contents of an audio guide on memory. Section 3 describes three experiments that investigate the effect of emotion-induction words on memory: the first examines participants' responses to emotion-induction words, the second examines responses to sentences containing emotion-induction words, and the last measures memory for a movie with positive, negative, and neutral emotion-induction words. Sections 4 and 5 provide the results of the experiments and discuss them from the viewpoint of pupil-diameter changes.
2 INTEGRATION OF VISUAL
INFORMATION AND
AUDITORY INFORMATION
Hirabayashi et al. studied the importance of the timing of providing auditory information while watching movies to make the experience memorable (Hirabayashi et al., 2020). This paper extends their findings, i.e., the effective timing of providing auditory information for memory formation, by focusing on the effect of emotion-induction words in the auditory information. This section outlines Hirabayashi et al.'s model, which explains the effective timing of auditory information while visual information is being processed, along with the modifications necessary for dealing with the effect of emotion-induction words in auditory information. Starting with the introduction of a cognitive model of memory formation, this section discusses why it is essential to take timing into account.
2.1 Memory Formation by Integrating
Visual and Auditory Information
Figure 1 illustrates the perceptual and cognitive pro-
cesses while acquiring visual information with the
support of an audio guide, incorporating the find-
HUCAPP 2021 - 5th International Conference on Human Computer Interaction Theory and Applications