EMBASSI (Hildebrand 2000) explores new
approaches for human-machine communication with
specific reference to consumer electronic devices at
home (TVs, VCRs, etc.), in cars (radio, CD player,
navigation system, etc.) and in public areas (ATMs,
ticket vending machines, etc.). Since it is much
easier to convey complex information via natural
language than by pushing buttons or selecting
menus, the EMBASSI project focuses on the
integration of multiple modalities like speech, haptic
deixis (pointing gestures), and GUI input and output.
Because EMBASSI’s output is destined for a wide
range of devices, the system considers the effects of
presenting the same information on these different
devices by drawing on Cognitive Load Theory (CLT)
(Baddeley & Logie 1999). Fink & Kobsa (2002)
discuss a system for personalising city tours with
user modelling. They describe a user modelling
server that offers services to personalised systems
with regard to the analysis of user actions, the
representation of the assumptions about the user, and
the inference of additional assumptions based on
domain knowledge and characteristics of similar
users. Nemirovsky and Davenport (2002) describe a
wearable system called GuideShoes which uses
aesthetic forms of expression for direct information
delivery. GuideShoes utilises music as an
information medium and musical patterns as a
means for navigation in an open space, such as a
street.
3 COGNITIVE LOAD THEORY
Elting et al. (2002) explain cognitive load theory,
in which two separate sub-systems for visual and
auditory memory work relatively independently. The
load can be reduced when both sub-systems are
active, compared to processing all information in a
single sub-system. Because of this reduced load, more
resources are available for processing the
information in greater depth and thus for storing it
in long-term memory. However, this only holds when
the information presented in the different
modalities is not redundant; otherwise the result is
an increased cognitive load. On the other hand, when
multiple modalities are used, more memory traces
should be available (e.g. memory traces for the
information presented both auditorily and visually)
even if the information is redundant, which
counteracts the effect of the higher cognitive load.
Elting et al.
investigated the effects of display size, device type
and style of multimodal presentation on working
memory load, effectiveness for human information
processing and user acceptance. The aim of this
research was to discover how different physical
output devices affect the user’s way of working with
a presentation system, and to derive presentation
rules from this that adapt the output to the devices
the user is currently interacting with. They intended
to apply the results attained from the study in the
EMBASSI project where a large set of output
devices and system goals have to be dealt with by
the presentation planner. Accordingly, they used a
desktop PC, a TV set with remote control, and a PDA
as presentation devices, and investigated the impact
the multimodal output of each of the devices had on
the users. As a gauge, they used the recall
performance of the users on each device. The output
modality combinations for the three devices
consisted of
- plain graphical text output (T),
- text output with synthetic speech output of the
same text (TS),
- a picture together with speech output (PS),
- graphical text output with a picture of the
attraction (TP),
- graphical text, synthetic speech output, and a
picture in combination (TPS).
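The five combinations can be written down as sets of output channels, which makes it easy to see which of them address both the visual and the auditory sub-system. The following Python sketch is purely illustrative: the names `Channel`, `COMBINATIONS`, `uses_both_subsystems` and `is_redundant` are assumptions introduced here and are not taken from Elting et al.'s system. Treating every text-plus-speech combination as redundant is also an assumption; the list above only states explicitly that TS speaks the same text that is displayed.

```python
from enum import Flag, auto


class Channel(Flag):
    """Output channels that make up the modality combinations."""
    TEXT = auto()     # graphical text (visual)
    PICTURE = auto()  # picture (visual)
    SPEECH = auto()   # synthetic speech (auditory)


VISUAL = Channel.TEXT | Channel.PICTURE
AUDITORY = Channel.SPEECH

# The five combinations, keyed by the abbreviations used in the text.
COMBINATIONS = {
    "T": Channel.TEXT,
    "TS": Channel.TEXT | Channel.SPEECH,
    "PS": Channel.PICTURE | Channel.SPEECH,
    "TP": Channel.TEXT | Channel.PICTURE,
    "TPS": Channel.TEXT | Channel.PICTURE | Channel.SPEECH,
}


def uses_both_subsystems(combo: Channel) -> bool:
    """True if the combination addresses visual and auditory memory."""
    return bool(combo & VISUAL) and bool(combo & AUDITORY)


def is_redundant(combo: Channel) -> bool:
    """True if text is both displayed and spoken (assumed to be the same text)."""
    return Channel.TEXT in combo and Channel.SPEECH in combo


if __name__ == "__main__":
    for name, combo in COMBINATIONS.items():
        print(name, uses_both_subsystems(combo), is_redundant(combo))
```

Under these assumptions, PS is the only combination that uses both sub-systems without presenting the same text twice.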
The results of their testing on PDAs are relevant to
any mobile multimodal presentation system that
aims to adapt the presentation to the cognitive
requirements of the device. The results show that
the PS combination was the most efficient (in terms
of recall) in the TV and PDA groups and the second
most efficient on the desktop PC. Pictures plus
speech therefore appear to be an effective way to
convey information to the user on all three devices.
This result is theoretically supported by Cognitive
Load Theory (Baddeley & Logie 1999, Sweller et al.
1998), according to which PS is an efficient way to
convey information because the information is
processed both auditorily and visually but with only
a moderate cognitive load. It was also observed that
the decrease in recall performance over time was
especially significant in the PDA group. This can be
explained by the high cognitive load that results
from working on a small PDA display, which caused
recall performance to decrease significantly over
time. With respect to presentation appeal, it was not
the most efficient modality combination (PS) that
proved to be the most appealing, but a combination
involving a rather high cognitive load, namely TPS.
The study showed that
cognitive overload is a serious issue in user interface
design, especially on small mobile devices. From
their testing, Elting et al. concluded that when a
system presents data that the user needs to remember
(e.g. a city tour), the most effective presentation
mode (picture and speech) should be used, which does
not cognitively
ICETE 2004 - WIRELESS COMMUNICATION SYSTEMS AND NETWORKS