Modelling Cognitive Workload to Build Multimodal Voice Interaction in the Car
Sylvia Bhattacharya¹ᵃ and J. Stephen Higgins²
¹Department of Engineering Technology, Kennesaw State University, Marietta, Georgia, U.S.A.
²UX Research, Google Inc., Mountain View, California, U.S.A.
Keywords: In-Vehicle Information Systems, Cognitive Demands, Optimization, Visual/Auditory Modalities, Tactile
Screen Tapping, Voice Commands, Visual Cues, Cognitive Load.
Abstract: This paper discusses the integration of in-vehicle information systems and their impact on driver performance, considering the visual, auditory, manual, and cognitive demands these systems impose. While substantial research exists on optimizing visual and manual systems, less attention has been paid to systems that use both visual and auditory cues or a combination of modalities. The study found that simple tasks impose the least cognitive load when drivers use touchscreens, while complex tasks are handled with lower cognitive load when voice commands are used alone or combined with visual aids. These results inform the design of car interfaces that effectively manage the driver's cognitive load.
1 INTRODUCTION
The latest World Health Organization (WHO) report
underscores the grave toll of road traffic injuries,
indicating that in 2013 alone, 1.25 million lives were
lost globally, positioning such injuries as a primary
global cause of mortality (WHO, 2016). Road traffic injuries currently rank as the ninth leading cause of death across all age groups worldwide and are projected to rise to seventh by 2030. Focusing on
driver distraction, numerous research endeavors have
highlighted its significance, attributing between 25%
and 75% of all accidents to distraction and inattention
(Dingus et al., 2006; Ranney et al., 2000; Klauer et
al., 2005; Klauer et al., 2006a; Talbot and Fagerlind,
2006). The escalating adoption of in-vehicle information systems is of paramount importance because these systems can elicit visual, auditory, manual, and cognitive demands, each of which can affect driving performance in different ways. A critical knowledge gap concerns the interplay between distraction types and interaction methods during driving; a comprehensive understanding of this interplay is needed to inform safer and more effective driving environments.
ᵃ https://orcid.org/0000-0002-5525-7677
Advancements in technology are expanding the
capabilities of infotainment systems introduced into
vehicles. Communication (e.g., Messaging) and other
important user journeys that have traditionally not
been available to the driver can now be embedded
within in-vehicle information systems (IVIS). Many of these systems have the potential to increase safety and open possibilities in the car that have not been available in the past (Klauer et al., 2006b). However, this must be done carefully, especially since many new interaction methods (e.g., voice) do not have a long history of safety evaluation.
All non-driving interactions in a car involve distraction (Victor, 2010). The goal of an infotainment system should therefore be to support basic tasks and short interactions; complicated tasks should be performed when the car is stopped. Multimodal interaction has the potential to give users the flexibility to choose the modality they prefer and can use safely. Google automotive teams use industry standards and internal research to determine how to build products that can be used safely and effectively while driving. There is substantial safety data indicating how to build visual/manual-based products, but there is not enough data indicating how we should build multimodal visual/voice/manual systems. We hope