minimum number of participants for human factors
validation testing (Health, 2019). The tests were
carried out in a simulated-use environment to ensure
adequate observation. Additionally, to protect patient
privacy, we created simulated patient profiles
according to the mode principle (A. Ravizza et al.,
2020). In this way, patient privacy was preserved
while the test participants could still interact with
realistic data. The mode principle describes simulated
patients using the values that occur most frequently in
the patient population; these are considered more
representative than mean values, because the latter can
be inconsistent with real data (A. Ravizza et al., 2020).
The test scenario was designed by referring to the task
analysis conducted in the formative evaluation.
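As an illustration of how such profiles can be assembled, the following sketch (with purely hypothetical field names and records) takes, for each attribute, the modal value observed in a de-identified patient dataset:

```python
from collections import Counter

# Hypothetical, de-identified patient records (illustrative field names only).
patients = [
    {"sex": "F", "age_band": "60-69", "blood_type": "A+", "therapy": "metformin"},
    {"sex": "M", "age_band": "60-69", "blood_type": "O+", "therapy": "metformin"},
    {"sex": "F", "age_band": "70-79", "blood_type": "A+", "therapy": "insulin"},
    {"sex": "F", "age_band": "60-69", "blood_type": "A+", "therapy": "metformin"},
]

def mode_profile(records):
    """Build a simulated patient from the most frequent value of each field."""
    profile = {}
    for field in records[0]:
        counts = Counter(r[field] for r in records)
        profile[field] = counts.most_common(1)[0][0]  # modal value
    return profile

simulated_patient = mode_profile(patients)
# e.g. {'sex': 'F', 'age_band': '60-69', 'blood_type': 'A+', 'therapy': 'metformin'}
```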
The task list, that is, the main script for the user
tests, was designed to cover all of the primary
functions. Thus, within the same scenario, the user
might be asked to perform several tasks per feature
(e.g., inserting a new patient in the EHR either by
selecting her from a contact list or by entering her
personal information in a search bar), as sketched
below. By allowing tasks that share similar sub-steps,
the test participants could understand the navigation
pathways better and ultimately give an informed
opinion on the interface characteristics based on
multiple interactions rather than a single one.
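The excerpt below is a hypothetical fragment of such a task list, in which two tasks exercise the same feature through different navigation pathways (task identifiers and step names are illustrative only):

```python
# Hypothetical task-list excerpt: both tasks target the "add_patient" feature,
# so participants encounter the shared sub-steps more than once before
# judging the interface.
task_list = [
    {
        "id": "T03",
        "feature": "add_patient",
        "steps": ["open_patient_list", "select_from_contact_list", "confirm_identity"],
    },
    {
        "id": "T07",
        "feature": "add_patient",
        "steps": ["open_patient_list", "type_personal_data_in_search_bar", "confirm_identity"],
    },
]
```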
At the beginning of the user test, given the
complexity of the medical device interface, we invited
the device expert to give an introductory talk and a
brief training session, providing proper information
about the intended use of the device and the purpose
of the user test. The talk eases the participants into the
experience by giving them a basic introduction to
what they will later see. More importantly, it helps
them focus on the crucial aspect of their contribution,
which is to report what they perceive, what they
reason about, and which action they take accordingly.
This decomposition allowed the interviewer, during
the test, to assess each user's level of interaction with
the specific task. Moreover, by using the PCA
approach (International Electrotechnical
Commission, 2016), the interviewer was able to
identify the main categories of use errors, i.e., whether
they stemmed from perception, cognition, or action
failures. In addition, potential use problems can be
targeted by asking the user about the consequences of
a failed task. As prescribed by the standard, we
trained the moderators not to intervene during the
user test, stepping in only when the user could not
complete the task autonomously.
The participants tested the functionalities of the
devices by following the task list under the
supervision of the moderators and, for each task, the
moderators evaluated the user's actions according to
the following policy (a minimal recording sketch is
given after the list):
- ok: the task was completed without error;
- ue (user error): the user was not able to complete
the task and required help from the moderator, or
the user made an error with no impact on the
patient (e.g., typing the password with caps lock
on), or the user knowingly neglected to complete
a task;
- ce (critical error): the user made an error with an
impact on clinical risk, e.g., ignoring a
notification about a critical clinical risk (such as
a drug interaction) or skipping the patient
identification;
- te (technical error): the task was not completed
due to a system failure.
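One minimal way to record these judgments, assuming hypothetical participant and task identifiers and optionally tagging errors with the PCA category discussed above, could look like this:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Outcome(Enum):
    OK = "ok"   # task completed without error
    UE = "ue"   # error with no patient impact, help needed, or task knowingly skipped
    CE = "ce"   # critical error with an impact on clinical risk
    TE = "te"   # task not completed due to a system failure

class PCACategory(Enum):
    PERCEPTION = "perception"
    COGNITION = "cognition"
    ACTION = "action"

@dataclass
class TaskObservation:
    participant: str
    task_id: str
    outcome: Outcome
    pca_category: Optional[PCACategory] = None  # filled in only when an error occurred
    note: str = ""

# Example record for a hypothetical participant and task.
obs = TaskObservation("P01", "T07", Outcome.UE,
                      PCACategory.PERCEPTION,
                      "did not notice the search bar; moderator pointed it out")
```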
The purpose of analysing the tasks performed by
the users was to evaluate the presence of ce (critical
errors) and to identify which tasks may have caused
uncertainty or confusion. The task completion results
are an informative source of improvements for
technical manuals and training sessions, allowing the
designers to understand which tasks require additional
clarity in the instructions and more examples during
training. Additionally, they can provide feedback on
unresolved technical issues occurring during normal
use.
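Continuing the hypothetical records sketched above, a minimal aggregation counts critical errors per task and flags tasks with repeated use errors as candidates for clearer instructions or additional training examples (the threshold is arbitrary and purely illustrative):

```python
from collections import Counter

def summarise(observations):
    """Per-task tallies of critical errors (ce) and user errors (ue)."""
    ce_per_task = Counter(o.task_id for o in observations if o.outcome is Outcome.CE)
    ue_per_task = Counter(o.task_id for o in observations if o.outcome is Outcome.UE)
    # Tasks with two or more ue are flagged for clearer instructions or training.
    needs_clarification = [t for t, n in ue_per_task.items() if n >= 2]
    return ce_per_task, ue_per_task, needs_clarification
```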
Additionally, during the user test, the participants
may comment on the device performance (in terms of
usability), and the moderators may ask the users open-
ended questions, which may reveal additional
problems and sources of uncertainty. We encourage
collecting notes on the users' impressions; once
vetted, they can be a valuable source of further
product improvement.
At the end of the simulated use, we asked the users
to evaluate the devices with three different
instruments: an interview, a heuristic questionnaire,
and the System Usability Scale (SUS). The first two
are the techniques designed during the formative
evaluation, while the SUS questionnaire provides a
"quick and dirty", reliable tool for measuring
usability. It consists of ten items, each with five
response options ranging from "Strongly agree" to
"Strongly disagree" (Jordan et al., 1996).
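In the standard SUS scoring scheme, odd-numbered items contribute (response − 1) points and even-numbered items contribute (5 − response) points; the sum is multiplied by 2.5 to obtain a score between 0 and 100. A short sketch, assuming responses coded 1 (strongly disagree) to 5 (strongly agree):

```python
def sus_score(responses):
    """Compute the SUS score (0-100) from ten item responses coded 1-5."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)  # odd vs even items
    return total * 2.5

# Example: a fairly positive set of responses.
print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
```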
The moderators briefly described the
questionnaires to the participants, who then completed
them autonomously.