routine. The experiments were conducted over
multiple days, usually right after morning rounds.
Physicians called from the operating room, after
the patient had been anesthetized, and from various
locations in the ICU, including hallways, meeting
rooms, patient rooms, and nurses’ stations; calls
were usually made shortly after physicians had
examined their patients.
We compared two different modes of interaction
with the physician users. The first was system-
driven; the system provided the list of patient
problems using inference engine output and the
physician simply confirmed problems that s/he
believed were associated with the patient. The
second was user-driven and allowed physicians to
identify the problems they thought were relevant to
the patient. The system-driven approach requires the
user to pay close attention to what the system
proposes, while in the user-driven approach the user
is free to enter information as s/he pleases.
Ackermann and Libosse observed that systems which
place heavier demands on human memory are more
error-prone and take longer to complete.
In the system-driven mode, once the physician
had entered the patient’s RefN, s/he was presented
with a list of problems associated with that patient
and asked to say “yes” to any that should be included
in the note. At the end of this process, the physician
was asked whether s/he would like to augment the
list with additional problems that may not have been
deduced by the inference engine.
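The system-driven confirmation loop can be sketched as follows; the helper names (`get_response`, `collect_extra`) are hypothetical stand-ins for the speech interface, not part of the actual system:

```python
def system_driven_dialog(inferred_problems, get_response, collect_extra):
    """Return the problem list confirmed (and optionally extended) by the caller.

    inferred_problems: problems proposed by the inference engine.
    get_response:      hypothetical helper that speaks a prompt and returns
                       the recognized answer ("yes" / "no").
    collect_extra:     hypothetical helper that collects freely dictated
                       problems not proposed by the inference engine.
    """
    note_problems = []
    for problem in inferred_problems:
        # Yes/no confirmation keeps the recognition task simple and accurate.
        if get_response(f"Include {problem}?") == "yes":
            note_problems.append(problem)
    # Offer to augment the list with problems the engine did not deduce.
    if get_response("Would you like to add problems not on the list?") == "yes":
        note_problems.extend(collect_extra())
    return note_problems
```

For example, confirming the first of two proposed problems and then dictating one extra problem yields a note containing the confirmed problem plus the dictated one.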
The user-driven mode allowed the caller to first
speak all the problems for a given patient. The
system would then compare the collected problems
with those produced by the inference engine (usually
a larger set) and would ask the caller if s/he would
like the system to present the remaining inferred
problems for inclusion in the patient’s profile. If the
caller responded “yes”, the system would then list
the remaining problems one at a time; the caller
would say “yes” to include a problem in the profile.
The caller could skip this process by saying “done”
at any point during the listing.
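A minimal sketch of the user-driven flow, again with a hypothetical `get_response` stand-in for the speech interface:

```python
def user_driven_dialog(spoken, inferred, get_response):
    """Combine caller-dictated problems with the remaining inferred ones.

    spoken:       problems the caller dictated first.
    inferred:     problems produced by the inference engine (usually a
                  larger set).
    get_response: hypothetical helper returning "yes", "no", or "done".
    """
    profile = list(spoken)
    # Inferred problems the caller did not already mention.
    remaining = [p for p in inferred if p not in spoken]
    if remaining and get_response("Hear the remaining inferred problems?") == "yes":
        for problem in remaining:
            answer = get_response(f"Include {problem}?")
            if answer == "done":   # caller may skip the rest at any point
                break
            if answer == "yes":
                profile.append(problem)
    return profile
```

Saying “done” mid-listing keeps every problem accepted so far and discards the rest of the inferred set.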
Since it relies on yes-no recognition, the system-
driven mode should have the advantage of high
accuracy in recognizing problems and should expose
users to problems they might not have thought of.
The user-driven mode should give users more
control over the direction of the dialog, allowing
them to enter the problems they felt were most
important first and, ideally, reducing cognitive
overhead.
After collecting the speech, we also
experimented with using different combinations of
grammars and language models to recognize the
input. We experimented with the ICD9 grammar and
the ICD9 plus patient-specific grammar. When used
alone, the patient-specific grammar yielded poor
results because physicians often expressed problems
using a more varied vocabulary and word order than
the grammar encoded. The UMLS has been shown to
provide useful strings for natural language
processing when properly selected (McCray et al.).
We therefore experimented with a larger grammar
constructed from the UMLS (16,391 entries),
comparing recognition accuracy with the UMLS
grammar alone, the UMLS plus the ICD9 grammar,
and the UMLS plus both the ICD9 and the
patient-specific grammars. We also experimented
with various combinations of language models. Our
language models were trained on data from various
sources including: the UMLS database of disease
descriptions; anonymized discharge notes and
transcriptions of medical interviews. The model
trained on all of the data sets combined gave the best
performance in terms of word error rate (WER); see
the LM+ICD9+Patient Specific entry of Table 1 for
details. Our tri-gram language model was trained on
389k sentences and 49k words.
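WER, the metric reported here, is the word-level Levenshtein edit distance between a reference transcript and the recognizer’s hypothesis, normalized by the reference length; a minimal illustration:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + insertions + deletions) / ref length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

For instance, recognizing the three-word reference “acute renal failure” as “acute failure” drops one word, giving a WER of 1/3.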
After using one of the versions of the system, the
users were asked to complete a survey about their
experience with the system. They were asked to
answer four questions, using a scale from one to
five, where one generally meant a negative response
and five a very positive one. They were allowed to
speak their answers or enter them on the telephone’s
keypad. The questions were:
Q1 Would you find this system helpful for collecting
patient information?
Q2 Does the system ask questions efficiently?
Q3 Was the system knowledgeable about your patient?
Q4 Would you want to use this system to retrieve
information about your patients?
5 RESULTS
During our experiment we received 44 calls from
both physicians and students. The students were
given a script with made-up patient information. The
physicians called in from the CTICU and were asked
to enter information about their current patients. The
average number of turns per call was 18, where each
turn is an interaction between the system and the
user. The average call duration was 3 minutes and
42 seconds, and the longest call lasted almost 24
minutes.
HEALTHINF 2009 - International Conference on Health Informatics