3 LEVERAGING INFORMATION AND KNOWLEDGE MANAGEMENT
3.1 Improvement Opportunities
Optimizing the document workflow in hospitals, by
converting legacy documents into digital form or
unstructured information into structured information
that feeds databases, is the very first step toward
unleashing the power of advanced services built on
this information. It opens the door to a new set of
applications that provide knowledge services through
automated data mining and information analysis
tools. Risk assessment is one example.
A project was started in 2009 to develop a
Hospital Acquired Infection (HAI) detector that
monitors patient-related reports produced inside
hospitals (ALADIN-DTH). This project is funded in
part by the French Research Agency (ANR) for 3
years. It is a collaboration between key partners
whose combined competencies cover all aspects
required to build such a system: a university
laboratory specialized in building multi-terminology
resources for the medical domain, a French content
provider for pharmaceutical information, a research
center specialized in designing Natural Language
Processing and Semantic Management systems, and
4 university hospitals providing real data and HAI
expertise.
3.2 Key Technology behind this Project
The heart of this project is the Xerox Incremental
Parser (XIP), which performs text mining. This
parser is robust, that is to say, it has already been
used in various projects to process large collections
of unrestricted documents. It has been designed to
follow strict incremental strategies when applying
parsing rules. The system never backtracks on rules,
which avoids combinatorial explosion and makes it
well suited to parsing real, long sentences, for
example from scientific texts (Aït-Mokhtar 1997).
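To make the idea of incremental, non-backtracking rule application concrete, the following minimal Python sketch applies an ordered list of grouping rules, each exactly once, with the output of one layer feeding the next. It is only an illustration of the general strategy: the rule format, the labels and the function names are assumptions made for this example and do not reproduce XIP's rule language or API.

```python
from typing import List, Tuple

Node = Tuple[str, str]  # (label, text); labels here are illustrative only


def group(nodes: List[Node], pattern: List[str], new_label: str) -> List[Node]:
    """Scan left to right once and merge each match of `pattern` into one node."""
    out: List[Node] = []
    i, n = 0, len(pattern)
    while i < len(nodes):
        window = nodes[i:i + n]
        if [label for label, _ in window] == pattern:
            out.append((new_label, " ".join(text for _, text in window)))
            i += n  # a merged span is never revisited
        else:
            out.append(nodes[i])
            i += 1
    return out


# Hypothetical ordered layers: each one runs once and its result is final.
LAYERS = [
    (["DET", "NOUN"], "NP"),
    (["PREP", "NP"], "PP"),
    (["NP", "VERB", "NP"], "CLAUSE"),
]


def parse(nodes: List[Node]) -> List[Node]:
    for pattern, label in LAYERS:
        nodes = group(nodes, pattern, label)
    return nodes


tokens = [("DET", "the"), ("NOUN", "patient"), ("VERB", "developed"),
          ("DET", "a"), ("NOUN", "fever"), ("PREP", "after"),
          ("DET", "the"), ("NOUN", "surgery")]
result = parse(tokens)
```

Because no layer is ever undone or retried, the work in this sketch grows with the number of layers times the sentence length, which is the property that keeps long sentences tractable.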
We decided to use such a tool because HAI is a
complex issue that involves, for instance, pieces of
evidence appearing according to a strict chronology
(e.g. a patient has surgery, then 2 days later some
symptoms occur such as fever, then some specific
antibiotics are prescribed, etc.). To establish these
connections we need a certain level of
understanding of the content of the document;
simple keyword detection is not enough.
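The difference between keyword detection and a chronology-aware check can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's detection logic: the event labels, the `Event` type, the function names and in particular the 30-day window are hypothetical placeholders and carry no clinical meaning.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Event:
    kind: str  # hypothetical labels: "surgery", "fever", "antibiotic"
    day: date


def first_event(events: List[Event], kind: str, earliest: date,
                latest: Optional[date] = None) -> Optional[Event]:
    """Earliest event of `kind` whose day lies in [earliest, latest]."""
    hits = [e for e in events
            if e.kind == kind and e.day >= earliest
            and (latest is None or e.day <= latest)]
    return min(hits, key=lambda e: e.day, default=None)


def suspicious_chain(events: List[Event],
                     window: timedelta = timedelta(days=30)) -> bool:
    """True if a surgery is followed by fever, then by an antibiotic.

    `window` is an arbitrary placeholder, not a clinical criterion.
    """
    for surgery in (e for e in events if e.kind == "surgery"):
        fever = first_event(events, "fever",
                            surgery.day + timedelta(days=1),
                            surgery.day + window)
        if fever and first_event(events, "antibiotic", fever.day):
            return True
    return False


# Both records contain the same keywords; only the order differs.
record_a = [Event("surgery", date(2010, 3, 10)),
            Event("fever", date(2010, 3, 12)),
            Event("antibiotic", date(2010, 3, 12))]
record_b = [Event("fever", date(2010, 3, 5)),
            Event("antibiotic", date(2010, 3, 6)),
            Event("surgery", date(2010, 3, 10))]

same_keywords = {e.kind for e in record_a} == {e.kind for e in record_b}
flag_a = suspicious_chain(record_a)
flag_b = suspicious_chain(record_b)
```

A keyword-only approach cannot tell the two records apart, whereas the ordered check flags only the record in which the fever follows the surgery.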
3.3 Addressing the Terminology Issue
To be able to detect pertinent information in text,
it is crucial to address all terminologies in use in
the targeted hospitals. This is required to build
appropriate lexicons that encompass pertinent terms
characterizing an HAI or serving as pieces of
evidence to conclude that a case is a suspected HAI.
One of the partners (CISMeF) provides this
information: the Multi Terminologies Indexer is a
generic automatic indexing tool able to tag an entire
document with all terminologies necessary for the
project. This server offers term identification
covering the following terminologies: SNOMED 3.5
(International Systematized Nomenclature of Human
and Veterinary Medicine), MeSH (Medical Subject
Headings), ICD10 (International Classification of
Diseases) and CCAM (French CPT), TUV (Unified
Thesaurus of Vidal), ATC (Anatomical Therapeutic
Chemical Classification), drug names with
international non-proprietary names (INN) and brand
names, Orphanet terms (rare diseases), CIF
(International Functional Terminology), CISP2
(International Classification of Primary Care), DRC
(Consultation results), MedlinePlus.
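The kind of output such an indexer produces can be illustrated with a small dictionary-based tagger. The sketch below is an assumption-laden toy, not the Multi Terminologies Indexer or its interface: the lexicon entries, the placeholder codes and the `tag` function are all invented for the example, and a real system would also handle inflection, abbreviations and French text.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: surface term -> (terminology, placeholder code).
# The codes are placeholders, not real identifiers.
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "fever": [("MeSH", "MESH-PLACEHOLDER-1"),
              ("ICD10", "ICD10-PLACEHOLDER-1")],
    "surgical site infection": [("SNOMED 3.5", "SNOMED-PLACEHOLDER-1")],
    "infection": [("MeSH", "MESH-PLACEHOLDER-2")],
}

Annotation = Tuple[int, int, str, str, str]  # start, end, term, terminology, code


def tag(text: str) -> List[Annotation]:
    """Annotate `text` with every terminology entry, preferring longer terms."""
    lowered = text.lower()
    taken = [False] * len(lowered)
    annotations: List[Annotation] = []
    for term in sorted(LEXICON, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            if any(taken[m.start():m.end()]):
                continue  # span already covered by a longer term
            for i in range(m.start(), m.end()):
                taken[i] = True
            for terminology, code in LEXICON[term]:
                annotations.append((m.start(), m.end(), term, terminology, code))
    return sorted(annotations)


sentence = "Fever on day 2; suspected surgical site infection."
annotations = tag(sentence)
```

One term may map to entries in several terminologies at once, which is why each match yields one annotation per terminology rather than a single code.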
3.4 Temporal Sequence Detections
In this project the chronology of events is crucial to
characterizing a problem. We adapted an existing
temporal processing system (Hagege 2008) to French
in order to detect and normalize temporal
expressions appearing in text. Three kinds of
temporal expressions are considered:
1) Absolute dates (for example, 10/03/2010)
2) Referential dates that refer to the utterance
time (for example, “yesterday”)
3) Referential dates whose reference is another
textual expression (for example, “two days after
admission”).
Discovering and normalizing temporal
expressions enables us to associate a time stamp with
the event described in the text. As a result, we can
associate a time stamp with the description of a
potential risk factor for HAI and check whether it
occurs before or after another specific event.
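The three kinds of expressions and the resulting before/after check can be sketched in a few lines of Python. This is a simplified illustration and not the temporal processing system cited above: the `normalize` function, the `anchors` mapping, the small number-word table and the day/month/year reading of absolute dates are assumptions made for the example, which also uses the English expressions quoted in the text rather than French input.

```python
import re
from datetime import date, timedelta
from typing import Dict, Optional

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}  # deliberately tiny, illustrative


def normalize(expr: str, utterance_date: date,
              anchors: Dict[str, date]) -> Optional[date]:
    """Map a temporal expression to a calendar date, or None if unsupported."""
    expr = expr.strip().lower()

    # 1) Absolute date, assumed to be written day/month/year.
    m = re.fullmatch(r"(\d{2})/(\d{2})/(\d{4})", expr)
    if m:
        day, month, year = map(int, m.groups())
        return date(year, month, day)

    # 2) Date relative to the utterance (document) time.
    if expr == "yesterday":
        return utterance_date - timedelta(days=1)
    if expr == "today":
        return utterance_date

    # 3) Date relative to another event mentioned in the text.
    m = re.fullmatch(r"(\w+) days? (after|before) (\w+)", expr)
    if m and m.group(3) in anchors:
        word = m.group(1)
        n = int(word) if word.isdecimal() else NUMBER_WORDS.get(word)
        if n is not None:
            sign = 1 if m.group(2) == "after" else -1
            return anchors[m.group(3)] + sign * timedelta(days=n)
    return None


report_date = date(2010, 3, 15)          # hypothetical utterance time
anchors = {"admission": date(2010, 3, 10)}

absolute = normalize("10/03/2010", report_date, anchors)
relative = normalize("yesterday", report_date, anchors)
anchored = normalize("two days after admission", report_date, anchors)

# With time stamps attached, ordering two events is a plain comparison.
symptom_after_admission = anchored > anchors["admission"]
```

Once every event carries a normalized date, checking whether a risk factor occurs before or after another event no longer depends on how the date was phrased in the report.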
3.5 Risk Indicator Detection
Risk indicators consist of descriptions of infection
events, such as mentions of a specific bacterium, an
antibiotic, or symptoms such as fever. All the
SMART DOCUMENT TECHNOLOGIES FOR EXTRACTING AND STRUCTURING DATA FROM PATIENT
RECORDS - Opportunities for New Knowledge based Services