Authors: Eva K. Lee (1); Yuanbo Wang (1); Yuntian He (1) and Brent M. Egan (2)
Affiliations:
(1) Center for Operations Research in Medicine and HealthCare, H. Milton Stewart School of Industrial and Systems Engineering, and School of Biological Sciences, Georgia Institute of Technology, U.S.A.
(2) University of South Carolina School of Medicine–Greenville and Care Coordination Institute, Greenville, U.S.A.
Keyword(s):
Electronic Health Record, Information Extraction, Encryption, Data Standardization, Clustering, Time Series.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; BioInformatics & Pattern Discovery; Clustering and Classification Methods; Computational Intelligence; Concept Mining; Evolutionary Computing; Foundations of Knowledge Discovery in Databases; Information Extraction; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Machine Learning; Mining Text and Semi-Structured Data; Pre-Processing and Post-Processing for Data Mining; Soft Computing; Symbolic Systems
Abstract:
Electronic Health Records (EHR) containing large amounts of patient data present both opportunities and challenges to industry, policy makers, and researchers. These data, when extracted and analyzed effectively, can reveal critical factors that improve clinical practices and decisions. However, the inherently complex, heterogeneous, and rapidly evolving nature of these data makes them extremely difficult to analyze effectively. In addition, Protected Health Information (PHI), which contains sensitive yet valuable information for clinical research, must first be anonymized. In this paper, we identify current challenges with obtaining and pre-processing information from EHR. We then present a comprehensive, efficient “pipeline” for extracting, de-identifying, and standardizing EHR data. We demonstrate the use of this pipeline, based on software from EPIC Systems, in analyzing chronic kidney disease, prostate cancer, and cardiovascular disease. We also address challenges associated with temporal laboratory time series data and natural text data, and develop a novel approach for clustering irregular Multivariate Time Series (MTS). The pipeline organizes data into a structured, machine-readable format that can be effectively applied in clinical research studies to optimize processes, personalize care, and improve quality and outcomes.