An Efficient, Robust, and Customizable Information Extraction and Pre-processing Pipeline for Electronic Health Records
Eva K. Lee, Yuanbo Wang, Yuntian He, Brent M. Egan
2019
Abstract
Electronic Health Records (EHR) containing large amounts of patient data present both opportunities and challenges to industry, policy makers, and researchers. These data, when extracted and analyzed effectively, can reveal critical factors that can improve clinical practices and decisions. However, the inherently complex, heterogeneous, and rapidly evolving nature of these data makes them extremely difficult to analyze effectively. In addition, Protected Health Information (PHI) containing sensitive yet valuable information for clinical research must first be anonymized. In this paper, we identify current challenges with obtaining and pre-processing information from EHR. We then present a comprehensive, efficient “pipeline” for extracting, de-identifying, and standardizing EHR data. We demonstrate the use of this pipeline, based on software from EPIC Systems, in analyzing chronic kidney disease, prostate cancer, and cardiovascular disease. We also address challenges associated with temporal laboratory time series data and natural text data, and develop a novel approach for clustering irregular Multivariate Time Series (MTS). The pipeline organizes data into a structured, machine-readable format that can be effectively applied in clinical research studies to optimize processes, personalize care, and improve quality and outcomes.
Paper Citation
in Harvard Style
Lee E., Wang Y., He Y. and Egan B. (2019). An Efficient, Robust, and Customizable Information Extraction and Pre-processing Pipeline for Electronic Health Records. In Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - Volume 1: KDIR; ISBN 978-989-758-382-7, SciTePress, pages 310-321. DOI: 10.5220/0008071303100321
in Bibtex Style
@conference{kdir19,
author={Eva K. Lee and Yuanbo Wang and Yuntian He and Brent M. Egan},
title={An Efficient, Robust, and Customizable Information Extraction and Pre-processing Pipeline for Electronic Health Records},
booktitle={Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - Volume 1: KDIR},
year={2019},
pages={310-321},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0008071303100321},
isbn={978-989-758-382-7},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 11th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2019) - Volume 1: KDIR
TI - An Efficient, Robust, and Customizable Information Extraction and Pre-processing Pipeline for Electronic Health Records
SN - 978-989-758-382-7
AU - Lee E.
AU - Wang Y.
AU - He Y.
AU - Egan B.
PY - 2019
SP - 310
EP - 321
DO - 10.5220/0008071303100321
PB - SciTePress