Identifying an Autoinflammatory Syndrome Cohort Using Natural Language Processing with Electronic Medical Record Data

Maranda Russell, Aleksander Lenert, Katherine Liao, Tianrun Cai, Sujin Kim

2025

Abstract

Autoinflammatory syndromes (AIS) are rare inflammatory disorders with diverse and severe manifestations, and their clinical outcomes and phenotypes remain poorly understood. This study developed and validated machine learning algorithms that combine clinical natural language processing (cNLP) with electronic medical record (EMR) data to identify AIS cases. Patients were screened using relevant billing codes, medications, and ICD-9/-10 codes for conditions such as adult-onset Still's disease, Behçet's disease, and familial Mediterranean fever. Three machine learning models (adaptive lasso penalized logistic regression (ALASSO), support vector machine (SVM), and random forest (RF)) used structured codes and cNLP-extracted features as inputs. Of 206 patients screened, 61 (29.6%) were confirmed as AIS cases after manual review. SVM (AUC=0.954) and RF (AUC=0.948) outperformed ALASSO (AUC=0.94). A total of 44 features were predictive of AIS, including ICD codes for arthritis and Behçet's disease and cNLP-derived concepts such as periodic fever, oral lesions, and colchicine treatment. This study demonstrates the feasibility of combining structured and unstructured EMR data for AIS identification, providing a scalable framework for phenotyping rare diseases and advancing outcomes research.
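
The models above are compared by AUC against manually reviewed labels. As a minimal sketch of what that metric measures, the snippet below computes AUC as the probability that a randomly chosen confirmed case receives a higher score than a randomly chosen non-case. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the study's data or code; only the counts 61 and 206 come from the abstract.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: the fraction of (case, non-case) pairs in which the
    case has the higher score. Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical chart-review labels (1 = confirmed AIS) and model scores,
# made up purely for illustration.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)

# Confirmed fraction reported in the abstract: 61 of 206 screened patients.
confirmed_pct = round(100 * 61 / 206, 1)

print(f"toy AUC = {toy_auc:.3f}")
print(f"confirmed = {confirmed_pct}%")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from non-cases, which is the scale on which the reported values for SVM, RF, and ALASSO should be read.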

Paper Citation


in Harvard Style

Russell M., Lenert A., Liao K., Cai T. and Kim S. (2025). Identifying an Autoinflammatory Syndrome Cohort Using Natural Language Processing with Electronic Medical Record Data. In Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2: HEALTHINF; ISBN 978-989-758-731-3, SciTePress, pages 867-873. DOI: 10.5220/0013323500003911


in Bibtex Style

@conference{healthinf25,
author={Maranda Russell and Aleksander Lenert and Katherine Liao and Tianrun Cai and Sujin Kim},
title={Identifying an Autoinflammatory Syndrome Cohort Using Natural Language Processing with Electronic Medical Record Data},
booktitle={Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2: HEALTHINF},
year={2025},
pages={867-873},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013323500003911},
isbn={978-989-758-731-3},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2: HEALTHINF
TI - Identifying an Autoinflammatory Syndrome Cohort Using Natural Language Processing with Electronic Medical Record Data
SN - 978-989-758-731-3
AU - Russell M.
AU - Lenert A.
AU - Liao K.
AU - Cai T.
AU - Kim S.
PY - 2025
SP - 867
EP - 873
DO - 10.5220/0013323500003911
PB - SciTePress