Information-Theoretic Patient Record Matching in Medical Databases: A Discriminative Power and Feature Analysis Using MIMIC-IV
Vitalijs Teze, Erika Nazaruka, Dmirtijs Bliznuks
2025
Abstract
This paper presents an information-theoretic framework to evaluate feature discriminative power and stability for patient record matching. We analyse the discriminative power and temporal stability of features through Shannon entropy, evaluating their effectiveness for patient identification without unique identifiers. Our framework categorizes features into demographics/administrative (D(F) = 12247.56 bits), ICU care patterns (D(F) = 266.40 bits), and clinical records (D(F) = 12.10 bits), achieving a combined discriminative power of 12526.06 bits. This significantly exceeds the theoretical minimum threshold (log₂(N) ≈ 16 bits) for our population of N = 65,366 patients. The framework employs hierarchical feature weighting based on information content and stability coefficients, revealing that temporal patterns and service transitions contain higher discriminative power than traditional demographic identifiers. We demonstrate that effective matching requires balancing feature stability against information content while maintaining computational efficiency. The framework provides a foundation for implementing reliable patient matching systems, though further validation across diverse healthcare environments is needed.
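The arithmetic behind the threshold and the combined figure can be checked with a short sketch. The function names below are illustrative and do not come from the paper, and the abstract does not state how per-feature entropies are aggregated into D(F), so the entropy function shows only the standard Shannon entropy of a single feature's empirical distribution.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def min_bits_to_identify(n_patients):
    """Minimum bits needed to give each of n_patients a distinct label."""
    return math.log2(n_patients)


# Population size reported in the abstract.
threshold = min_bits_to_identify(65366)

# Per-category discriminative power reported in the abstract (bits).
category_bits = {
    "demographics/administrative": 12247.56,
    "ICU care patterns": 266.40,
    "clinical records": 12.10,
}
combined = sum(category_bits.values())

print(f"threshold = {threshold:.3f} bits")
print(f"combined  = {combined:.2f} bits")
```

Under these assumptions the threshold comes out just below 16 bits, because 65,366 is slightly smaller than 2^16 = 65,536, and the three category values sum to the reported 12526.06 bits.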
Paper Citation
in Harvard Style
Teze V., Nazaruka E. and Bliznuks D. (2025). Information-Theoretic Patient Record Matching in Medical Databases: A Discriminative Power and Feature Analysis Using MIMIC-IV. In Proceedings of the 20th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE; ISBN 978-989-758-742-9, SciTePress, pages 280-291. DOI: 10.5220/0013475400003928
in Bibtex Style
@conference{enase25,
author={Vitalijs Teze and Erika Nazaruka and Dmirtijs Bliznuks},
title={Information-Theoretic Patient Record Matching in Medical Databases: A Discriminative Power and Feature Analysis Using MIMIC-IV},
booktitle={Proceedings of the 20th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE},
year={2025},
pages={280-291},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013475400003928},
isbn={978-989-758-742-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 20th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE
TI - Information-Theoretic Patient Record Matching in Medical Databases: A Discriminative Power and Feature Analysis Using MIMIC-IV
SN - 978-989-758-742-9
AU - Teze V.
AU - Nazaruka E.
AU - Bliznuks D.
PY - 2025
SP - 280
EP - 291
DO - 10.5220/0013475400003928
PB - SciTePress