Information-Theoretic Patient Record Matching in Medical Databases: A Discriminative Power and Feature Analysis Using MIMIC-IV

Vitalijs Teze, Erika Nazaruka, Dmirtijs Bliznuks

2025

Abstract

This paper presents an information-theoretic framework for evaluating the discriminative power and stability of features used in patient record matching. We use Shannon entropy to analyse the discriminative power and temporal stability of features, and we evaluate how well they identify patients when no unique identifiers are available. The framework groups features into three categories: demographics/administrative (D(F) = 12247.56 bits), ICU care patterns (D(F) = 266.40 bits), and clinical records (D(F) = 12.10 bits). Together they give a combined discriminative power of 12526.06 bits. This far exceeds the theoretical minimum of log₂(N) ≈ 16 bits needed to distinguish our population of N = 65,366 patients. The framework weights features hierarchically by information content and stability coefficients. It reveals that temporal patterns and service transitions carry more discriminative power than traditional demographic identifiers. We show that effective matching requires balancing feature stability against information content while keeping computation efficient. The framework provides a foundation for implementing reliable patient matching systems, although further validation across diverse healthcare environments is needed.
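The entropy arithmetic summarised above can be illustrated with a short Python sketch. It assumes two things: per-feature empirical Shannon entropy, H(X) = -Σ p(x) log₂ p(x), and a simple additive combination of category totals. The abstract does not give the exact definition of D(F) or of the weighting scheme. The helper names (`shannon_entropy`, `min_bits_required`, `stability_weighted_bits`) and the H × s weighting rule are therefore hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Mapping


def shannon_entropy(values: Iterable[Hashable]) -> float:
    """Empirical Shannon entropy of one feature column, in bits.

    H(X) = -sum_x p(x) * log2 p(x), with p(x) estimated from value counts.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def min_bits_required(n_patients: int) -> float:
    """Bits needed to tell n_patients equally likely records apart: log2(N)."""
    if n_patients < 1:
        raise ValueError("n_patients must be positive")
    return math.log2(n_patients)


def stability_weighted_bits(
    entropies: Mapping[str, float], stability: Mapping[str, float]
) -> float:
    """Hypothetical weighting: sum of H(f) * s(f), with stability s(f) in [0, 1].

    Features without a stability coefficient contribute nothing (s = 0).
    This rule is an illustrative assumption, not the paper's formula.
    """
    return sum(h * stability.get(name, 0.0) for name, h in entropies.items())


if __name__ == "__main__":
    # Category totals D(F) as reported in the abstract (bits).
    category_bits = {
        "demographics_admin": 12247.56,
        "icu_care_patterns": 266.40,
        "clinical_records": 12.10,
    }
    combined = sum(category_bits.values())
    threshold = min_bits_required(65_366)
    print(f"combined = {combined:.2f} bits, threshold = {threshold:.2f} bits")
    print("exceeds threshold:", combined > threshold)

    # Toy feature column: four equally frequent values carry 2 bits.
    print("toy entropy:", shannon_entropy(["A", "B", "C", "D"] * 10))
```

When run as a script, the sketch confirms that the three reported category totals sum to 12526.06 bits and that log₂(65366) rounds to 16.00 bits. A single feature's empirical entropy over N records cannot exceed log₂(N), so category totals of this size presumably aggregate contributions across many features. A plain sum also treats those features as independent. Because joint entropy never exceeds the sum of the individual entropies, correlated features make an additive total an upper bound rather than an exact measure.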



Paper Citation


in Harvard Style

Teze V., Nazaruka E. and Bliznuks D. (2025). Information-Theoretic Patient Record Matching in Medical Databases: A Discriminative Power and Feature Analysis Using MIMIC-IV. In Proceedings of the 20th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE; ISBN 978-989-758-742-9, SciTePress, pages 280-291. DOI: 10.5220/0013475400003928


in BibTeX Style

@conference{enase25,
author={Vitalijs Teze and Erika Nazaruka and Dmirtijs Bliznuks},
title={Information-Theoretic Patient Record Matching in Medical Databases: A Discriminative Power and Feature Analysis Using MIMIC-IV},
booktitle={Proceedings of the 20th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE},
year={2025},
pages={280-291},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0013475400003928},
isbn={978-989-758-742-9},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the 20th International Conference on Evaluation of Novel Approaches to Software Engineering - Volume 1: ENASE
TI  - Information-Theoretic Patient Record Matching in Medical Databases: A Discriminative Power and Feature Analysis Using MIMIC-IV
SN  - 978-989-758-742-9
AU  - Teze V.
AU  - Nazaruka E.
AU  - Bliznuks D.
PY  - 2025
SP  - 280
EP  - 291
DO  - 10.5220/0013475400003928
PB  - SciTePress
ER  - 