The ATEN Framework for Creating the Realistic Synthetic Electronic Health Record
Scott McLachlan, Kudakwashe Dube, Thomas Gallagher, Bridget Daley, Jason Walonoski
2018
Abstract
Realistic synthetic data are increasingly recognized as a solution to the lack of data or to privacy concerns in healthcare and other domains, yet little effort has been expended in establishing a generic framework for characterizing, achieving and validating realism in Synthetic Data Generation (SDG). The objectives of this paper are to: (1) present a characterization of the concept of realism as it applies to synthetic data; and (2) present and demonstrate application of the generic ATEN Framework for achieving and validating realism for SDG. The characterization of realism is developed through insights obtained from analysis of the literature on SDG. The generic methods for achieving and validating realism for synthetic data were developed using knowledge discovery in databases (KDD) and data mining enhanced with concept analysis and the identification of characteristic and classification rules. Application of the framework is demonstrated using the synthetic Electronic Healthcare Record (EHR) for the domain of midwifery. The knowledge discovery process improves and expedites the generation process: having a more complex and complete understanding of the knowledge required to create the synthetic data significantly reduces the number of generation iterations. The validation process shows similar efficiencies by using the knowledge discovered as the elements for assessing the generated synthetic data. Successful validation supports claims of success and resolves whether the synthetic data are a sufficient replacement for real data. The ATEN Framework supports the researcher in identifying the knowledge elements that need to be synthesized, and in supporting claims of sufficient realism through the use of that knowledge in a structured approach to validation. When used for SDG, the ATEN Framework enables a complete analysis of source data for the knowledge necessary for correct generation. The ATEN Framework assures the researcher that the synthetic data being created are realistic enough to replace real data for a given use case.
Paper Citation
in Harvard Style
McLachlan S., Dube K., Gallagher T., Daley B. and Walonoski J. (2018). The ATEN Framework for Creating the Realistic Synthetic Electronic Health Record. In Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018) - Volume 5: HEALTHINF; ISBN 978-989-758-281-3, SciTePress, pages 220-230. DOI: 10.5220/0006677602200230
in Bibtex Style
@conference{healthinf18,
author={Scott McLachlan and Kudakwashe Dube and Thomas Gallagher and Bridget Daley and Jason Walonoski},
title={The ATEN Framework for Creating the Realistic Synthetic Electronic Health Record},
booktitle={Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018) - Volume 5: HEALTHINF},
year={2018},
pages={220-230},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006677602200230},
isbn={978-989-758-281-3},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018) - Volume 5: HEALTHINF
TI - The ATEN Framework for Creating the Realistic Synthetic Electronic Health Record
SN - 978-989-758-281-3
AU - McLachlan S.
AU - Dube K.
AU - Gallagher T.
AU - Daley B.
AU - Walonoski J.
PY - 2018
SP - 220
EP - 230
DO - 10.5220/0006677602200230
PB - SciTePress