Evaluating Synthetic Data Generation Techniques for Medical Dataset
Takayuki Miura, Eizen Kimura, Atsunori Ichikawa, Masanobu Kii, Juko Yamamoto
2024
Abstract
Real-world data holds considerable promise for data analysis in medicine and healthcare, yet handling sensitive data requires ethical review and safety management, which creates bottlenecks that slow research. Consequently, numerous techniques have emerged for generating synthetic data that preserves the features of the original data. Nonetheless, the quality of such synthetic data, particularly when it is derived from real-world data, has not yet been sufficiently examined. In this paper, we conduct experiments with a Diagnosis Procedure Combination (DPC) dataset to evaluate the quality of synthetic data generated by statistics-based, graphical model-based, and deep neural network-based methods. Further, we apply differential privacy for theoretical privacy protection and assess the resulting degradation of data quality. The findings indicate that a statistics-based method called Gaussian Copula and a graphical model-based method called AIM yield high-quality synthetic data in terms of statistical similarity and machine learning model performance. Based on the experimental results, the paper also summarizes issues pertinent to the practical application of synthetic data.
Paper Citation
in Harvard Style
Miura T., Kimura E., Ichikawa A., Kii M. and Yamamoto J. (2024). Evaluating Synthetic Data Generation Techniques for Medical Dataset. In Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2: HEALTHINF; ISBN 978-989-758-688-0, SciTePress, pages 315-322. DOI: 10.5220/0012314500003657
in Bibtex Style
@conference{healthinf24,
author={Takayuki Miura and Eizen Kimura and Atsunori Ichikawa and Masanobu Kii and Juko Yamamoto},
title={Evaluating Synthetic Data Generation Techniques for Medical Dataset},
booktitle={Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2: HEALTHINF},
year={2024},
pages={315-322},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012314500003657},
isbn={978-989-758-688-0},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 17th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 2: HEALTHINF
TI - Evaluating Synthetic Data Generation Techniques for Medical Dataset
SN - 978-989-758-688-0
AU - Miura T.
AU - Kimura E.
AU - Ichikawa A.
AU - Kii M.
AU - Yamamoto J.
PY - 2024
SP - 315
EP - 322
DO - 10.5220/0012314500003657
PB - SciTePress