Authors:
Peter Schneider-Kamp; Anton Lautrup and Tobias Hyrup
Affiliation:
Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, Odense, Denmark
Keyword(s):
Synthetic Data, Generative AI, Evaluation Metrics, Privacy, Utility, Pipelines, Method Chaining.
Abstract:
Many expect synthetic data to have a significant impact on data science by enhancing data privacy, reducing biases in datasets, and enabling the scaling of datasets beyond their original size. However, the current landscape of tabular synthetic data generation is fragmented, with numerous frameworks available, only some of which have integrated evaluation modules. synthesizers is a meta-framework that simplifies the process of generating and evaluating tabular synthetic data. It provides a unified platform that allows users to select generative models and evaluation tools from open-source implementations in the research field and apply them to datasets of any format. The aim of synthesizers is to consolidate the diverse efforts in tabular synthetic data research, making it more accessible to researchers from different sub-domains, including those with less technical expertise such as health researchers. This could foster collaboration and increase the use of synthetic data tools, ultimately leading to more effective research outcomes.