
Table 8: Comparison of Models on Singling Out Risk, Linkability Risk, and Inference Risk with Respective Confidence Intervals for the Insurance Dataset.
Model      Singling Out (S-Out)          Linkability (Link)          Inference (Inf)
CopulaGAN  0.1249, CI=(0.0962, 0.1536)   0.0128, CI=(0.0, 0.073)     0.0414, CI=(0.0, 0.1957)
CTGAN      0.1090, CI=(0.0820, 0.1361)   0.0, CI=(0.0, 0.011)        0.02042, CI=(0.0, 0.2415)
GC         0.1566, CI=(0.1250, 0.1883)   0.0, CI=(0.0, 0.0080)       0.0588, CI=(0.0, 0.1252)
GMM        0.1011, CI=(0.0749, 0.1272)   0.01501, CI=(0.0, 0.1113)   0.1025, CI=(0.0069, 0.1980)
TVAE       0.1229, CI=(0.0944, 0.1514)   0.0045, CI=(0.0, 0.0784)    0.2732, CI=(0.0, 0.5786)
Random     0.9962, CI=(0.9924, 1.0)      0.9890, CI=(0.9780, 1.0)    0.9907, CI=(0.9813, 1.0)
Table 9: Comparison of Models Across Different Utility Metrics for the Insurance Dataset.
Model           WS      KS      P&S Corr          MI      JS      (Mean, Median, Var)
CopulaGAN       0.4444  0.9174  [0.9714, 0.9711]  0.9874  0.9126  (0.0449, 0.0338, 0.0181)
CTGAN           0.4141  0.9207  [0.9676, 0.969]   0.988   0.9236  (0.0415, 0.1931, 0.0136)
GaussianCopula  0.2939  0.9727  [0.9742, 0.9783]  0.9877  0.9666  (0.0158, 0.0047, 0.006)
GMM             0.2450  0.9682  [0.9906, 0.9888]  0.985   0.9353  (0.0104, 0.0035, 0.0034)
TVAE            0.4691  0.9340  [0.9554, 0.9575]  0.9831  0.9341  (0.0349, 0.0295, 0.0136)
Random          0.0000  1.0000  [1.0000, 1.0000]  1.0000  1.0000  (0.0, 0.0, 0.0)
ACKNOWLEDGMENTS
This work was partially supported by (a) the University of Zürich UZH, Switzerland, and (b) the Horizon Europe Framework Program’s project AISym4MED, Grant Agreement No. 101095387, funded by the Swiss State Secretariat for Education, Research, and Innovation SERI, under Contract No. 22.00622.
ICISSP 2025 - 11th International Conference on Information Systems Security and Privacy