Authors:
Sreekala Padinjarekkara (1), Jessica Alecci (2) and Mirela Popa (1)
Affiliations:
(1) Maastricht University, Maastricht, 6229 EN, Netherlands; (2) Irdeto B.V., Netherlands
Keyword(s):
Tabular Data Generation, XPCA Decomposition, ML Utility, Privacy Preservation.
Abstract:
The proposed method, XPCA Gen, introduces a novel approach to synthetic tabular data generation that utilises relevant patterns present in the data. It does so using the principal components obtained through XPCA (a probabilistic interpretation of standard PCA) decomposition of the original data. Since new data points are obtained by synthesizing the principal components, the generated data is an accurate representation of the original data with reduced noise and redundancy and a good diversity of data points. The experimental results obtained on benchmark datasets (e.g. CMC, PID) demonstrate its performance on ML utility metrics (accuracy, precision, recall), showing its ability to capture inherent patterns in the dataset. Along with the ML utility metrics, a high Hausdorff distance indicates diversity in the generated data without compromising statistical properties. Moreover, unlike complex neural network approaches, the method is not data-hungry. Overall, XPCA Gen emerges as a promising solution for data privacy preservation and robust model training with diverse samples.
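The idea of generating rows by synthesizing principal components can be illustrated with a minimal sketch. It is not the XPCA Gen method: ordinary PCA (via SVD) stands in for the XPCA decomposition, component scores are assumed to be Gaussian, and the function names `pca_generate` and `hausdorff`, the toy data, and all parameter values are hypothetical choices made for illustration only.

```python
import numpy as np

def pca_generate(X, n_components, n_samples, seed=0):
    """Sample synthetic rows from a Gaussian fitted in PCA score space.

    Assumption: standard PCA is used here as a stand-in for XPCA.
    """
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data; rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Standard deviation of the scores along each retained component.
    score_std = S[:n_components] / np.sqrt(len(X) - 1)
    # Draw new scores, then map them back to the original feature space.
    scores = rng.normal(size=(n_samples, n_components)) * score_std
    return scores @ components + mean

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sets (Euclidean)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Toy numeric table standing in for a real dataset (hypothetical data).
rng = np.random.default_rng(42)
X_real = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X_syn = pca_generate(X_real, n_components=3, n_samples=150)
print(X_syn.shape, hausdorff(X_real, X_syn))
```

Dropping the low-variance components before sampling is what discards noise in this sketch, while the Hausdorff distance between real and generated rows gives one rough indication of how far the synthetic points spread from the originals.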