Authors:
Syed Tazwar (1); Max Knobbout (2); Enrique Hortal Quesada (1) and Mirela Popa (1)
Affiliations:
(1) Department of Advanced Computing Sciences, Faculty of Science and Engineering, Maastricht University, P.O. Box 616, 6200 MD, Maastricht, The Netherlands
(2) Just Eat Takeaway.com, Amsterdam, The Netherlands
Keyword(s):
Generative AI, Variational Autoencoders, GANs, Tabular Data Representation.
Abstract:
Variational Autoencoders (VAEs) suffer from a well-known problem of overpruning, or posterior collapse, caused by strong regularization when working in a sufficiently high-dimensional latent space. When VAEs are used to generate tabular data, one-hot encoding of categorical variables expands the dimensionality of the feature space dramatically, which makes modeling multi-class categorical data challenging. In this paper, we propose Tab-VAE, a novel VAE-based approach to generating synthetic tabular data that tackles this challenge by introducing a sampling technique for categorical variables at inference. A detailed review of the current state-of-the-art models shows that most tabular data generation approaches draw their methodology from Generative Adversarial Networks (GANs), while simpler, more stable VAE-based methods are largely overlooked. Our extensive evaluation of Tab-VAE against other leading generative models shows that it improves significantly on state-of-the-art VAEs. It also shows that Tab-VAE outperforms the best GAN-based tabular data generators, paving the way for a powerful and less computationally expensive tabular data generation model.