Authors:
Sheela Raju Kurupathi (1), Pramod Murthy (2) and Didier Stricker (1)
Affiliations:
(1) Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany; Augmented Vision, German Research Center for Artificial Intelligence, Kaiserslautern, Germany
(2) Augmented Vision, German Research Center for Artificial Intelligence, Kaiserslautern, Germany
Keyword(s):
Conditional GANs, Human Pose, Market-1501, DeepFashion.
Abstract:
A central challenge in human-image generation is synthesizing a person with correct pose and clothing details. The task remains difficult because of challenging backgrounds and appearance variance. Recently, deep learning models such as Stacked Hourglass networks, Variational Autoencoders (VAEs), and Generative Adversarial Networks (GANs) have been applied to this problem, but qualitatively they still do not generalize well to real-world human-image generation. Our main goal is to use Spectral Normalization (SN) when training a GAN to synthesize a human image with accurate pose and appearance details. In this paper, we investigate how Conditional GANs combined with SN can synthesize a new image of a target person, given an image of that person and the desired target (novel) pose. The model represents human poses with 2D keypoints. We also use an adversarial hinge loss and present an ablation study. The proposed model variants produce promising results on both the Market-1501 and DeepFashion datasets. We support our claims by benchmarking the proposed model against recent state-of-the-art models. Finally, we show how SN influences the process of human-image synthesis.
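For readers unfamiliar with the two training ingredients named in the abstract, the following is a minimal, dependency-free sketch of the standard formulations, not the authors' implementation. It assumes the usual definitions: Spectral Normalization divides a weight matrix by an estimate of its largest singular value (obtained here by power iteration), and the adversarial hinge loss penalizes discriminator scores below +1 on real images and above -1 on generated ones. The function names `spectral_norm`, `normalize_weight`, `d_hinge_loss`, and `g_hinge_loss` are hypothetical and chosen only for illustration.

```python
import math


def spectral_norm(W, n_iter=50):
    """Estimate the largest singular value of W (a list of rows) by power iteration."""
    rows, cols = len(W), len(W[0])
    v = [1.0 / math.sqrt(cols)] * cols  # unit-length starting vector
    sigma = 0.0
    for _ in range(n_iter):
        # u = W v; its length is the current estimate of the spectral norm
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        sigma = math.sqrt(sum(x * x for x in u))
        if sigma == 0.0:
            return 0.0
        u = [x / sigma for x in u]
        # v = W^T u, renormalized to unit length
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_v == 0.0:
            return 0.0
        v = [x / norm_v for x in v]
    return sigma


def normalize_weight(W, n_iter=50):
    """Return W divided by its estimated spectral norm (unchanged if the norm is 0)."""
    sigma = spectral_norm(W, n_iter)
    if sigma == 0.0:
        return [row[:] for row in W]
    return [[x / sigma for x in row] for row in W]


def d_hinge_loss(real_scores, fake_scores):
    """Discriminator hinge loss: mean(max(0, 1 - D(real))) + mean(max(0, 1 + D(fake)))."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def g_hinge_loss(fake_scores):
    """Generator hinge loss: -mean(D(fake))."""
    return -sum(fake_scores) / len(fake_scores)
```

In practice, deep learning frameworks typically run a single power-iteration step per training update and reuse the vector between updates; the sketch iterates to convergence only to keep the example self-contained.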