Authors: Shonosuke Gonda, Fumihiko Sakaue and Jun Sato
Affiliation: Nagoya Institute of Technology, Nagoya 466-8555, Japan
Keyword(s): Image Synthesis, Image Distribution, GAN, Multi-Discriminator, CLIP, Foundation Model, Multimodal.
Abstract:
In a Generative Adversarial Network (GAN), where a generator and a discriminator are trained adversarially, strengthening the discriminator's ability to distinguish real from generated images also improves the generator. In this paper, we propose a method that improves the generator's generative ability by adversarially training a single generator against multiple discriminators, each with different expertise. Because each discriminator specializes in different visual features, their combined discriminative ability exceeds that of a single discriminator, which in turn improves the generator's performance. However, it is not easy to give multiple discriminators independent expertise. To address this, we propose CLIP-MDGAN, which leverages CLIP, a large-scale vision-language foundation model that has recently attracted much attention, to classify the dataset into multiple classes with different visual features. Based on this CLIP-based classification, each discriminator is assigned a specific subset of images, promoting the development of independent expertise. Furthermore, we introduce a method that gradually increases the number of discriminators during adversarial training, reducing both the instability of training multiple discriminators and the training cost.