
erate a better Generator.
These results show that, when using multiple discriminators, gradually increasing their expertise can result in better adversarial learning.
9 CONCLUSION
In this paper, we proposed a method to improve the image generation ability of a generator by adversarially training it with multiple discriminators that have different expertise. In particular, we proposed to give the discriminators independent expertise by dividing the dataset into partitions with distinct image features, using CLIP to select the images assigned to each discriminator. In addition, we presented a method that gradually increases the number of discriminators in order to eliminate the training instability that arises when multiple discriminators are used.
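As a rough illustration of the partitioning step, the following is a minimal sketch that clusters CLIP image embeddings with Ward-linkage hierarchical clustering (Ward, 1963) and assigns one cluster to each discriminator. The CLIP backbone, the SciPy clustering call, and helper names such as partition_by_clip are illustrative assumptions rather than the exact pipeline used in our experiments.

```python
# Minimal sketch: partition a dataset by CLIP image features so that each
# discriminator receives a cluster with distinct visual characteristics.
# Assumes the official CLIP package (https://github.com/openai/CLIP) and
# SciPy; the backbone and cluster criterion are illustrative choices.
import clip
import numpy as np
import torch
from scipy.cluster.hierarchy import fcluster, linkage

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_features(pil_images, batch_size=256):
    """Encode PIL images with CLIP and return L2-normalized feature vectors."""
    feats = []
    with torch.no_grad():
        for i in range(0, len(pil_images), batch_size):
            batch = torch.stack(
                [preprocess(im) for im in pil_images[i:i + batch_size]]
            ).to(device)
            f = model.encode_image(batch).float()
            feats.append((f / f.norm(dim=-1, keepdim=True)).cpu().numpy())
    return np.concatenate(feats)

def partition_by_clip(pil_images, num_discriminators):
    """Split the dataset into one feature-homogeneous subset per
    discriminator via Ward-linkage hierarchical clustering (Ward, 1963)."""
    feats = clip_features(pil_images)
    labels = fcluster(linkage(feats, method="ward"),
                      t=num_discriminators, criterion="maxclust")
    return [[im for im, lab in zip(pil_images, labels) if lab == k]
            for k in range(1, num_discriminators + 1)]
```

Note that Ward linkage requires pairwise distances, which is quadratic in the number of samples, so for large datasets the clustering would in practice be run on a subsample or with a scalable variant.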
Experimental results showed that, when dividing the classes each discriminator is responsible for, partitioning based on the distribution of image features extracted by CLIP gives the discriminators more appropriate expertise than partitioning based on human-defined features. The results also revealed that expertise is imparted more efficiently when the number of discriminators is increased gradually.
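To make the gradual schedule concrete, the sketch below activates one additional discriminator after every fixed number of iterations and averages the generator's non-saturating loss over the discriminators active at the current step. The interval warmup_steps and the loss aggregation are illustrative assumptions, not the exact recipe evaluated above.

```python
# Minimal sketch: gradually enable discriminators during training to avoid
# the instability of starting with all of them at once. The fixed activation
# interval and the averaged non-saturating loss are illustrative assumptions.
import torch
import torch.nn as nn

def num_active(step, num_discriminators, warmup_steps=5000):
    """Enable one more discriminator every warmup_steps iterations."""
    return min(num_discriminators, 1 + step // warmup_steps)

def generator_loss(fake_images, discriminators, step, warmup_steps=5000):
    """Adversarial loss for G, averaged over the currently active discriminators."""
    n = num_active(step, len(discriminators), warmup_steps)
    bce = nn.BCEWithLogitsLoss()
    losses = []
    for d in discriminators[:n]:
        logits = d(fake_images)              # each active D scores the fakes
        losses.append(bce(logits, torch.ones_like(logits)))
    return torch.stack(losses).mean()
```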
REFERENCES
Albuquerque, I., Monteiro, J., Doan, T., Considine, B., Falk, T., and Mitliagkas, I. (2019). Multi-objective training of generative adversarial networks with multiple discriminators. In Proc. International Conference on Machine Learning.

Brock, A., Donahue, J., and Simonyan, K. (2019). Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations (ICLR).

Choi, J. and Han, B. (2022). MCL-GAN: Generative adversarial networks with multiple specialized discriminators. In Proc. Conference on Neural Information Processing Systems (NeurIPS).

Dinh Nguyen, T., Le, T., Vu, H., and Phung, D. (2017). Dual discriminator generative adversarial nets. In Proc. Conference on Neural Information Processing Systems (NIPS).

Durugkar, I., Gemp, I., and Mahadevan, S. (2017). Generative multi-adversarial networks. In Proc. International Conference on Learning Representations.

Ghosh, A., Kulharia, V., Namboodiri, V., Torr, P., and Dokania, P. (2018). Multi-agent diverse generative adversarial networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 27:2672–2680.

Guzman-Rivera, A., Batra, D., and Kohli, P. (2012). Multiple choice learning: Learning to produce multiple structured outputs. In Proc. Conference on Neural Information Processing Systems (NIPS).

Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, 30.

Hoang, Q., Nguyen, T., Le, T., and Phung, D. (2018). MGAN: Training generative adversarial nets with multiple generators. In Proc. International Conference on Learning Representations.

Karras, T., Laine, S., and Aila, T. (2020). Analyzing and improving the image quality of StyleGAN. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8110–8119.

Kingma, D. P. and Welling, M. (2014). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.

Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., and Shi, W. (2017). Photo-realistic single image super-resolution using a generative adversarial network. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4681–4690.

Liu, Z., Luo, P., Wang, X., and Tang, X. (2015). Deep learning face attributes in the wild. In Proc. International Conference on Computer Vision (ICCV).

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., and Sutskever, I. (2021). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (ICML). PMLR.

Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., and Sutskever, I. (2021). Zero-shot text-to-image generation. arXiv preprint arXiv:2102.12092.

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695.

Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244.

Zhang, H., Xu, T., Li, H., Zhang, S., Huang, X., Wang, X., and Metaxas, D. N. (2017). StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proc. IEEE International Conference on Computer Vision (ICCV), pages 5907–5915.