ACKNOWLEDGEMENTS
This work was supported by JSPS KAKENHI Grant
Numbers JP21H03496, JP22K12157.
REFERENCES
Alaluf, Y., Tov, O., Mokady, R., Gal, R., and Bermano,
A. H. (2021). HyperStyle: StyleGAN inversion with
hypernetworks for real image editing.
Bińkowski, M., Sutherland, D. J., Arbel, M., and Gretton, A.
(2018). Demystifying MMD GANs.
Dinh, T. M., Tran, A. T., Nguyen, R., and Hua, B.-S. (2022).
HyperInverter: Improving StyleGAN inversion via
hypernetwork. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition
(CVPR).
Ha, D., Dai, A. M., and Le, Q. V. (2017). Hypernetworks.
In International Conference on Learning Representa-
tions.
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and
Hochreiter, S. (2017). GANs trained by a two time-scale
update rule converge to a local Nash equilibrium.
In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H.,
Fergus, R., Vishwanathan, S., and Garnett, R., editors,
Advances in Neural Information Processing Systems,
volume 30. Curran Associates, Inc.
Hu, X., Huang, Q., Shi, Z., Li, S., Gao, C., Sun, L., and Li,
Q. (2022). Style transformer for image inversion and
editing. arXiv preprint arXiv:2203.07932.
Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2017).
Image-to-image translation with conditional adversarial
networks. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR).
Karras, T., Aila, T., Laine, S., and Lehtinen, J. (2018). Pro-
gressive growing of GANs for improved quality, sta-
bility, and variation. In International Conference on
Learning Representations.
Karras, T., Laine, S., and Aila, T. (2019). A style-based
generator architecture for generative adversarial net-
works. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR).
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J.,
and Aila, T. (2020). Analyzing and improving the image
quality of StyleGAN. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition
(CVPR).
Park, T., Liu, M.-Y., Wang, T.-C., and Zhu, J.-Y. (2019).
Semantic image synthesis with spatially-adaptive nor-
malization. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition.
Patashnik, O., Wu, Z., Shechtman, E., Cohen-Or, D., and
Lischinski, D. (2021). StyleCLIP: Text-driven manipulation
of StyleGAN imagery. In Proceedings of the
IEEE/CVF International Conference on Computer Vision
(ICCV), pages 2085–2094.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G.,
Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark,
J., Krueger, G., and Sutskever, I. (2021). Learning
transferable visual models from natural language su-
pervision. arXiv preprint arXiv:2103.00020.
Richardson, E., Alaluf, Y., Patashnik, O., Nitzan, Y., Azar,
Y., Shapiro, S., and Cohen-Or, D. (2021). Encoding
in style: a StyleGAN encoder for image-to-image
translation. In IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR).
Roich, D., Mokady, R., Bermano, A. H., and Cohen-Or, D.
(2021). Pivotal tuning for latent-based editing of real
images. ACM Trans. Graph.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net:
Convolutional networks for biomedical image
segmentation. CoRR, abs/1505.04597.
Tov, O., Alaluf, Y., Nitzan, Y., Patashnik, O., and Cohen-Or,
D. (2021). Designing an encoder for StyleGAN image
manipulation. arXiv preprint arXiv:2102.02766.
Wang, T.-C., Liu, M.-Y., Zhu, J.-Y., Tao, A., Kautz, J., and
Catanzaro, B. (2018). High-resolution image synthesis
and semantic manipulation with conditional GANs.
In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition.
Wang, Z., Simoncelli, E. P., and Bovik, A. C. (2003).
Multiscale structural similarity for image quality
assessment. In The Thirty-Seventh Asilomar Conference
on Signals, Systems & Computers, pages 1398–1402.
Wright, L. (2019). Ranger - a synergistic optimizer.
https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer.
Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang,
X., and He, X. (2018). AttnGAN: Fine-grained text to
image generation with attentional generative adversarial
networks.
Zhang, R., Isola, P., Efros, A. A., Shechtman, E., and Wang,
O. (2018). The unreasonable effectiveness of deep
features as a perceptual metric. In CVPR.
Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., and
Torralba, A. (2017). Scene parsing through ADE20K
dataset. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition.
Zhu, P., Abdal, R., Qin, Y., and Wonka, P. (2020). SEAN:
Image synthesis with semantic region-adaptive
normalization. In IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR).
ICAART 2023 - 15th International Conference on Agents and Artificial Intelligence