training techniques. Better control of generation: StyleGAN3 allows fine control over individual attributes of the generated samples, such as facial expressions and hairstyles, providing more personalization options. Super-resolution generation: StyleGAN3 can generate high-resolution images, making it suitable for detail-enhancement and super-resolution reconstruction tasks.
StyleGAN3's overall approach is similar to that of traditional GANs, but it introduces a number of improvements and innovations in the model architecture and training process. These include architectural optimization of the generator and discriminator, feature-alignment mechanisms, and regularization methods, all aimed at improving the quality and diversity of the generated results.
It produces an automatically learned, unsupervised separation of high-level attributes and stochastic variation in the resulting images, and it enables straightforward, scale-specific control of the synthesis. StyleGAN3 is optimized from Progressive GAN's method; it leverages modern GPU architectures and is implemented on a modern machine-learning framework. Every StyleGAN model provides a Generator, a Discriminator, and a Generator whose weights are an exponential moving average of the first, denoted Gema; in the distributed pickle file these three parts are stored as G, D, and Gema. DragGAN is implemented on StyleGAN3 because this third version of StyleGAN shows clear improvements for video content generation, and the two share the same backend and training utilities.
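As an illustration of how the G, D, and Gema components of such a checkpoint are typically accessed, the sketch below assumes the pickle format used by the official NVlabs StyleGAN3 release; the file name is a placeholder.

```python
import pickle
import torch

# Load a StyleGAN3 checkpoint; the pickle stores the three networks under the
# keys 'G', 'D', and 'G_ema'. Unpickling requires the StyleGAN3 code
# (dnnlib, torch_utils) to be importable; the file name is a placeholder.
with open('stylegan3-ffhq.pkl', 'rb') as f:
    networks = pickle.load(f)

G, D, G_ema = networks['G'], networks['D'], networks['G_ema']
G_ema = G_ema.cuda().eval()      # Gema is the copy normally used for sampling

# Sample one image: map a random latent z (no class label) through Gema.
z = torch.randn([1, G_ema.z_dim]).cuda()
img = G_ema(z, None)             # tensor of shape [1, 3, H, W], roughly in [-1, 1]
```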
In the original StyleGAN implementation, a style-based generator is used to make the generated image adhere to the given sample. This is what makes StyleGAN3 state-of-the-art and sets it apart from previous GANs. Traditionally, the generator receives the latent code through an input layer; StyleGAN instead replaces this input with a learned constant image and injects the latent information as styles. That is:
$$\mathrm{AdaIN}(x_i, y) = y_{s,i}\,\frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i} \quad (1)$$
where each feature map $x_i$ is normalized separately, and $y = (y_s, y_b)$ is obtained from a learned affine specialization of $w$ and acts as the style that controls adaptive instance normalization (AdaIN).
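As a concrete illustration of Eq. (1), the following minimal PyTorch-style sketch normalizes each feature map and then applies per-channel style scales and biases; the function name and tensor shapes are assumptions for illustration only.

```python
import torch

def adain(x, y_s, y_b, eps=1e-8):
    """Adaptive instance normalization as in Eq. (1).

    x:   feature maps of shape [N, C, H, W]
    y_s: per-channel style scales of shape [N, C]
    y_b: per-channel style biases of shape [N, C]
    """
    mu = x.mean(dim=(2, 3), keepdim=True)           # per-map mean  mu(x_i)
    sigma = x.std(dim=(2, 3), keepdim=True) + eps   # per-map std   sigma(x_i)
    x_norm = (x - mu) / sigma                       # normalize each feature map separately
    return y_s[:, :, None, None] * x_norm + y_b[:, :, None, None]
```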
2.2.2 Latent Code
In the context of GANs, the Generator plays a crucial
role in producing the generated images, while the
Discriminator is responsible for training the
Generator by providing feedback and ensuring that
the generated results resemble the desired outcomes.
However, GANs have expanded beyond simple
image generation and can now be used to alter
existing images from a given source to a target image.
The Generator and the Discriminator are linked through the latent code, which serves as an intermediary during training and is not exposed until all updates have been completed. As a result, generated or altered images are derived from the datasets used for training, so the outcomes are limited to what those datasets cover. Consequently, if a user wants to alter an image that is not part of the training dataset, the original DragGAN method cannot provide a viable solution. The Pivotal Tuning for Latent-based editing (PIL) method, however, addresses this limitation (Roich 2021). PIL allows a pretrained model to be adapted without retraining; all that is required is the input image the user wishes to edit. Figure 1 illustrates the effectiveness of this inverse-latent approach, which enables customizable image editing without retraining the model and integrates seamlessly into the overall pipeline. By leveraging PIL, users can modify images that were not part of the original training dataset, which widens the scope of possible image alterations and offers greater flexibility in latent-based editing.
PIL modifies a pretrained model in three steps: inversion, tuning, and regularization. Together they take the original StyleGAN model and adapt it to the given image, with inversion serving to provide a good starting point for the tuning phase. The most editable latent space is StyleGAN's native latent space $W$. To reconstruct the input image $s$, the implementation optimizes the latent code $w$ together with the noise vector $v$, yielding the pivot code $w_p$. The following objective defines the optimization:
$$w_p, v = \mathop{\arg\min}_{w,\,v}\; \mathcal{L}_{\mathrm{LPIPS}}\big(s,\, G(w, v; \theta)\big) + \lambda_v\, \mathcal{L}_v(v) \quad (2)$$
in which $G(w, v; \theta)$ denotes the image produced by the generator $G$ with weights $\theta$. Unlike traditional StyleGAN-based methods that rely on the mapping network, PIL uses three individual networks. $\mathcal{L}_{\mathrm{LPIPS}}$ is the perceptual loss, $\mathcal{L}_v$ is the noise regularization term, and $\lambda_v$ is its weighting hyperparameter.
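To make the inversion step concrete, the sketch below optimizes a latent code against the objective of Eq. (2). It assumes the NVlabs StyleGAN3 generator interface (G.mapping.w_avg, G.num_ws, G.synthesis) and the lpips package for the perceptual loss; the noise-regularization term is only indicated by a comment, and all names are illustrative rather than the actual PIL implementation.

```python
import torch
import lpips  # pip package "lpips"; stands in for the perceptual loss in Eq. (2)

def invert_image(G, s, num_steps=500, lr=0.01, device='cuda'):
    """Hedged sketch of the inversion objective in Eq. (2).

    G: a StyleGAN3 generator (NVlabs interface assumed).
    s: target image tensor of shape [1, 3, H, W], values roughly in [-1, 1].
    Returns the optimized pivot code w_p.
    """
    G = G.to(device).eval()
    percep = lpips.LPIPS(net='vgg').to(device)

    # Initialize w at the average latent, a common choice for StyleGAN inversion.
    w_avg = G.mapping.w_avg.detach().to(device)                     # shape [w_dim]
    w = w_avg.reshape(1, 1, -1).repeat(1, G.num_ws, 1).clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)

    for _ in range(num_steps):
        img = G.synthesis(w)                 # G(w, v; theta) with the weights theta fixed
        loss = percep(img, s).mean()         # L_LPIPS(s, G(w, v; theta))
        # The lambda_v * L_v(v) noise-regularization term of Eq. (2) would be
        # added here if the noise inputs v were also being optimized.
        opt.zero_grad()
        loss.backward()
        opt.step()

    return w.detach()                        # pivot code w_p for the tuning phase
```

The returned code then serves as the fixed pivot around which the generator weights are fine-tuned in the subsequent tuning step, which is what lets the model reproduce an image that was never part of its training data.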