Table 1: Architecture of DiscoGAN and CycleGAN.

DiscoGAN
  Net  Layer  C    F  S  P  A
  G    c1     64   4  2  1  leaky
       c2     128  4  2  1  leaky
       c3     256  4  2  1  leaky
       c4     512  4  2  1  leaky
       dc5    256  4  2  1  ReLU
       dc6    128  4  2  1  ReLU
       dc7    64   4  2  1  ReLU
       dc8    3    4  2  1  tanh
  D    c1     512  4  2  1  leaky
       c2     256  4  2  1  leaky
       c3     128  4  2  1  leaky
       c4     64   4  2  1  leaky
       c5     1    4  1  0  identity

CycleGAN
  Net  Layer      C    F  S  P  A
  G    c1         32   7  1  3  ReLU
       c2         64   4  2  1  ReLU
       c3         128  4  2  1  ReLU
       RN4-RN12   128  3  1  1  ReLU
       unp13      64   2  2  0  identity
       c13        64   3  1  1  ReLU
       unp14      32   2  2  0  identity
       c14        32   3  1  1  ReLU
       c15        3    7  1  3  tanh
  D    c1         64   4  2  1  leaky
       c2         128  4  2  1  leaky
       c3         256  4  2  1  leaky
       c4         512  4  2  1  leaky
       c5         1    3  1  1  identity

c ... convolution (down-convolution), dc ... deconvolution (up-convolution),
RN ... ResNet block, unp ... unpooling,
leaky ... leaky ReLU, identity ... identity mapping.
C ... Channel, F ... Filter Size, S ... Stride, P ... Padding,
A ... Activation Function.
Table 2: Shape evaluation result. Each five-digit string indicates which methods are applied ("1") or not applied ("0"); from left to right, the digits correspond to PatchGAN, LSGAN, L^A_CONST, ResNet, and Buffer. We will use this notation hereafter.

Failure: 00000 00001 00010 00011 00100 00101 00110 00111 01000 01001
         01100 01101 10000 10001 10100 10101 11000 11001 11100 11101
Success: 01010 01011 01110 01111 10010 10011 10110 10111 11010 11011
         11110 11111
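To make the digit notation of Table 2 concrete, a minimal helper (our illustration, not code from the paper) that decodes a five-digit pattern into the set of applied methods:

```python
# Digit order (left to right): PatchGAN, LSGAN, L_CONST, ResNet, Buffer.
METHODS = ["PatchGAN", "LSGAN", "L_CONST", "ResNet", "Buffer"]

def decode(pattern):
    """Return the list of methods enabled by a five-digit pattern string."""
    assert len(pattern) == 5 and set(pattern) <= {"0", "1"}
    return [m for m, bit in zip(METHODS, pattern) if bit == "1"]

print(decode("10110"))  # ['PatchGAN', 'L_CONST', 'ResNet']
```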
Table 1 shows the details of each CNN structure. For PatchGAN and ResNet, the DiscoGAN network is employed when they are not used, and the CycleGAN network is employed when they are used.
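The layer parameters in Table 1 can be sanity-checked with standard convolution arithmetic. A sketch for the DiscoGAN generator, assuming the 64 x 64 inputs used in the experiment:

```python
def conv_out(size, f, s, p):
    """Output size of a convolution: floor((size - f + 2p) / s) + 1."""
    return (size - f + 2 * p) // s + 1

def deconv_out(size, f, s, p):
    """Output size of a transposed convolution: (size - 1) * s - 2p + f."""
    return (size - 1) * s - 2 * p + f

size = 64
# Encoder: four 4x4 convolutions with stride 2, padding 1 (c1-c4 in Table 1).
for _ in range(4):
    size = conv_out(size, 4, 2, 1)
print(size)  # 4: the 64x64 input is compressed to a 4x4 feature map
# Decoder: four 4x4 deconvolutions with stride 2, padding 1 (dc5-dc8).
for _ in range(4):
    size = deconv_out(size, 4, 2, 1)
print(size)  # 64: the original resolution is restored
```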
5.2 Experimental Result
Several representative examples out of the 32 experimental results are shown in Figure 7.
Models are compared with each other by checking whether x_ABA retains the form of the original input x_A after applying G_AB and G_BA. We call this method "shape evaluation". We visually evaluated from Figure 7 whether the shape of the original input x_A is maintained in x_ABA. A result is judged a "success" when the face outline, eyes, and mouth are reconstructed in accordance with the original input. Conversely, it is judged a "failure" when mode collapse occurs, when the shape of the face is malformed, or when a facial part is missing. The results are shown in Table 2.
Boldface items in the table show patterns that failed despite using three or more learning techniques, and patterns that succeeded despite using two or fewer. From these results, PatchGAN, LSGAN, and ResNet can be cited as methods that play an important role in shape reconstruction.
The conditions for success can be summarized as follows. When PatchGAN is not used, shape reconstruction succeeds if both LSGAN and ResNet are used. When PatchGAN is used, shape reconstruction succeeds if ResNet is used. All other cases failed. From these results, it turns out that the most important learning technique among the five is ResNet.

Table 3: Details of the failure patterns.
  Mode collapse:    00000(0) 00001(0) 00010(1) 00100(0) 00101(0)
                    00110(1) 00111(1) 01000(1) 01001(1) 01100(1)
                    01101(1) 10000(1) 10100(1) 11000(2) 11001(2)
  No mode collapse: 00011(1) 10001(1) 10101(1) 11100(2) 11101(2)
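The summarized success condition can be written as a small predicate and checked against all 32 patterns of Table 2. A sketch in the paper's digit notation (ordered PatchGAN, LSGAN, L_CONST, ResNet, Buffer):

```python
def shape_success(pattern):
    """Success rule from the text, pattern = five digits 'PLCRB'."""
    patchgan, lsgan, _, resnet, _ = (bit == "1" for bit in pattern)
    if patchgan:
        return resnet            # PatchGAN used: ResNet decides success
    return lsgan and resnet      # PatchGAN unused: need LSGAN and ResNet

# Enumerate all 32 patterns and collect the predicted successes.
successes = [f"{i:05b}" for i in range(32) if shape_success(f"{i:05b}")]
print(len(successes))  # 12 patterns succeed, matching Table 2
```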
From the above results, Buffer and L_CONST appear to have no particular effect. As for L_CONST, its lack of impact in this experiment is consistent with the statement by Kim et al. that an arbitrary distance function could be used. The effect of Buffer, however, cannot be explained from the above result alone. After classifying the failure patterns by the presence or absence of mode collapse, we found evidence that Buffer can contribute to avoiding mode collapse. In this experiment, mode collapse is judged to have occurred when images similar to one particular image are generated for two different input images. The result is shown in Table 3.
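The mode-collapse criterion described above (near-identical outputs for different inputs) could be automated roughly as follows. This is our sketch; the MSE similarity measure and the threshold are assumptions, not values from the paper:

```python
def mse(a, b):
    """Mean squared error between two equally sized pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def has_mode_collapse(outputs, threshold=1e-3):
    """Flag mode collapse when any two outputs generated for distinct
    inputs are nearly identical (pairwise MSE below the threshold)."""
    for i in range(len(outputs)):
        for j in range(i + 1, len(outputs)):
            if mse(outputs[i], outputs[j]) < threshold:
                return True
    return False

# Two different inputs mapped to almost the same output -> collapse.
print(has_mode_collapse([[0.5, 0.5], [0.5, 0.5001], [0.1, 0.9]]))  # True
```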
In this table, the numbers in parentheses indicate how many of the three techniques considered important (PatchGAN, LSGAN, ResNet) are used in the model. From the results, it can be seen that mode collapse occurs when only one main technique is used, or when none is used at all. On the other hand, when both PatchGAN and LSGAN are used, mode collapse did not occur. The Buffer + ResNet and Buffer + PatchGAN patterns also did not cause mode collapse, indicating that Buffer influences the major techniques. However, since a mode-collapse pattern (00111) also exists for Buffer + ResNet, this result shows that the mode-collapse prevention effect of combining PatchGAN and Buffer is the significant one.
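The parenthesized counts in Table 3 can be reproduced directly from the digit notation; PatchGAN, LSGAN, and ResNet occupy the first, second, and fourth digits (a small illustration of our own):

```python
def main_technique_count(pattern):
    """Count how many of PatchGAN, LSGAN, ResNet a pattern uses
    (digits 1, 2, and 4 of the PLCRB notation)."""
    return sum(pattern[i] == "1" for i in (0, 1, 3))

print(main_technique_count("11001"))  # 2, as listed in Table 3
```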
6 DISCUSSION
Experimental results show that PatchGAN, LSGAN, and ResNet are the three important methods. Among them, ResNet plays an important role in shape reconstruction and is considered the most important of the five methods.
Although a GAN originally uses random noise of about 100 dimensions as input, here we use images as input. As seen in Figure 6, the input image is compressed by the convolution layers (encoder), and the resulting feature map is used as the input of the generator. In this experiment, the training examples are 64 x 64 resolution images. When we use the
Factors Affecting Accuracy in Image Translation based on Generative Adversarial Network
451