3.1 Introduced Weight: Learning of the Generator
In previous CGANs, conditions vanish near the output layer because the conditional vector y ∈ {0, 1} is given only at the input layer. Thus, the generator in the proposed method also inputs conditions to hidden layers other than the input layer, in the manner of a skip connection. This approach ensures that conditions are reflected up to the layers near the output. In addition, previous facial image generation methods directly input the binary condition vector to the generator. In contrast, the proposed method applies a 1 × 1 convolution and a sigmoid function to the condition vector y, expressed in binary, and inputs the result to the generator. Therefore, we represent the condition vector y as continuous values from 0 to 1. Moreover, each condition can be weighted because the filter size of the convolution is 1 × 1. By weighting conditions, the proposed method can reflect conditions stepwise throughout the whole generator, because the most suitable conditions can be reflected at each layer at generation time. Furthermore, we use Pixelwise Normalization instead of Batch Normalization. Pixelwise Normalization is a normalization method used in PGGANs that improves the quality of generated images. It is represented as
b_{x,y} = \frac{a_{x,y}}{\sqrt{\frac{1}{N}\sum_{j=0}^{N-1}\left(a_{x,y}^{j}\right)^{2} + \varepsilon}}, (3)
where N is the number of feature maps, a_{x,y} and b_{x,y} are the feature vectors before and after normalization, respectively, and ε = 10^{-8}. This series of processes is illustrated in Figure 1.
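To make the two ingredients above concrete, the following is a minimal NumPy sketch, not the authors' implementation; the function and variable names are our assumptions, and the 1 × 1 convolution is read as a learnable per-condition weight and bias. The binary condition vector is mapped through the weighted 1 × 1 convolution and a sigmoid to continuous conditions in (0, 1), and pixelwise normalization follows Eq. (3):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weight_conditions(y, w, b):
    """1x1 convolution over the condition vector: because the kernel is
    1x1, each binary condition y[i] gets its own weight w[i] and bias
    b[i]; the sigmoid maps the result to a continuous value in (0, 1)."""
    return sigmoid(w * y + b)

def pixelwise_norm(a, eps=1e-8):
    """Pixelwise Normalization, Eq. (3): each spatial location (x, y) is
    divided by the RMS of its N channel activations (a: shape (N, H, W))."""
    return a / np.sqrt(np.mean(a ** 2, axis=0, keepdims=True) + eps)

# Hypothetical binary attribute conditions, e.g. [smiling, glasses, male]
y = np.array([1.0, 0.0, 1.0])
w = np.array([2.0, 0.5, 1.0])                  # learned per-condition weights
cond = weight_conditions(y, w, b=np.zeros(3))  # continuous values in (0, 1)

b_norm = pixelwise_norm(np.random.randn(8, 4, 4))
```

After normalization, the RMS over the N feature maps at every pixel is approximately 1, which is what keeps activation magnitudes under control during training.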
3.2 Multi-task Discriminator
Real or generated images are inputted to the discriminator, which simultaneously considers the inputted conditions to distinguish between them. The discriminator in our proposed method performs multiple tasks so as to recognize the given conditions when the generator generates images. Figure 2 shows the multi-task network. The adversarial
branch and recognition branch in Figure 2 represent
a previous task of GANs and condition recognition,
respectively. In CGANs and Conditional DCGANs, conditions are also given to the discriminator, but the proposed method adds the recognition branch. Minimizing the condition recognition error, which is computed using the conditions inputted to the generator, can be regarded as an alternative way of giving conditions to the discriminator. Minibatch Stddev is the standard deviation computed over the minibatch for calibration; this technique, used in PGGANs, enables diverse images to be generated.
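The Minibatch Stddev layer can be sketched as follows (a minimal NumPy version of the PGGAN-style operation; the shapes and names are our assumptions): the standard deviation of each feature over the minibatch is averaged into one scalar, which is appended as an extra feature map so that the discriminator can penalize a generator whose samples lack diversity:

```python
import numpy as np

def minibatch_stddev(x):
    """Append one feature map carrying the batch-wide diversity signal.
    x: activations of shape (batch, channels, H, W)."""
    std = np.std(x, axis=0)    # per-location std-dev over the batch
    mean_std = np.mean(std)    # collapse to a single scalar summary
    extra = np.full((x.shape[0], 1, x.shape[2], x.shape[3]), mean_std)
    return np.concatenate([x, extra], axis=1)
```

If every sample in the batch were identical, the appended map would be all zeros, giving the discriminator an easy cue that the batch lacks variety.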
The condition recognition error is added to the objective function of previous CGANs; thereby, the adversarial learning of the generator reflects conditions more strongly. The objective function of our proposed method is indicated as
\min_{G}\max_{D} V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim P(z)}[\log(1 - D(\tilde{x}))] \wedge \min L, (4)
where L is the condition recognition error. If a dataset of real facial images is used, our proposed method finds it difficult or impossible to recognize the images with the softmax function and cross-entropy error, because the multiple facial attributes in such a dataset are represented in binary. If mean squared error is used, one recognition branch per attribute to be recognized is required, and the calculation cost is high. Hence, we calculate the error with sigmoid cross entropy, because it allows the recognition error of multiple facial attributes to be computed with a single recognition branch.
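The choice of sigmoid cross entropy can be illustrated as follows (a NumPy sketch, not the authors' code): one sigmoid output per binary attribute, so a single recognition branch scores all attributes at once; the numerically stable form avoids overflow for large logits:

```python
import numpy as np

def sigmoid_cross_entropy(logits, targets):
    """Multi-label loss over binary facial attributes.
    Numerically stable form of -t*log(s(x)) - (1-t)*log(1-s(x)),
    with s the sigmoid: max(x, 0) - x*t + log(1 + exp(-|x|))."""
    x, t = logits, targets
    return np.mean(np.maximum(x, 0) - x * t + np.log1p(np.exp(-np.abs(x))))

# One branch handles every attribute: logits and binary targets
# share the shape (batch, num_attributes).
logits = np.array([[2.0, -1.0, 0.0]])
targets = np.array([[1.0, 0.0, 1.0]])
loss = sigmoid_cross_entropy(logits, targets)
```

With mean squared error, by contrast, the text notes that a separate recognition branch per attribute would be needed, which is what the shared-shape logits above avoid.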
3.3 Obtain Feature Vector: Encoder and Fine-tuned Generator
Generative methods such as α-GANs and BiGANs use adversarial learning and an encoder, and generate images without fine-tuning the generator. Therefore, these methods frequently generate unclear images in early learning. In addition, previous techniques have a high cost because they require multiple networks to be updated. Thus, we propose a learning method that uses an encoder and a fine-tuned generator. Clear facial images can be generated from the start of learning with the fine-tuned generator, and our method generates images that maintain the identity of the inputted images by feeding the features obtained from the encoder to the generator. Algorithm 1 details the proposed learning process, and Figure 3 provides an illustration.
Both f and f̂ are features output from the encoder; the former comes from real images and the latter from generated images. Moreover, all L terms in Algorithm 1 are mean squared errors, but the errors differ: L_real is the error between real images and their reconstructions, L_noise is the error between the noise vector and the embedded features of the image generated from that noise vector, and L_fake is the error between reconstructed images and images generated from the noise vector. Our proposed learning algorithm fixes the parameters of the generator and updates only the encoder.
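Under our reading of Algorithm 1, one encoder update step might be sketched as follows (a NumPy toy with E and G as callables; only the encoder would receive gradients, the generator's parameters stay fixed, and the names are our assumptions):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays."""
    return np.mean((a - b) ** 2)

def encoder_losses(E, G, x_real, z):
    """The three MSE terms of Algorithm 1. G is the fine-tuned
    generator (parameters fixed); E is the encoder being trained."""
    f = E(x_real)        # features of real images
    x_rec = G(f)         # reconstructions of the real images
    x_fake = G(z)        # images generated from the noise vector
    f_hat = E(x_fake)    # embedded features of the generated images
    L_real = mse(x_real, x_rec)    # real images vs. reconstructions
    L_noise = mse(z, f_hat)        # noise vector vs. embedded features
    L_fake = mse(x_rec, x_fake)    # reconstructions vs. noise-generated
    return L_real, L_noise, L_fake
```

A perfect encoder-generator pair (one that inverts the other exactly) drives all three terms to zero, which is the fixed point this training seeks.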
Facial Image Generation by Generative Adversarial Networks using Weighted Conditions