training. We have also designed a generator network that is suitable for the proposed training method. To validate our proposed model, we have conducted extensive evaluations on the LookBook dataset. Compared to other conditional GAN models, our model generates visually pleasing 256 × 256 clothing images while preserving the global structure and the fine details of the target images.
ACKNOWLEDGEMENTS
We thank the anonymous reviewers for their comments. This work was supported by the SW StarLab program (IITP-2015-0-00199) and the NRF (NRF2017M3C4A7066317). Prof. Yoon is the corresponding author of this paper.