tion. In Proceedings of the IEEE International Conference on Computer Vision, pages 1422–1430.
Durall, R., Pfreundt, F.-J., and Keuper, J. (2019). Stabilizing GANs with octave convolutions. arXiv preprint arXiv:1905.12534.
Durall, R., Pfreundt, F.-J., and Keuper, J. (2020). Local facial attribute transfer through inpainting. arXiv preprint arXiv:2002.03040.
Gidaris, S., Singh, P., and Komodakis, N. (2018). Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. (2017). Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pages 5767–5777.
Hinton, G. E. and Roweis, S. T. (2003). Stochastic neighbor embedding. In Advances in Neural Information Processing Systems, pages 857–864.
Hinz, T., Heinrich, S., and Wermter, S. (2019). Generating multiple objects at spatially distinct locations. arXiv preprint arXiv:1901.00686.
Ho, K., Keuper, J., and Keuper, M. (2020). Learning embeddings for image clustering: An empirical study of triplet loss approaches. arXiv preprint arXiv:2007.03123.
Hong, S., Yang, D., Choi, J., and Lee, H. (2018). Inferring semantic layout for hierarchical text-to-image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7986–7994.
Huang, X., Liu, M.-Y., Belongie, S., and Kautz, J. (2018). Multimodal unsupervised image-to-image translation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 172–189.
Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1125–1134.
Karras, T., Laine, S., and Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4401–4410.
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., and Aila, T. (2020). Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8110–8119.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.
Li, P., Hastie, T. J., and Church, K. W. (2006). Very sparse random projections. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 287–296.
Liu, Z., Luo, P., Wang, X., and Tang, X. (2015). Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV).
Mao, Q., Lee, H.-Y., Tseng, H.-Y., Ma, S., and Yang, M.-H. (2019). Mode seeking generative adversarial networks for diverse image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1429–1437.
Milbich, T., Ghori, O., Diego, F., and Ommer, B. (2020). Unsupervised representation learning by discovering reliable image relations. Pattern Recognition, 102:107107.
Mirza, M. and Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.
Misra, I., Zitnick, C. L., and Hebert, M. (2016). Shuffle and learn: Unsupervised learning using temporal order verification. In European Conference on Computer Vision, pages 527–544. Springer.
Miyato, T., Kataoka, T., Koyama, M., and Yoshida, Y. (2018). Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957.
Odena, A., Olah, C., and Shlens, J. (2017). Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 2642–2651. JMLR.org.
Oord, A. v. d., Li, Y., and Vinyals, O. (2018). Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748.
Park, T., Liu, M.-Y., Wang, T.-C., and Zhu, J.-Y. (2019). Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2337–2346.
Rao, D., Visin, F., Rusu, A., Pascanu, R., Teh, Y. W., and Hadsell, R. (2019). Continual unsupervised representation learning. In Advances in Neural Information Processing Systems, pages 7647–7657.
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., and Lee, H. (2016). Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., and Chen, X. (2016). Improved techniques for training GANs. In Advances in Neural Information Processing Systems, pages 2234–2242.
Sanchez, E. and Valstar, M. (2018). Triple consistency loss for pairing distributions in GAN-based face synthesis. arXiv preprint arXiv:1811.03492.
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., and Manzagol, P.-A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408.
Xu, T., Zhang, P., Huang, Q., Zhang, H., Gan, Z., Huang, X., and He, X. (2018). AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1316–1324.
VISAPP 2021 - 16th International Conference on Computer Vision Theory and Applications