the COCO dataset and by about 0.40 for the Visual
Genome dataset. In addition, we generated more natural
images by additionally using the layout-to-image
model Layout2img. Although we added only the
“between” relation as a ternary hyperedge, adding
other kinds of hyperedges should yield images that are
more consistent with the input; we regard this as an
interesting direction for future work.
REFERENCES
Agnese, J., Herrera, J., Tao, H., and Zhu, X. (2020). A Sur-
vey and Taxonomy of Adversarial Neural Networks
for Text-to-Image Synthesis. Wiley Interdisciplinary
Reviews: Data Mining and Knowledge Discovery,
10(4):e1345.
Caesar, H., Uijlings, J., and Ferrari, V. (2018). Coco-Stuff:
Thing and Stuff Classes in Context. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition, pages 1209–1218.
Chen, Q. and Koltun, V. (2017). Photographic Image Syn-
thesis with Cascaded Refinement Networks. In Pro-
ceedings of the IEEE International Conference on
Computer Vision, pages 1511–1520.
Elgammal, A., Liu, B., Elhoseiny, M., and Mazzone, M.
(2017). CAN: Creative Adversarial Networks Gener-
ating “Art” by Learning About Styles and Deviating
from Style Norms. arXiv preprint arXiv:1706.07068.
Frolov, S., Hinz, T., Raue, F., Hees, J., and Dengel, A.
(2021). Adversarial Text-to-Image Synthesis: A Review.
arXiv preprint arXiv:2101.09983.
Ghorbani, A., Natarajan, V., Coz, D., and Liu, Y. (2020).
DermGAN: Synthetic Generation of Clinical Skin Im-
ages with Pathology. In Machine Learning for Health
Workshop, pages 155–170. PMLR.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B.,
Warde-Farley, D., Ozair, S., Courville, A., and Ben-
gio, Y. (2014). Generative Adversarial Nets. Advances
in Neural Information Processing Systems, 27.
Gui, J., Sun, Z., Wen, Y., Tao, D., and Ye, J. (2021). A
Review on Generative Adversarial Networks: Algo-
rithms, Theory, and Applications. IEEE Transactions
on Knowledge and Data Engineering.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep Resid-
ual Learning for Image Recognition. In Proceedings
of the IEEE Conference on Computer Vision and Pat-
tern Recognition, pages 770–778.
He, S., Liao, W., Yang, M. Y., Yang, Y., Song, Y.-Z., Rosen-
hahn, B., and Xiang, T. (2021). Context-Aware Layout
to Image Generation with Enhanced Object Appear-
ance. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pages
15049–15058.
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and
Hochreiter, S. (2017). GANs Trained by a Two Time-
Scale Update Rule Converge to a Local Nash Equilib-
rium. Advances in Neural Information Processing
Systems, 30.
Hinz, T., Heinrich, S., and Wermter, S. (2019). Generat-
ing Multiple Objects at Spatially Distinct Locations.
arXiv preprint arXiv:1901.00686.
Hinz, T., Heinrich, S., and Wermter, S. (2022). Semantic
Object Accuracy for Generative Text-to-Image Syn-
thesis. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 44:1552–1565.
Johnson, J., Gupta, A., and Fei-Fei, L. (2018). Image
Generation from Scene Graphs. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition, pages 1219–1228.
Johnson, J., Krishna, R., Stark, M., Li, L.-J., Shamma, D.,
Bernstein, M., and Fei-Fei, L. (2015). Image Retrieval
Using Scene Graphs. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition,
pages 3668–3678.
Kingma, D. P. and Ba, J. (2015). Adam: A Method for
Stochastic Optimization. arXiv preprint arXiv:1412.6980.
Kingma, D. P. and Welling, M. (2013). Auto-Encoding
Variational Bayes. arXiv preprint arXiv:1312.6114.
Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata,
K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-J.,
Shamma, D. A., et al. (2017). Visual Genome: Con-
necting Language and Vision Using Crowdsourced
Dense Image Annotations. International Journal of
Computer Vision, 123(1):32–73.
Li, Y., Ma, T., Bai, Y., Duan, N., Wei, S., and Wang,
X. (2019). PasteGAN: A Semi-Parametric Method
to Generate Image from Scene Graph. arXiv preprint
arXiv:1905.01608.
Mirza, M. and Osindero, S. (2014). Conditional Generative
Adversarial Nets. arXiv preprint arXiv:1411.1784.
Mittal, G., Agrawal, S., Agarwal, A., Mehta, S., and Mar-
wah, T. (2019). Interactive Image Generation Using
Scene Graphs. arXiv preprint arXiv:1905.03743.
Nie, D., Trullo, R., Lian, J., Petitjean, C., Ruan, S., Wang,
Q., and Shen, D. (2017). Medical Image Synthe-
sis with Context-Aware Generative Adversarial Net-
works. In International Conference on Medical Im-
age Computing and Computer-Assisted Intervention,
pages 417–425. Springer.
Odena, A., Olah, C., and Shlens, J. (2017). Conditional Im-
age Synthesis with Auxiliary Classifier GANs. In In-
ternational Conference on Machine Learning, pages
2642–2651. PMLR.
Radford, A., Metz, L., and Chintala, S. (2015). Un-
supervised Representation Learning with Deep Con-
volutional Generative Adversarial Networks. arXiv
preprint arXiv:1511.06434.
Reed, S. E., Akata, Z., Yan, X., Logeswaran, L., Schiele,
B., and Lee, H. (2016). Generative Adversarial Text
to Image Synthesis. arXiv preprint arXiv:1605.05396.
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid,
I., and Savarese, S. (2019). Generalized Intersection
over Union: A Metric and a Loss for Bounding Box
Regression. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition,
pages 658–666.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.,
Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bern-
VISAPP 2023 - 18th International Conference on Computer Vision Theory and Applications