
Kingma, D. P. and Ba, J. (2015). Adam: A Method for
Stochastic Optimization. Proceedings of the Interna-
tional Conference on Learning Representations.
Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata,
K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-J.,
Shamma, D. A., et al. (2017). Visual Genome: Con-
necting Language and Vision Using Crowdsourced
Dense Image Annotations. International Journal of
Computer Vision, 123(1):32–73.
Miyake, R., Matsukawa, T., and Suzuki, E. Image Genera-
tion from Hyper Scene Graph with Multi-Types of Tri-
nomial Hyperedges. Special issue of the Springer Na-
ture Computer Science journal (under submission).
Miyake, R., Matsukawa, T., and Suzuki, E. (2023). Image
Generation from a Hyper Scene Graph with Trinomial
Hyperedges. In Proceedings of the 18th International
Joint Conference on Computer Vision, Imaging and
Computer Graphics Theory and Applications (VISI-
GRAPP 2023) - Volume 5: VISAPP, pages 185–195.
Odena, A., Olah, C., and Shlens, J. (2017). Conditional
Image Synthesis with Auxiliary Classifier GANs. In
Proceedings of the International Conference on Ma-
chine Learning, pages 2642–2651. PMLR.
Reed, S. E., Akata, Z., Yan, X., Logeswaran, L., Schiele,
B., and Lee, H. (2016). Generative Adversarial Text
to Image Synthesis. arXiv preprint arXiv:1605.05396.
Rezatofighi, H., Tsoi, N., Gwak, J., Sadeghian, A., Reid,
I., and Savarese, S. (2019). Generalized Intersection
over Union: A Metric and a Loss for Bounding Box
Regression. In Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition,
pages 658–666.
Rockwell, C., Fouhey, D. F., and Johnson, J. (2021). Pixel-
synth: Generating a 3D-Consistent Experience from
a Single Image. In Proceedings of the IEEE/CVF
International Conference on Computer Vision, pages
14104–14113.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.,
Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bern-
stein, M., et al. (2015). ImageNet Large Scale Vi-
sual Recognition Challenge. International Journal of
Computer Vision, 115(3):211–252.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V.,
Radford, A., and Chen, X. (2016). Improved Tech-
niques for Training GANs. Advances in Neural Infor-
mation Processing Systems, 29.
Schuster, S., Krishna, R., Chang, A., Fei-Fei, L., and Man-
ning, C. D. (2015). Generating Semantically Precise
Scene Graphs from Textual Descriptions for Improved
Image Retrieval. In Proceedings of the Fourth Work-
shop on Vision and Language, pages 70–80.
Sortino, R., Palazzo, S., and Spampinato, C. (2023).
Transformer-based Image Generation from Scene
Graphs. Computer Vision and Image Understanding,
volume 233.
Sun, W. and Wu, T. (2019). Image Synthesis From Reconfigurable
Layout and Style. In Proceedings of the
IEEE/CVF International Conference on Computer Vi-
sion, pages 10531–10540.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S.,
Anguelov, D., Erhan, D., Vanhoucke, V., and Rabi-
novich, A. (2015). Going Deeper with Convolutions.
In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pages 1–9.
Tang, R., Du, M., Li, Y., Liu, Z., Zou, N., and Hu, X.
(2021). Mitigating Gender Bias in Captioning Systems.
In Proceedings of the Web Conference 2021,
pages 633–645.
Van Den Oord, A., Vinyals, O., et al. (2017). Neural Dis-
crete Representation Learning. Advances in Neural
Information Processing Systems, 30.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones,
L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.
(2017). Attention is All You Need. Advances in Neu-
ral Information Processing Systems, 30.
Vo, D. M. and Sugimoto, A. (2020). Visual-Relation Con-
scious Image Generation from Structured-Text. In
Proceedings of the European Conference on Computer
Vision.
Wang, Z., Jiang, Y., Zheng, H., Wang, P., He, P., Wang,
Z., Chen, W., and Zhou, M. (2023). Patch Diffusion:
Faster and More Data-Efficient Training of Diffusion
Models. arXiv preprint arXiv:2304.12526.
Yang, L., Huang, Z., Song, Y., Hong, S., Li, G.,
Zhang, W., Cui, B., Ghanem, B., and Yang, M.-H.
(2022). Diffusion-Based Scene Graph to Image Generation
with Masked Contrastive Pre-Training. arXiv
preprint arXiv:2211.11138.
Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang,
X., and Metaxas, D. N. (2017). StackGAN: Text to
Photo-Realistic Image Synthesis with Stacked Gen-
erative Adversarial Networks. In Proceedings of the
IEEE/CVF International Conference on Computer Vi-
sion, pages 5907–5915.
Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X.,
and Metaxas, D. N. (2018). StackGAN++: Realistic
Image Synthesis with Stacked Generative Adversarial
Networks. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 41(8):1947–1962.
Zhang, K., Shinden, H., Mutsuro, T., and Suzuki, E. (2022).
Judging Instinct Exploitation in Statistical Data Ex-
planations Based on Word Embedding. In Proceed-
ings of the 2022 AAAI/ACM Conference on AI, Ethics,
and Society, pages 867–879.
Zheng, G., Zhou, X., Li, X., Qi, Z., Shan, Y., and Li,
X. (2023). LayoutDiffusion: Controllable Diffusion
Model for Layout-to-Image Generation. In Proceedings
of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR), pages 22490–22499.
Image Generation from Hyper Scene Graphs with Trinomial Hyperedges Using Object Attention