Enhancing Image Generation with Diffusion Transformer Architecture
Ruiyang Wu
2024
Abstract
This study explores the advantages and potential of a fusion model that integrates transformer and diffusion models for image generation. Specifically, it proposes a Diffusion Transformer (DiT) architecture in which a transformer is incorporated into a diffusion probabilistic model. The transformer replaces the U-Net backbone used in traditional diffusion models, drawing on its strengths in sequence modelling and in capturing long-range dependencies. A "patchify" layer converts the image into a sequence of tokens, which is processed by the transformer blocks and then by a decoder that maps the result back to the desired output format. Experiments on the ILSVRC2012 dataset, a lightweight version of ImageNet, show that DiT outperforms the other generative models compared on key image quality metrics, namely Fréchet Inception Distance and Inception Score. These results indicate that the model can generate high-quality images efficiently. By combining the strengths of transformer and diffusion models, the proposed DiT architecture improves both generation quality and processing efficiency. Although challenges remain, the framework opens the way for advances in multimodal learning, reinforcement learning, and the development of controllable and interpretable generative models.
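The "patchify" step named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names `patchify` and `unpatchify`, the 32x32x4 input shape, and the patch size of 2 are all assumptions chosen for illustration, and the learned linear projection and positional embeddings that normally follow are omitted.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) array into a sequence of flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row. Hypothetical helper for illustration.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def unpatchify(tokens: np.ndarray, patch_size: int,
               height: int, width: int, channels: int) -> np.ndarray:
    """Inverse of patchify: rebuild the (H, W, C) array from the tokens."""
    gh, gw = height // patch_size, width // patch_size
    x = tokens.reshape(gh, gw, patch_size, patch_size, channels)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(height, width, channels)


# Assumed example input: a 32x32 array with 4 channels and 2x2 patches.
image = np.arange(32 * 32 * 4, dtype=np.float32).reshape(32, 32, 4)
tokens = patchify(image, patch_size=2)
restored = unpatchify(tokens, patch_size=2, height=32, width=32, channels=4)
print(tokens.shape)  # (256, 16)
```

Under these assumptions the input becomes 256 tokens of dimension 16, and `unpatchify` recovers the original array exactly, which mirrors how the decoder output is rearranged back into image layout. A smaller patch size yields more tokens and therefore a longer sequence for the transformer to process.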
Paper Citation
in Harvard Style
Wu R. (2024). Enhancing Image Generation with Diffusion Transformer Architecture. In Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI; ISBN 978-989-758-713-9, SciTePress, pages 337-342. DOI: 10.5220/0012937800004508
in Bibtex Style
@conference{emiti24,
author={Ruiyang Wu},
title={Enhancing Image Generation with Diffusion Transformer Architecture},
booktitle={Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI},
year={2024},
pages={337-342},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012937800004508},
isbn={978-989-758-713-9},
}
in EndNote Style
TY - CONF
JO - Proceedings of the 1st International Conference on Engineering Management, Information Technology and Intelligence - Volume 1: EMITI
TI - Enhancing Image Generation with Diffusion Transformer Architecture
SN - 978-989-758-713-9
AU - Wu R.
PY - 2024
SP - 337
EP - 342
DO - 10.5220/0012937800004508
PB - SciTePress