
Mou, C., Wang, X., Xie, L., Wu, Y., Zhang, J., Qi, Z., Shan,
Y., and Qie, X. (2023). T2I-Adapter: Learning adapters
to dig out more controllable ability for text-to-image
diffusion models.
Nguyen, Q. H., Vu, T. T., Tran, A. T., and Nguyen, K.
(2023). Dataset diffusion: Diffusion-based synthetic
data generation for pixel-level semantic segmentation.
In Thirty-seventh Conference on Neural Information
Processing Systems.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,
Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,
Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito,
Z., Raison, M., Tejani, A., Chilamkurthy, S.,
Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019).
PyTorch: An imperative style, high-performance deep
learning library. In Advances in Neural Information
Processing Systems, volume 32.
Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn,
T., Müller, J., Penna, J., and Rombach, R.
(2024). SDXL: Improving latent diffusion models for
high-resolution image synthesis. In The Twelfth International
Conference on Learning Representations.
Ranftl, R., Lasinger, K., Hafner, D., Schindler, K., and
Koltun, V. (2022). Towards robust monocular depth
estimation: Mixing datasets for zero-shot
cross-dataset transfer. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 44(3).
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and
Ommer, B. (2022). High-resolution image synthesis
with latent diffusion models. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), pages 10684–10695.
Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C. W.,
Wightman, R., Cherti, M., Coombes, T., Katta, A.,
Mullis, C., Wortsman, M., Schramowski, P., Kundurthy,
S. R., Crowson, K., Schmidt, L., Kaczmarczyk,
R., and Jitsev, J. (2022). LAION-5B: An open
large-scale dataset for training next generation
image-text models. In Thirty-sixth Conference on Neural
Information Processing Systems Datasets and Benchmarks
Track.
Su, Z., Liu, W., Yu, Z., Hu, D., Liao, Q., Tian, Q.,
Pietikäinen, M., and Liu, L. (2021). Pixel difference
networks for efficient edge detection. In 2021
IEEE/CVF International Conference on Computer Vision
(ICCV), pages 5097–5107.
Trabucco, B., Doherty, K., Gurinas, M. A., and Salakhutdinov,
R. (2024). Effective data augmentation with
diffusion models. In The Twelfth International Conference
on Learning Representations.
Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu,
L., Zhao, R., and Le, X. (2022). Semi-supervised semantic
segmentation using unreliable pseudo labels.
In Proceedings of the IEEE/CVF International Conference
on Computer Vision and Pattern Recognition
(CVPR).
Wu, W., Dai, T., Huang, X., Ma, F., and Xiao, J.
(2024). GPT-prompt controlled diffusion for
weakly-supervised semantic segmentation.
Xie, S. and Tu, Z. (2015). Holistically-nested edge detection.
In 2015 IEEE International Conference on Computer
Vision (ICCV), pages 1395–1403.
Yang, B., Gu, S., Zhang, B., Zhang, T., Chen, X., Sun, X.,
Chen, D., and Wen, F. (2023). Paint by example:
Exemplar-based image editing with diffusion models.
In 2023 IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR), pages 18381–18391.
Yu, F., Chen, H., Wang, X., Xian, W., Chen, Y., Liu, F.,
Madhavan, V., and Darrell, T. (2020). BDD100K: A
diverse driving dataset for heterogeneous multitask
learning. In IEEE/CVF Conference on Computer Vision
and Pattern Recognition (CVPR).
Zhang, L., Rao, A., and Agrawala, M. (2023). Adding conditional
control to text-to-image diffusion models. In
2023 IEEE/CVF International Conference on Computer
Vision (ICCV), pages 3813–3824.
Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. (2017).
Pyramid scene parsing network. In Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR).
Zhou, B., Zhao, H., Puig, X., Xiao, T., Fidler, S., Barriuso,
A., and Torralba, A. (2019). Semantic understanding
of scenes through the ADE20K dataset. International
Journal of Computer Vision, 127(3):302–321.
ICPRAM 2025 - 14th International Conference on Pattern Recognition Applications and Methods