
vision and pattern recognition (CVPR), pages 9799–
9808.
Li, C. and Wand, M. (2016). Precomputed real-time texture
synthesis with Markovian generative adversarial net-
works. In Computer Vision–ECCV 2016: 14th Euro-
pean Conference, Amsterdam, The Netherlands, Octo-
ber 11-14, 2016, Proceedings, Part III 14, pages 702–
716. Springer.
Li, T., Chang, H., Mishra, S., Zhang, H., Katabi, D., and
Krishnan, D. (2023). Mage: Masked generative en-
coder to unify representation learning and image syn-
thesis. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pages
2142–2152.
Li, X., He, H., Li, X., Li, D., Cheng, G., Shi, J., Weng,
L., Tong, Y., and Lin, Z. (2021). Pointflow: Flowing
semantics through points for aerial image segmenta-
tion. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pages
4217–4226.
Liang, Y., Li, X., Tsai, B., Chen, Q., and Jafari, N. (2023).
V-floodnet: A video segmentation system for urban
flood detection and quantification. Environmental
Modelling & Software, 160:105586.
Lin, G., Milan, A., Shen, C., and Reid, I. (2017a). Re-
finenet: Multi-path refinement networks for high-
resolution semantic segmentation. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 1925–1934.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B.,
and Belongie, S. (2017b). Feature pyramid networks
for object detection. In Proceedings of the IEEE/CVF
conference on computer vision and pattern recogni-
tion (CVPR), pages 2117–2125.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin,
S., and Guo, B. (2021). Swin transformer: Hierarchi-
cal vision transformer using shifted windows. In Pro-
ceedings of the IEEE/CVF International Conference
on Computer Vision (ICCV), pages 10012–10022.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully con-
volutional networks for semantic segmentation. In
Proceedings of the IEEE/CVF conference on com-
puter vision and pattern recognition (CVPR), pages
3431–3440.
Luo, Z., Gustafsson, F. K., Zhao, Z., Sjölund, J., and Schön,
T. B. (2023). Refusion: Enabling large-size realis-
tic image restoration with latent-space diffusion mod-
els. In Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition, pages 1680–
1691.
Michaelis, C., Mitzkus, B., Geirhos, R., Rusak, E., Bring-
mann, O., Ecker, A. S., Bethge, M., and Brendel, W.
(2019). Benchmarking robustness in object detection:
Autonomous driving when winter is coming. arXiv
preprint arXiv:1907.07484.
Muandet, K., Balduzzi, D., and Schölkopf, B. (2013). Do-
main generalization via invariant feature representa-
tion. In International conference on machine learning,
pages 10–18. PMLR.
Peebles, W. and Xie, S. (2023). Scalable diffusion models
with transformers. In Proceedings of the IEEE/CVF
International Conference on Computer Vision, pages
4195–4205.
Pi, Y., Nath, N. D., and Behzadan, A. H. (2020). Convo-
lutional neural networks for object detection in aerial
imagery for disaster response and recovery. Advanced
Engineering Informatics, 43:101009.
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., and
Chen, M. (2022). Hierarchical text-conditional im-
age generation with clip latents. arXiv preprint
arXiv:2204.06125, 1(2):3.
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and
Ommer, B. (2022). High-resolution image synthesis
with latent diffusion models. In Proceedings of the
IEEE/CVF conference on computer vision and pattern
recognition, pages 10684–10695.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-
net: Convolutional networks for biomedical image
segmentation. In Medical image computing and
computer-assisted intervention–MICCAI 2015: 18th
international conference, Munich, Germany, October
5-9, 2015, proceedings, part III 18, pages 234–241.
Springer.
Rottensteiner, F., Sohn, G., Gerke, M., Wegner, J. D., Bre-
itkopf, U., and Jung, J. (2014). Results of the ISPRS
benchmark on urban object detection and 3D building
reconstruction. ISPRS Journal of Photogrammetry and
Remote Sensing, 93:256–271.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., and
Ganguli, S. (2015). Deep unsupervised learning us-
ing nonequilibrium thermodynamics. In Bach, F. and
Blei, D., editors, Proceedings of the 32nd Interna-
tional Conference on Machine Learning, volume 37
of Proceedings of Machine Learning Research, pages
2256–2265, Lille, France. PMLR.
Sohn, K., Jiang, L., Barber, J., Lee, K., Ruiz, N., Krish-
nan, D., Chang, H., Li, Y., Essa, I., Rubinstein, M.,
et al. (2024). Styledrop: Text-to-image synthesis of
any style. Advances in Neural Information Process-
ing Systems, 36.
Song, J., Meng, C., and Ermon, S. (2020). De-
noising diffusion implicit models. arXiv preprint
arXiv:2010.02502.
Strudel, R., Garcia, R., Laptev, I., and Schmid, C. (2021).
Segmenter: Transformer for semantic segmentation.
In Proceedings of the IEEE/CVF international con-
ference on computer vision, pages 7262–7272.
Sun, T., Segu, M., Postels, J., Wang, Y., Van Gool, L.,
Schiele, B., Tombari, F., and Yu, F. (2022). Shift:
a synthetic driving dataset for continuous multi-task
domain adaptation. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recogni-
tion, pages 21371–21382.
Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and
Abbeel, P. (2017). Domain randomization for transfer-
ring deep neural networks from simulation to the real
world. In 2017 IEEE/RSJ international conference on
intelligent robots and systems (IROS), pages 23–30.
IEEE.