vision and pattern recognition (CVPR), pages 9799–
Li, C. and Wand, M. (2016). Precomputed real-time texture
synthesis with markovian generative adversarial net-
works. In Computer Vision–ECCV 2016: 14th Euro-
pean Conference, Amsterdam, The Netherlands, Octo-
ber 11-14, 2016, Proceedings, Part III 14, pages 702–
716. Springer.
Li, T., Chang, H., Mishra, S., Zhang, H., Katabi, D., and
Krishnan, D. (2023). Mage: Masked generative en-
coder to unify representation learning and image syn-
thesis. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pages
Li, X., He, H., Li, X., Li, D., Cheng, G., Shi, J., Weng,
L., Tong, Y., and Lin, Z. (2021). Pointflow: Flowing
semantics through points for aerial image segmenta-
tion. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition, pages
Liang, Y., Li, X., Tsai, B., Chen, Q., and Jafari, N. (2023).
V-floodnet: A video segmentation system for urban
flood detection and quantification. Environmental
Modelling & Software, 160:105586.
Lin, G., Milan, A., Shen, C., and Reid, I. (2017a). Re-
finenet: Multi-path refinement networks for high-
resolution semantic segmentation. In Proceedings of
the IEEE conference on computer vision and pattern
recognition, pages 1925–1934.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B.,
and Belongie, S. (2017b). Feature pyramid networks
for object detection. In Proceedings of the IEEE/CVF
conference on computer vision and pattern recogni-
tion (CVPR), pages 2117–2125.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin,
S., and Guo, B. (2021). Swin transformer: Hierarchi-
cal vision transformer using shifted windows. In Pro-
ceedings of the IEEE/CVF International Conference
on Computer Vision (ICCV), pages 10012–10022.
Long, J., Shelhamer, E., and Darrell, T. (2015). Fully con-
volutional networks for semantic segmentation. In
Proceedings of the IEEE/CVF conference on com-
puter vision and pattern recognition (CVPR), pages
Luo, Z., Gustafsson, F. K., Zhao, Z., Sjölund, J., and Schön,
T. B. (2023). Refusion: Enabling large-size realis-
tic image restoration with latent-space diffusion mod-
els. In Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition, pages 1680–
Michaelis, C., Mitzkus, B., Geirhos, R., Rusak, E., Bring-
mann, O., Ecker, A. S., Bethge, M., and Brendel, W.
(2019). Benchmarking robustness in object detection:
Autonomous driving when winter is coming. arXiv
preprint arXiv:1907.07484.
Muandet, K., Balduzzi, D., and Schölkopf, B. (2013). Domain
generalization via invariant feature representation.
In International conference on machine learning,
pages 10–18. PMLR.
Peebles, W. and Xie, S. (2023). Scalable diffusion models
with transformers. In Proceedings of the IEEE/CVF
International Conference on Computer Vision, pages
Pi, Y., Nath, N. D., and Behzadan, A. H. (2020). Convo-
lutional neural networks for object detection in aerial
imagery for disaster response and recovery. Advanced
Engineering Informatics, 43:101009.
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., and
Chen, M. (2022). Hierarchical text-conditional im-
age generation with clip latents. arXiv preprint
arXiv:2204.06125.
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and
Ommer, B. (2022). High-resolution image synthesis
with latent diffusion models. In Proceedings of the
IEEE/CVF conference on computer vision and pattern
recognition, pages 10684–10695.
Ronneberger, O., Fischer, P., and Brox, T. (2015). U-
net: Convolutional networks for biomedical image
segmentation. In Medical image computing and
computer-assisted intervention–MICCAI 2015: 18th
international conference, Munich, Germany, October
5-9, 2015, proceedings, part III 18, pages 234–241.
Rottensteiner, F., Sohn, G., Gerke, M., Wegner, J. D., Bre-
itkopf, U., and Jung, J. (2014). Results of the isprs
benchmark on urban object detection and 3d building
reconstruction. ISPRS journal of photogrammetry and
remote sensing, 93:256–271.
Simonyan, K. and Zisserman, A. (2014). Very deep con-
volutional networks for large-scale image recognition.
arXiv preprint arXiv:1409.1556.
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., and
Ganguli, S. (2015). Deep unsupervised learning us-
ing nonequilibrium thermodynamics. In Bach, F. and
Blei, D., editors, Proceedings of the 32nd Interna-
tional Conference on Machine Learning, volume 37
of Proceedings of Machine Learning Research, pages
2256–2265, Lille, France. PMLR.
Sohn, K., Jiang, L., Barber, J., Lee, K., Ruiz, N., Krish-
nan, D., Chang, H., Li, Y., Essa, I., Rubinstein, M.,
et al. (2024). Styledrop: Text-to-image synthesis of
any style. Advances in Neural Information Process-
ing Systems, 36.
Song, J., Meng, C., and Ermon, S. (2020). De-
noising diffusion implicit models. arXiv preprint
Strudel, R., Garcia, R., Laptev, I., and Schmid, C. (2021).
Segmenter: Transformer for semantic segmentation.
In Proceedings of the IEEE/CVF international con-
ference on computer vision, pages 7262–7272.
Sun, T., Segu, M., Postels, J., Wang, Y., Van Gool, L.,
Schiele, B., Tombari, F., and Yu, F. (2022). Shift:
a synthetic driving dataset for continuous multi-task
domain adaptation. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recogni-
tion, pages 21371–21382.
Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and
Abbeel, P. (2017). Domain randomization for transfer-
ring deep neural networks from simulation to the real
world. In 2017 IEEE/RSJ international conference on
intelligent robots and systems (IROS), pages 23–30.
LAST: Utilizing Synthetic Image Style Transfer to Tackle Domain Shift in Aerial Image Segmentation