In this paper we examine cross-view image translation,
generating a street view image from the corresponding
aerial view using a cascaded pipeline in which coarse
street view image generation, semantic segmentation,
and image refinement are combined and trained together.
We tested the state-of-the-art generator models U-Net,
ResNet, and ResU-Net++ and found that the best results
were obtained with the configuration (Generator 1: U-Net,
Generator 2: ResNet, Generator 3: ResU-Net++).
This demonstrates the importance of skip connections
for street view generation and of attention for
image refinement. The role of each of the three subtasks
in the pipeline was studied, and each subtask was found
to improve overall performance both qualitatively and
quantitatively. Future work includes investigating
networks suited to further refining the output images
to address artifacts related to perspective projection,
and incorporating varied sources of input data (such as
aerial imagery from drones at varying heights, or video
input).
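
The cascade summarized above can be illustrated with a minimal sketch of how three generators might be chained and scored with one combined objective. This is not the implementation used in this work. The data flow (Generator 2 consuming the coarse image, Generator 3 consuming both the coarse image and the segmentation), the L1 reconstruction terms, the loss weights, and all names (`run_cascade`, `joint_loss`, `toy_g1`, `toy_g2`, `toy_g3`) are assumptions made for illustration, and the adversarial terms of a conditional GAN are omitted entirely. The toy generators are plain functions on nested lists standing in for U-Net, ResNet, and ResU-Net++.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy single-channel image: a list of rows of floats (assumption for illustration).
Image = List[List[float]]


@dataclass
class CascadeOutput:
    coarse: Image        # output of Generator 1 (coarse street view)
    segmentation: Image  # output of Generator 2 (semantic segmentation)
    refined: Image       # output of Generator 3 (refined street view)


def run_cascade(
    aerial: Image,
    g1: Callable[[Image], Image],
    g2: Callable[[Image], Image],
    g3: Callable[[Image, Image], Image],
) -> CascadeOutput:
    """Chain the three stages; the exact inputs of each stage are assumed."""
    coarse = g1(aerial)
    segmentation = g2(coarse)
    refined = g3(coarse, segmentation)
    return CascadeOutput(coarse, segmentation, refined)


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two images of equal size."""
    count = sum(len(row) for row in a)
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / count


def joint_loss(
    out: CascadeOutput,
    target_street: Image,
    target_seg: Image,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical weights
) -> float:
    """Weighted sum of per-stage losses, so all stages are trained together."""
    w1, w2, w3 = weights
    return (
        w1 * l1(out.coarse, target_street)
        + w2 * l1(out.segmentation, target_seg)
        + w3 * l1(out.refined, target_street)
    )


# Stand-in generators (hypothetical; real stages would be neural networks).
def toy_g1(img: Image) -> Image:
    return [[0.5 * v for v in row] for row in img]


def toy_g2(img: Image) -> Image:
    return [[1.0 if v > 0.25 else 0.0 for v in row] for row in img]


def toy_g3(coarse: Image, seg: Image) -> Image:
    return [[(c + s) / 2 for c, s in zip(rc, rs)] for rc, rs in zip(coarse, seg)]
```

Because a single scalar combines the losses of all three stages, an optimizer minimizing it would update every generator at once, which is the sense of "trained together" this sketch is meant to convey. Setting one entry of `weights` to zero drops that stage's supervision, a simple way to mimic an ablation of a subtask under these assumptions.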
Aerial to Street View Image Translation using Cascaded Conditional GANs