
2.3 Related Literature: Learning Methods
Unlike procedural methods, existing work on this topic that uses deep learning or other learning methods mainly focuses on generating city layouts rather than the buildings themselves. This section focuses on the former aspect.
Most papers proposing a learning method use some form of neural network. Zhang et al. proposed MetroGAN (Weiyu Zhang, 2022) for generating a satellite-imagery-like morphology of cities by splitting the geographical features into different conditions for the generator. However, the end results largely contain road networks and little else, making them unsuitable for post-processing in a 3D program.
Albert et al. (Albert et al., 2018) similarly generated urban patterns using GANs, resulting in heatmap-like images of buildings. These mainly capture the distribution of building densities across a city, and are in fact evaluated against building density patterns, but the resulting images do not contain detailed building footprints.
Bachl et al. (Bachl and Ferreira, 2020) proposed a learning method for urban styles as seen at ground level, performing a style transfer task to generate one city with the textures of another overlaid on top. Song et al. (Jieqiong Song, 2021) proposed MapGenGAN, which focused on generating map-like images from real satellite images, but did not generate new cities.
Shen et al. (Jiaqi Shen, 2020) performed a style-transfer task in which an input road network is filled with building blocks. They used a GAN for the task, training the generator on a dataset of real Chinese cities with the building footprints erased. The generator outputs a map with the same road network as the input but with building footprints filled in, and the discriminator is trained to distinguish that output from the ground truth. However, this paper had no road generation component.
Fedorova (Fedorova, 2021) tackled a similar task, but instead of filling in all the buildings, they focused on generating a single missing block. Maps of real cities with one block removed serve as the input, and the output is the same city with the missing block filled in. This work focused heavily on getting the shape of the generated block right, based on the population density of the surrounding areas.
2.4 Current Limitations
As discussed, most papers focus on generating city layouts, with the main features being roads and buildings. However, none of the papers covered generating features such as rivers, lakes, and parks. In addition, most of the past research was not conducted with the intent of conversion into a 3D model, which resulted in layouts with highly detailed roads but not detailed building footprints, or vice versa. Many procedurally generated road networks did not include any building generation, and simply assumed the negative space between the roads to be filled with buildings.

Figure 4: Samples captured from London, UK.
3 DATA COLLECTION AND PRE-PROCESSING
3.1 Data Source
Few databases on the internet collect detailed data about cities around the world. Among the best-known open-source ones is OpenStreetMap (OSM). With a well-built API for accessing its database, we decided to build our dataset from it.
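For concreteness, one common way to pull such data is OSM's public Overpass API; the paper does not state which endpoint was used, so the query below, its bounding box, and its tags are purely illustrative.

import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

# Overpass QL: fetch building and road ways inside a ~2x2 km box (central London here).
query = """
[out:json][timeout:60];
(
  way["building"](51.500,-0.130,51.518,-0.101);
  way["highway"](51.500,-0.130,51.518,-0.101);
);
out geom;
"""

response = requests.post(OVERPASS_URL, data={"data": query})
response.raise_for_status()
elements = response.json()["elements"]   # each way carries its node geometry
print(f"Fetched {len(elements)} ways")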
We sampled rectangular snippets of cities covering a 2x2 km area. The features we collected are the road network, rivers, lakes, canals, ocean, natural parks, meadows, and building footprints, the last of which were grouped into landed housing, apartments, and commercial buildings.
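As a sketch, a 2x2 km bounding box around a sampled center point can be derived with a standard meters-to-degrees approximation; the tag mapping below is our own guess at plausible OSM tags for the collected feature classes, not the exact query terms used in this work.

import math

def bbox_2km(lat, lon, half_side_m=1000.0):
    # Approximate meters-per-degree conversion; adequate at city scale.
    dlat = half_side_m / 111_320.0
    dlon = half_side_m / (111_320.0 * math.cos(math.radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)   # south, west, north, east

# Assumed (not the paper's exact) mapping from feature classes to OSM tags.
FEATURE_TAGS = {
    "roads":      {"highway": True},
    "waterways":  {"waterway": ["river", "canal"]},
    "water":      {"natural": "water"},
    "green":      {"leisure": "park", "landuse": "meadow"},
    "housing":    {"building": ["house", "detached", "terrace"]},
    "apartments": {"building": "apartments"},
    "commercial": {"building": ["commercial", "retail", "office"]},
}

print(bbox_2km(51.509, -0.118))   # example: a sample centered on London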
As for our cities, we selected from a list of the 700 most populated cities in the world, a list of cities with a population of at least one million, a list of the biggest North American cities, and a list of the biggest European cities. We collected five snippets from each city.
Collecting five snippets from over two thousand cities gave us roughly ten thousand samples before cleaning out those that contained nothing, or only a few roads. After cleaning, our largest dataset contains 6591 samples.
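The cleaning criterion is not spelled out beyond removing samples that contained nothing or only a few roads; a minimal sketch of such a filter, with an assumed pixel-fraction threshold and directory layout, could look like this.

import numpy as np
from pathlib import Path
from PIL import Image

MIN_NONBLACK_FRACTION = 0.02   # assumed cut-off; the paper does not give a number

def is_usable(path):
    img = np.asarray(Image.open(path).convert("RGB"))
    nonblack = (img.sum(axis=-1) > 0).mean()   # fraction of colored (non-background) pixels
    return nonblack >= MIN_NONBLACK_FRACTION

kept = [p for p in Path("samples").glob("*.png") if is_usable(p)]
print(f"Kept {len(kept)} samples after cleaning")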
3.2 Pre-Processing
OpenStreetMap's API allows us to extract each urban feature individually. We then color-coded the features onto a black canvas and stored them as 256x256 images, using the colors described in Table 1. Each color was chosen based on what we inferred would be easily separable with computer vision methods.
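A minimal sketch of this rasterization step is shown below; the palette here is a placeholder for the values in Table 1, and the geometry handling assumes coordinates already clipped to the sample's bounding box.

from PIL import Image, ImageDraw

SIZE = 256
COLORS = {                      # placeholder palette; the real values are in Table 1
    "road":       (255, 255, 255),
    "water":      (0, 0, 255),
    "park":       (0, 255, 0),
    "housing":    (255, 0, 0),
    "apartments": (255, 255, 0),
    "commercial": (255, 0, 255),
}

def to_pixels(coords, bbox):
    # Map (lon, lat) pairs into 256x256 pixel space for this sample's bounding box.
    south, west, north, east = bbox
    return [((lon - west) / (east - west) * SIZE,
             (north - lat) / (north - south) * SIZE)   # flip y so north is up
            for lon, lat in coords]

def render_sample(features, bbox):
    # features: {class_name: list of (lon, lat) coordinate lists}
    img = Image.new("RGB", (SIZE, SIZE), (0, 0, 0))     # black canvas
    draw = ImageDraw.Draw(img)
    for name, geoms in features.items():
        for coords in geoms:
            pts = to_pixels(coords, bbox)
            if name == "road":
                draw.line(pts, fill=COLORS[name], width=1)   # roads as polylines
            else:
                draw.polygon(pts, fill=COLORS[name])         # areas as filled polygons
    return img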