benefit predictive modelling in many applications, for several reasons. Tobler’s First Law of Geography (TFL), which geographers also refer to as “spatial dependency”, states that the characteristics of nearby places are more strongly correlated than those of distant places. Moreover, it is reasonable to expect predictive models to benefit from a diverse and balanced variable distribution; when items are counted within cells, sparse samples may undermine this, as we show.
To calculate PoI density, we compare the count-based baseline method, which we refer to here as the quadrat method, against kernel density estimation (KDE), which generally produces smoother distributions. We evaluate the two methods by measuring the spatial autocorrelation of the resulting features with Moran’s I and their spatial heterogeneity with the q-index, which indicates how unevenly the features are distributed. To compare the methods visually, we also conduct a qualitative assessment based on visual inspection of the spatial distributions they produce for different sample sizes.
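The comparison described above can be sketched on synthetic data. This is a minimal illustration under several assumptions, not Geohunter’s implementation: the PoIs are synthetic clustered points in a unit square, KDE uses a Gaussian kernel with a hand-picked bandwidth of 0.05 evaluated at cell centres, Moran’s I uses binary rook-contiguity weights between grid cells, and the q-index is assumed to be the geographical-detector q-statistic, q = 1 - Σ_h N_h σ_h² / (N σ²), computed over hypothetical 4 x 4 blocks of cells standing in for administrative zones.

```python
import numpy as np

def quadrat_counts(x, y, bounds, n_cells):
    """Quadrat method: number of points in each cell of an n_cells x n_cells grid."""
    xmin, ymin, xmax, ymax = bounds
    counts, _, _ = np.histogram2d(x, y, bins=n_cells, range=[[xmin, xmax], [ymin, ymax]])
    return counts  # indexed [x_bin, y_bin]

def kde_grid(x, y, bounds, n_cells, bandwidth):
    """Gaussian KDE evaluated at the centre of each cell (same indexing as quadrat_counts)."""
    xmin, ymin, xmax, ymax = bounds
    xc = xmin + (np.arange(n_cells) + 0.5) * (xmax - xmin) / n_cells
    yc = ymin + (np.arange(n_cells) + 0.5) * (ymax - ymin) / n_cells
    gx, gy = np.meshgrid(xc, yc, indexing="ij")
    # squared distance from every cell centre to every point: shape (n_cells, n_cells, n_points)
    d2 = (gx[..., None] - x) ** 2 + (gy[..., None] - y) ** 2
    kernel = np.exp(-d2 / (2 * bandwidth**2)) / (2 * np.pi * bandwidth**2)
    return kernel.sum(axis=-1) / x.size

def morans_i(grid):
    """Global Moran's I with binary rook-contiguity weights between grid cells."""
    z = grid - grid.mean()
    # sum of z_i * z_j over horizontal and vertical neighbour pairs
    cross = (z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum()
    n_pairs = z[:, :-1].size + z[:-1, :].size
    # symmetric weights count each pair twice in both the numerator and W, which cancels
    return z.size * cross / (n_pairs * (z**2).sum())

def q_statistic(values, strata):
    """Geographical-detector q (assumed meaning of the q-index): variance share explained by strata."""
    values, strata = values.ravel(), strata.ravel()
    within = sum(values[strata == h].size * values[strata == h].var() for h in np.unique(strata))
    return 1.0 - within / (values.size * values.var())

rng = np.random.default_rng(42)
bounds, n_cells = (0.0, 0.0, 1.0, 1.0), 20
# hypothetical strata: 4 x 4 blocks of 5 x 5 cells standing in for administrative zones
idx = np.arange(n_cells) // 5
strata = idx[:, None] * 4 + idx[None, :]

for n_points in (20, 2000):
    # synthetic clustered PoIs: two Gaussian clusters, clipped to the study area
    half = n_points // 2
    pts = np.vstack([
        rng.normal((0.3, 0.3), 0.05, (half, 2)),
        rng.normal((0.7, 0.6), 0.08, (n_points - half, 2)),
    ]).clip(0.0, 1.0)
    x, y = pts[:, 0], pts[:, 1]
    layers = {
        "quadrat": quadrat_counts(x, y, bounds, n_cells),
        "kde": kde_grid(x, y, bounds, n_cells, bandwidth=0.05),
    }
    for name, grid in layers.items():
        print(f"n={n_points:5d} {name:8s} empty cells={np.mean(grid == 0):.0%} "
              f"Moran's I={morans_i(grid):.3f} q={q_statistic(grid, strata):.3f}")
```

With 20 points, at most 20 of the 400 quadrat cells can be non-empty, so at least 95% of them are zero whatever the point pattern, whereas the Gaussian kernel assigns a positive density to every cell. The bandwidth and grid resolution are free choices here; both affect the resulting Moran’s I and q values.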
Moreover, to the best of our knowledge, related studies neither describe a clear and reproducible geographic feature engineering procedure nor provide software that supports such an analysis. For this reason, we propose and demonstrate Geohunter, a reproducible geographic feature engineering framework that fetches OpenStreetMap data and calculates the density of points-of-interest throughout the city. We implemented it as an open-source Python package that currently provides functionality for (i) loading data from the OpenStreetMap API, (ii) parsing OpenStreetMap data into commonly used geometry-based data structures (GeoPandas (Jordahl, 2014)), and (iii) extracting geographic features from points-of-interest with the methods used in this work.
This paper is organized as follows. In Section 2, we describe some aspects of the data source, the two geographic feature engineering methods used and our evaluation approach, and provide some details about Geohunter. In Section 3, we show the results of our experiments, from both qualitative and quantitative assessments. Finally, in Section 4, we discuss some possible analyses that illustrate the usefulness of such geographic features.
2 DATA AND METHODS
2.1 OpenStreetMap Data
Over the last decade, Web-based GIS technologies have been developed to provide reliable representations of the urban environment. Among them, OpenStreetMap has gained prominence due to its substantial community-based contributions and its open data policy. It drove the growth of the Volunteered Geographic Information (VGI) culture, and many studies, e.g. (Kounadi, 2009; Camboim et al., 2015), have assessed the quality of the information on the platform. Nowadays, its data can be requested quickly through, for example, the Overpass API¹. OpenStreetMap has a particular data model in which objects, or Points-of-Interest (PoI), are composed of “nodes”, “ways” and “relations”, and each object can be created, tagged and verified by the community. A set of defined and widely used tags, also referred to as “map features”, classifies PoIs (a complete description of them is provided in the OpenStreetMap documentation²).
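Requests to the Overpass API are written in its own query language, Overpass QL. The sketch below is a minimal illustration, not Geohunter’s code: it assumes the public endpoint `https://overpass-api.de/api/interpreter` (other instances exist), a plain bounding-box filter in Overpass order (south, west, north, east), and `out center;` so that ways and relations are returned with a representative point. The helper names `build_query` and `fetch` and the rough bounding box for Natal are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public Overpass instance (assumption: any endpoint accepting a "data" form field works).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def build_query(key, value, bbox, timeout=60):
    """Overpass QL selecting nodes, ways and relations tagged key=value.

    bbox follows Overpass order: (south, west, north, east) in decimal degrees.
    """
    south, west, north, east = bbox
    area = f"({south},{west},{north},{east})"
    selector = f'["{key}"="{value}"]'
    lines = [f"[out:json][timeout:{timeout}];", "("]
    lines += [f"  {kind}{selector}{area};" for kind in ("node", "way", "relation")]
    lines += [");", "out center;"]  # "center" adds a representative point to ways/relations
    return "\n".join(lines)

def fetch(query, url=OVERPASS_URL):
    """POST the query and return the decoded JSON response (requires network access)."""
    payload = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(url, data=payload, timeout=120) as response:
        return json.load(response)

# Hypothetical, rough bounding box around Natal, Brazil (not the official city boundary).
NATAL_BBOX = (-5.90, -35.30, -5.70, -35.15)
query = build_query("amenity", "place_of_worship", NATAL_BBOX)
print(query)
# elements = fetch(query)["elements"]  # uncomment to send the request
```

A bounding box is the simplest spatial filter; restricting results to an administrative boundary instead would require an area filter, which the Overpass QL documentation describes.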
The Overpass API receives requests in a specific query language (Overpass QL) and returns data containing the geographical coordinates and several other attributes of the requested PoIs. For details about how these queries are structured, we refer the reader to the API documentation. As mentioned, the returned elements follow a typology of (i) nodes, which define points in space, (ii) ways, which define linear features and area boundaries, and (iii) relations, which describe how other elements work together. An arbitrary PoI can be composed of a relation of several ways and nodes. This typology is not directly related to the geometric concepts of points, lines and polygons; for instance, a closed way may represent either an area, such as a building outline, or a closed line, such as a roundabout, depending on its tags.
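To make the node/way/relation typology concrete, the sketch below reduces Overpass JSON elements to one representative coordinate each. It assumes the response was requested with `[out:json]` and `out center;`, in which case nodes carry `lat`/`lon` directly and ways and relations carry a `center` object. The sample response (ids and coordinates) and the `element_points` helper are hypothetical, and collapsing ways and relations to their centres is a simplification chosen here, not necessarily what Geohunter does.

```python
import json

def element_points(elements):
    """Reduce Overpass JSON elements to (osm_type, osm_id, lon, lat) tuples.

    Nodes carry their own lat/lon; ways and relations only have a representative
    point when the query asked for it (e.g. with `out center;`), stored under "center".
    """
    points = []
    for el in elements:
        if el["type"] == "node":
            lat, lon = el["lat"], el["lon"]
        elif "center" in el:
            lat, lon = el["center"]["lat"], el["center"]["lon"]
        else:
            continue  # no coordinates returned for this way/relation
        points.append((el["type"], el["id"], lon, lat))
    return points

# Hypothetical response fragment: one node, one way with a centre, one relation without.
sample = json.loads("""
{"elements": [
  {"type": "node", "id": 1, "lat": -5.79, "lon": -35.21,
   "tags": {"amenity": "place_of_worship"}},
  {"type": "way", "id": 2, "center": {"lat": -5.80, "lon": -35.20},
   "nodes": [10, 11, 12, 10], "tags": {"building": "church"}},
  {"type": "relation", "id": 3, "members": []}
]}
""")
print(element_points(sample["elements"]))
```

The tuples can then be turned into a GeoPandas structure by passing `geometry=geopandas.points_from_xy(lons, lats)` and `crs="EPSG:4326"` to `geopandas.GeoDataFrame`.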
The physical aspects of elements are described by tags attached to them. Each tag describes a different aspect of an element, and an element can carry an unlimited number of tags (assigned by the community). Tags are defined as key:value pairs; for instance, a church can be represented by the tag building:church and also by amenity:place_of_worship. For this paper, we selected tags that illustrate the method discussed later in the paper across a variety of PoIs. Table 1 details the keys, values and amount of data for each tag within the boundaries of Natal, Brazil.
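Because an element can carry several tags, one way to build per-tag feature layers is to test each selected key:value pair independently. The sketch assumes elements in Overpass JSON form, with a `tags` dictionary per element; the `LAYERS` list and the `split_into_layers` helper are hypothetical, and Table 1 lists the pairs actually used.

```python
from collections import defaultdict

# Hypothetical selection of key:value pairs (Table 1 lists the ones actually used).
LAYERS = [("amenity", "place_of_worship"), ("building", "church"), ("amenity", "school")]

def split_into_layers(elements, layers=LAYERS):
    """Group element ids into one feature layer per key:value pair.

    An element is added to every layer whose key:value appears in its tags, so a
    church tagged both building=church and amenity=place_of_worship lands in two layers.
    """
    grouped = defaultdict(list)
    for el in elements:
        tags = el.get("tags", {})
        for key, value in layers:
            if tags.get(key) == value:
                grouped[f"{key}:{value}"].append(el["id"])
    return dict(grouped)

church = {"type": "way", "id": 2, "tags": {"building": "church", "amenity": "place_of_worship"}}
school = {"type": "node", "id": 7, "tags": {"amenity": "school"}}
print(split_into_layers([church, school]))
# {'amenity:place_of_worship': [2], 'building:church': [2], 'amenity:school': [7]}
```

Keeping one layer per pair, rather than deduplicating elements, preserves each tag as a separate density feature at the cost of counting multi-tagged objects in more than one layer.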
2.2 Geographic Feature Engineering Methods
For each geographic object listed in Table 1, we extract a separate feature layer that describes density values for places in the city. These places can be defined following an administrative division of the city
1 https://wiki.openstreetmap.org/wiki/Overpass_API
2 https://wiki.openstreetmap.org/wiki/Map_Features
Geographic Feature Engineering with Points-of-Interest from OpenStreetMap