new, studies that address this topic in urban
environments by narrowing the problem down to
neighborhood-level wellbeing are not common in the
literature. Some studies focusing on urban areas have
been carried out, but they generally analyze the whole
city or use broad spatial subdivisions. In the PULSE
context, we seek to study public health problems in
urban contexts at a fine spatial resolution, considering
the characteristics of each individual neighborhood.
Spatial enablement methods are promising for both
analysis and visualization, but their
application is generally complicated by a
widespread lack of standards and regulation in the data
collection process. Data on demographics,
socioeconomics, the environment and air pollution, when
available, is often collected by different public and
private entities that apply different collection
procedures and storage standards, making it
time-consuming and difficult to retrieve and process all the data.
In this paper, we present as a case study a series of
analyses we carried out within the PULSE project
using almost exclusively open data. In particular, we
applied a spatially enabled method called
Geographically Weighted Regression (GWR) to a
combination of datasets for New York City,
with the aim of investigating the link between asthma
hospitalizations and several socioeconomic and
environmental factors.
After a brief presentation of the methodology and
the results, explained in detail in another paper that is
currently under review, we also focus on the
difficulties that we encountered during our analyses,
highlighting the need for a better-defined system for
data collection and storage in the public
health sector.
2 MATERIALS AND DATASETS
PULSE is characterized by a complex architecture
that allows an intense data flow through several
different integrated systems. The main components of
this architecture are:
• The Pulsair App for smartphones, through which
users can send their data and position and receive
personalized feedback concerning their
condition in relation to the situation in their city;
• Backend analytics and a Decision Support
System, which apply big data methods to analyze the
input data and use predictive risk models in order
to ultimately generate feedback for the users;
• Dashboards that allow public health policy
makers to inspect the situation in different
neighborhoods and organize appropriate interventions;
• A large and innovative WebGIS that allows users to
visualize all the data on maps and quickly spot the
main features and critical issues in the studied cities.
Since the geographical description of health-related
phenomena is at the core of PULSE, the WebGIS
could be considered the most interesting architectural
element in the project, as it collects and integrates a
wealth of spatially enabled data.
In line with the PULSE principle, and to start
investigating its applications and extensions, we
carried out a preliminary spatial enablement study
using some open data currently integrated in the
PULSE WebGIS.
2.1 A Data Integration Example: New York City
While the PULSE system is still in
development and the WebGIS is expected to be complete by
the spring of 2019, a lot of data integration, modeling
and analysis is already being carried out with data
coming from the five cities. In particular, thanks to the
distinctive availability of data for New York City, we
developed a large WebGIS prototype of the city and performed
some preliminary analyses on it, in order to
demonstrate the importance of spatial enablement in
studying public health in cities and the usefulness and
innovation of PULSE.
Several sources of data have been used to carry
out the analyses reported in this paper. Most of the
data has been kindly provided to the PULSE
consortium by The New York Academy of Medicine.
The socioeconomic data was downloaded from the freely
accessible NYC Neighborhood Health Atlas website (“New
York City Neighborhood Health Atlas,” n.d.). The
hospitalization and emergency department (ED) visit rates,
as well as the historical PM2.5 data, were downloaded from the NYC
Environmental & Health Data Portal (“Environment
& Health Data Portal,” n.d.). Information regarding
the age and race of hospitalized people was acquired
from the SPARCS (“Statewide Planning and Research
Cooperative System,” n.d.) limited 2014 dataset.
2.2 Geographically Weighted Regression
The collected datasets were analyzed through
Geographically Weighted Regression (GWR)
(McMillen, 2004), which is a linear regression model
with the addition of location-dependent weights that
provide a spatial description.
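
To make the idea of a spatially weighted linear regression more concrete, the following minimal sketch fits a separate weighted least-squares line at every location, giving nearby observations more weight than distant ones. It is an illustration under stated assumptions and not the implementation used in our analyses: the Gaussian kernel, the fixed bandwidth, the restriction to a single explanatory variable, the function names and the toy data are all hypothetical choices made only for this example.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel (assumed here): weight decays smoothly with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def gwr_single_predictor(coords, x, y, bandwidth):
    """Fit y = b0(u, v) + b1(u, v) * x separately at every location.

    Returns a list of (intercept, slope) pairs, one per location.
    """
    local_coefficients = []
    for ui, vi in coords:
        # Weight every observation by its distance from the current location.
        w = [gaussian_weight(math.hypot(ui - uj, vi - vj), bandwidth)
             for uj, vj in coords]
        total = sum(w)
        x_mean = sum(wk * xk for wk, xk in zip(w, x)) / total
        y_mean = sum(wk * yk for wk, yk in zip(w, y)) / total
        # Closed-form weighted least squares for one predictor.
        sxx = sum(wk * (xk - x_mean) ** 2 for wk, xk in zip(w, x))
        sxy = sum(wk * (xk - x_mean) * (yk - y_mean)
                  for wk, xk, yk in zip(w, x, y))
        slope = sxy / sxx
        intercept = y_mean - slope * x_mean
        local_coefficients.append((intercept, slope))
    return local_coefficients


# Toy data (invented for illustration): ten locations along a line, where the
# true slope is 1 in the first half and 5 in the second half.
coords = [(float(i), 0.0) for i in range(10)]
x = [float(i % 3) for i in range(10)]
y = [(1.0 if i < 5 else 5.0) * xi for i, xi in enumerate(x)]

for (u, v), (b0, b1) in zip(coords, gwr_single_predictor(coords, x, y, 1.0)):
    print(f"location ({u:.0f}, {v:.0f}): intercept={b0:.2f}, slope={b1:.2f}")
```

A single global regression on the toy data would return one compromise slope, whereas the local fits recover a different slope at each end of the line. A smaller bandwidth makes the estimates more local but noisier, so in practice the bandwidth would have to be selected with care.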