oxide, ozone, PM10 (Particulate Matter), PM2.5,
TSP (Total Suspended Particulate). Every hour the station computes the mean pollutant concentration and sends it to a control centre, where the data are manually validated to filter out outliers and further aggregated to obtain a daily measure.
These data are publicly available at the website of the environmental protection agency of the Lombardy region (ARPA Lombardia); they can be downloaded as CSV files from the URL: www.arpalombardia.it/qaria/doc_RichiestaDati.asp.
Most of the series start in the 1980s or 1990s, but for many stations data are available only after the year 2000, and in one case only from 2007.
The data series contain missing values, since some stations have been under maintenance for months or have been switched off. In building our decision models we selected the time series with the fewest missing values, since not all the series could be analyzed together over long time periods. The data have been downloaded and imported into a CSV file.
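As a purely illustrative sketch, the following Python code shows how hourly records could be reduced to daily means and how the stations with the fewest missing values could be identified; the column names ("timestamp", "station", "value") and the 3-sigma outlier rule are assumptions made for this example, not the agency's actual procedure, which includes manual validation.

import pandas as pd

def daily_series(hourly_csv: str) -> pd.DataFrame:
    """Reduce hourly means to one daily value per station (illustrative only)."""
    df = pd.read_csv(hourly_csv, parse_dates=["timestamp"])

    # Crude automatic stand-in for the manual outlier filtering at the control centre.
    mean, std = df["value"].mean(), df["value"].std()
    df = df[(df["value"] - mean).abs() <= 3 * std]

    # One column per station, one row per day.
    return (df.pivot_table(index="timestamp", columns="station", values="value")
              .resample("D").mean())

def least_gappy_stations(daily: pd.DataFrame, n: int = 5) -> list[str]:
    """Return the n stations whose daily series have the fewest missing values."""
    return daily.isna().sum().sort_values().index[:n].tolist()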
Pollution Concentration (from Lenvis services)
Within Lenvis, a set of services has been developed that, given the current concentration of air pollutants, the weather conditions and the geographical locations of the emission sources, forecasts the pollutant concentrations in a given area for the next days. The data produced by these services are used by our system as soft sensors, i.e. virtual sensors that produce data about the future. Based on these data, we can provide health impact forecasts over a longer period.
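The soft-sensor idea can be pictured as a thin adapter that exposes a forecasting service through the same record format used for measured data. The interface and field names below are hypothetical and only sketch the concept; they are not the actual Lenvis service API.

from dataclasses import dataclass
from datetime import date
from typing import Protocol

@dataclass
class Reading:
    station_id: str
    day: date
    pollutant: str
    value: float    # concentration, e.g. in micrograms per cubic metre
    forecast: bool  # True when produced by a model rather than a physical sensor

class ForecastService(Protocol):
    def predict(self, station_id: str, pollutant: str, day: date) -> float: ...

class SoftSensor:
    """Exposes a forecasting service as if it were a monitoring station."""
    def __init__(self, service: ForecastService, station_id: str):
        self.service = service
        self.station_id = station_id

    def read(self, pollutant: str, day: date) -> Reading:
        value = self.service.predict(self.station_id, pollutant, day)
        return Reading(self.station_id, day, pollutant, value, forecast=True)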
Admission Data
The health indicator that we address is the daily
number of hospitalizations in the city of Milan for
two principal classes of disease that can be related to
air pollution: respiratory (asthma) and cardiovascular (myocardial infarction, ischemic heart disease and deep vein thrombosis). In these data each patient is characterized by an ID and a diagnosis. The number of hospitalizations for each pathology is collected by the local government of the Lombardy region and published on http://www.aleeao.it/. For the setup and tuning of the health models we obtained specific data from two important general hospitals accredited by the National Health Service, one serving a general population and one focused on geriatric patients. These two hospitals have provided a small quantity of very detailed data, including patient age, sex and detailed diagnosis, which allows class-specific analyses (e.g. by age class). The data have been downloaded and imported into a CSV file.
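As an example of the class-specific analyses that this level of detail permits, the sketch below counts daily admissions per age class; the column names ("date", "age") and the age bins are assumptions, not the hospitals' actual schema.

import pandas as pd

def daily_admissions_by_age_class(admissions_csv: str) -> pd.DataFrame:
    """Count daily admissions per (assumed) age class from patient-level records."""
    df = pd.read_csv(admissions_csv, parse_dates=["date"])

    # Bin patient ages into coarse classes; the bin edges are arbitrary here.
    df["age_class"] = pd.cut(df["age"], bins=[0, 14, 64, 120],
                             labels=["child", "adult", "elderly"])

    # One row per day, one column per age class, values = number of admissions.
    return (df.groupby(["date", "age_class"], observed=True)
              .size()
              .unstack(fill_value=0))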
2.2 Integration Layer
Input data are collected through web services and
local databases. The DAC is a software component
that provides integrated access to multiple,
heterogeneous and distributed data sources. Its
functionalities are: object-oriented representation of
the domain data through a meta-model definition of
the types of data supported; submission of structured
queries; return of the query result as a searchable
and navigable data structure. It allows executing
cross-source queries on temporal, spatial and logical
intervals, supporting data analysis and presentation
activities. Although each query uses a traditional SQL syntax, it specifies a target but not the data sources from which to extract the data.
The platform defines the data types that the data
sources can provide. Each data type is a structured
object, containing multiple fields (e.g. for a sensor
reading, the source of the data, the value collected,
the timestamp to which it refers etc.) and it is linked
to other data types through hierarchical relations.
Each query also specifies the constraints on results
(temporal intervals, values of some attributes…).
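A rough picture of such a structured data type and of a query against it is sketched below; the class and field names are hypothetical, since the real definitions belong to the platform's meta-model.

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class SensorReading:
    """A structured data type: source, collected value and reference timestamp."""
    source: str
    value: float
    timestamp: datetime
    parent_type: str = "Measurement"  # hierarchical relation to a broader type

@dataclass
class Query:
    """A query names a target data type and constraints, never the data sources."""
    target: str                                   # e.g. "SensorReading"
    time_interval: tuple[datetime, datetime]
    constraints: dict[str, object] = field(default_factory=dict)

q = Query(target="SensorReading",
          time_interval=(datetime(2009, 1, 1), datetime(2009, 12, 31)),
          constraints={"pollutant": "PM10"})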
This querying mechanism hides the
heterogeneity and distribution of the data sources.
Moreover, it is the responsibility of the DAC to identify, select and query all the sources needed to answer user requests, and also to prepare and present the output in a common format. A fundamental feature is the possibility of answering queries not only by accessing persistent data but also by using streaming data produced online by sensors (e.g. sensor networks) or by models that perform online inference (forecasts).
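One way to picture this is two source types behind a common interface, one backed by stored records and one by an online stream; the uniform fetch() method below is an assumption made for illustration, not the DAC's actual API.

from typing import Iterable, Iterator

class PersistentSource:
    """Serves previously stored records, e.g. validated historical data."""
    def __init__(self, rows: list[dict]):
        self._rows = rows

    def fetch(self, target: str) -> Iterable[dict]:
        return [r for r in self._rows if r["type"] == target]

class StreamingSource:
    """Serves records produced online, e.g. by sensors or forecasting models."""
    def __init__(self, stream: Iterator[dict]):
        self._stream = stream

    def fetch(self, target: str) -> Iterable[dict]:
        return (r for r in self._stream if r["type"] == target)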
The main components of the DAC are the Query Processor, the Wrapper component and the Data Aggregator. The Query Processor takes as input a complex query formulated by the user and produces a set of simpler queries, each of which can be satisfied by a single wrapper. The Query Processor also checks the final conditions that have to be satisfied before the results produced by the Data Aggregator are returned to the user. The Wrapper component manages wrappers belonging to different classes; each class is specific to a type of data source (relational database, web service, text file...). A wrapper processes a simple query: it is connected to a data source and extracts from it the data needed to answer the query, implementing the specific protocols of that source. Finally, the Data Aggregator component merges the partial results produced by the different wrappers. Analogously to relational databases, this merging can consist of a set union or a join.
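Putting the three components together, the overall flow can be summarised as in the sketch below; the function names, the registry that maps target types to source classes, and the union-based merge are illustrative assumptions rather than the actual implementation.

def process_query(query: dict, registry: dict, wrappers: dict) -> list[dict]:
    # 1. Query Processor: the user names only a target and constraints; the DAC
    #    itself identifies the sources able to provide that target type.
    sources = registry[query["target"]]
    sub_queries = [dict(query, source=name) for name in sources]

    # 2. Wrapper component: each sub-query goes to the wrapper of its source class.
    partial_results = [wrappers[sq["source"]](sq) for sq in sub_queries]

    # 3. Data Aggregator: merge the partial results (a set union here; a join is
    #    another possibility, as in relational databases).
    merged: list[dict] = []
    for part in partial_results:
        merged.extend(part)
    return merged

# Hypothetical setup: two source classes able to provide PM10 readings.
registry = {"SensorReading": ["arpa_csv", "lenvis_ws"]}
wrappers = {
    "arpa_csv":  lambda q: [{"value": 41.0, "source": "arpa_csv"}],
    "lenvis_ws": lambda q: [{"value": 38.5, "source": "lenvis_ws"}],
}
result = process_query({"target": "SensorReading",
                        "constraints": {"pollutant": "PM10"}},
                       registry, wrappers)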