Schemes helping to solve semantic heterogeneities
(outside the scope of this paper). The DW
Administrator participates in the definition of the
DW scheme so that the structuring and storage
characteristics of the data in the DW are taken into
account.
Three modules have been added to the reference
architecture in order to carry out the integration of
the data, considering the data extraction method
used:
- The Temporal Integration Processor uses the
set of semantic relations and the conformed schemes
obtained during the similarity detection phase.
As a result, we obtain rules describing the
integration possibilities that exist between the
data originating from the data sources (minimum
granularity...). This information is kept in the
Temporal Metadata Warehouse. In addition, the
Temporal Integration process yields a set of
mapping functions.
- The Spatial Integration Processor performs the
transformations of the spatial data in the data
sources that are needed to integrate them. All data
must be converted to the same format and unit of
measurement. It is also responsible for dealing with
the different spatial granularities of the data found
in different data sources.
- The Metadata Refreshment Generator
determines the most suitable parameters to carry out
the refreshment of data in the DW scheme. The DW
scheme is generated in the resolution phase of the
data scheme integration methodology. It is
in this second phase that, starting from the minimum
requirements generated by the temporal integration
and stored in the Temporal Metadata Warehouse, the
DW designer fixes the refreshment parameters. As
a result, the DW scheme is obtained along with the
Refreshment Metadata needed to update it
according to the data extraction method and
other temporal properties of a specific data source.
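
The unit conversion and spatial granularity handling attributed to the
Spatial Integration Processor above can be sketched as follows. This is
only an illustrative sketch, not the processor described in the paper:
the Reading type, the unit codes, the station-to-region mapping and the
use of a plain average to move to a coarser spatial granularity are all
assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable


@dataclass(frozen=True)
class Reading:
    """One temperature measurement from a source (hypothetical record)."""
    station: str  # station or airport identifier
    value: float  # measured temperature
    unit: str     # "C", "F" or "K" (assumed unit codes)


def to_celsius(reading: Reading) -> float:
    """Convert a reading to the common unit (degrees Celsius)."""
    if reading.unit == "C":
        return reading.value
    if reading.unit == "F":
        return (reading.value - 32.0) * 5.0 / 9.0
    if reading.unit == "K":
        return reading.value - 273.15
    raise ValueError(f"unknown unit: {reading.unit!r}")


def aggregate_by_region(readings: Iterable[Reading],
                        station_region: Dict[str, str]) -> Dict[str, float]:
    """Bring station-level readings to region level by averaging.

    Averaging is an assumed aggregation rule; another rule could be
    substituted without changing the structure.
    """
    by_region = defaultdict(list)
    for reading in readings:
        by_region[station_region[reading.station]].append(to_celsius(reading))
    return {region: mean(values) for region, values in by_region.items()}
```

For instance, two stations mapped to the same (made-up) region, one
reporting 59 ºF and the other 17 ºC, would be integrated as a single
regional value in ºC.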
Data Warehouse Refreshment. After temporal
integration, and once the DW scheme is obtained, the
scheme must be maintained and updated. This
function is carried out by the DW Refreshment
Processor. It adjusts the refreshment parameters of
the data stored in the DW by taking into account
both the minimum requirements that must be
fulfilled to integrate two data items from different
data sources (obtained by means of the Temporal
Integration module) and the integrated scheme
(obtained by the resolution module).
3 EXAMPLE
A Decision Support System (DSS) based on a
DW is presented as an example (fig. 2). Such a
system can be offered by Small and Medium-Sized
Enterprises (SMEs) as an added-value service for
adventure tourism. Here, a DSS
is used to assist novice and expert pilots in the
decision-making process for a soaring trip (Araque
et al., 2006b). These pilots depend to a large extent
on meteorological conditions to carry out their
activity, and an important part of the system is
responsible for handling this information. Two web
data sources are mainly used to obtain this kind of
information:
- The US National Weather Service website. It
provides weather measurements (temperature,
pressure, humidity, etc.) for every airport in the
world. In Spain, this information can be extracted
for 48 airports.
- The Spanish National Weather Institute (INM)
website. Pilots can access it to obtain a more
detailed analysis and to select the best zone to
fly. It covers 205 meteorological stations
distributed across Spain, whose data are
usually refreshed every thirty minutes.
The continuous integration of Web data sources
may overwhelm the resources of SMEs, which are
not designed to support the laborious task of
keeping a DW up to date.
In our approach, the DW administrator
enters the temporal properties of the data sources
into the DETC tool and selects the parameters to
integrate, for example the temperature. This tool is
able to determine the maximum level of detail
(granularity) offered by each source.
We find that in the second source the
temperature is given with a detail of
“minute” (for example, that at 14
hours and 30 minutes the temperature was
15ºC), whereas the first source gives the
temperature with a detail of “hour” (for example,
that at 14 hours it was 15ºC).
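
The granularity comparison above can be illustrated with a minimal
sketch. It assumes that two sources can only be integrated at the
coarser of their two granularities, so that a "minute" value has to be
truncated to the hour before it can be matched with an "hour" value.
This rule, and the names Granularity, common_granularity and truncate,
are assumptions made for illustration; they do not describe the DETC
tool itself.

```python
from datetime import datetime, timedelta
from enum import IntEnum


class Granularity(IntEnum):
    """Temporal granularities, expressed as their length in seconds."""
    SECOND = 1
    MINUTE = 60
    HOUR = 3600
    DAY = 86400


def common_granularity(*levels: Granularity) -> Granularity:
    """Return the coarsest granularity among the given sources."""
    return max(levels)


def truncate(instant: datetime, level: Granularity) -> datetime:
    """Truncate a naive datetime down to the given granularity."""
    epoch = datetime(1970, 1, 1)
    seconds = int((instant - epoch).total_seconds())
    return epoch + timedelta(seconds=seconds - seconds % level)


# First source: detail of "hour"; second source: detail of "minute".
level = common_granularity(Granularity.HOUR, Granularity.MINUTE)
print(level.name)  # HOUR
```

Under this assumption, a reading taken at 14 hours and 30 minutes in
the second source would be aligned with the 14 hours value of the
first source.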
The tool can also determine the time intervals in
which this information is available to be queried
(useful when dealing with other kinds of data sources).
Applying the temporal algorithms, which are outside
the scope of this paper, we would obtain all possible
querying instants at which both sources are
accessible, so that the extraction and integration
process can be performed (Araque et al., 2006a).
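
The idea of finding instants at which both sources are accessible can
be illustrated by intersecting availability windows. The sketch below
is not the algorithm of Araque et al. (2006a), which is not reproduced
here; it is a plain interval intersection, and the representation of a
window as a (start, end) pair of times of day is an assumption.

```python
from datetime import time
from typing import List, Tuple

Window = Tuple[time, time]  # (start, end) within one day, start < end


def common_windows(first: List[Window], second: List[Window]) -> List[Window]:
    """Return the time windows in which both sources can be queried."""
    result = []
    for first_start, first_end in first:
        for second_start, second_end in second:
            start = max(first_start, second_start)
            end = min(first_end, second_end)
            if start < end:  # keep only non-empty overlaps
                result.append((start, end))
    return sorted(result)
```

For example, a source available from 8 to 12 hours and from 14 to 20
hours, combined with a source available from 10 to 16 hours, could be
queried jointly from 10 to 12 hours and from 14 to 16 hours.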
Let us suppose that both data sources in this
example are always available for querying. The
DWA, who usually wants to get the most detailed
information, would choose to extract the changes
from the first data source every hour and from the
second one every half hour. This approach wastes
resources.
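
To make the cost of this choice concrete, the sketch below counts the
daily extractions it implies. It assumes (as an illustrative reading,
not a statement taken from the text above) that a half-hourly
extraction from the second source has a counterpart in the first
source only when it falls on a full hour. The name extraction_offsets
is hypothetical.

```python
from datetime import timedelta

DAY = timedelta(days=1)


def extraction_offsets(period: timedelta) -> set:
    """Offsets from midnight at which a source is queried during one day."""
    return {i * period for i in range(DAY // period)}


first = extraction_offsets(timedelta(hours=1))      # first source, hourly
second = extraction_offsets(timedelta(minutes=30))  # second source, half-hourly

# Extractions from the second source with no hourly counterpart.
unmatched = second - first

print(len(first), len(second), len(unmatched))  # 24 48 24
```

Under this assumption, half of the 48 daily extractions from the
second source would have no matching value in the first source.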