automatically generated using an automated seed
value.
For the monitoring station dimension, we first
needed to split the regions into two groups: regions
that cover several states, and regions that lie
inside the borders of only one state. The same was
done with the watersheds, according to the region
they belong to: each watershed was assigned to the
group of its parent region, or to the state inside
its parent region. Next, the monitoring stations
were divided into three groups: those belonging to a
watershed; those not belonging to a watershed but
located inside a region, which were subdivided
according to the extent of the region across states;
and those belonging to neither a region nor a
watershed, which were assigned to the state where
they are located. Once this was done, we created the
entity that identifies each group and copied each
group into the monitoring station dimension table.
The next step
was to migrate the data from the database to the data
warehouse. However, some stations did not have
values for all weather variables, even after the zero
and null replacement. A station could have
records for temperature, storm and blizzard, but not
for rain, evaporation and fog. Other stations had no
values in any variable for complete months of some
years. The missing values were inserted as
null, so that they do not participate in grouping
operations.
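The station grouping rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Station` fields, the `assign_group` function, the group labels, and the choice to subdivide a multi-state region by state are hypothetical names and one possible reading of the rule, not the actual structures used in the project.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Station:
    """Hypothetical station record; field names are assumptions."""
    station_id: int
    state: str
    region: Optional[str] = None     # None: station lies outside every region
    watershed: Optional[str] = None  # None: station lies outside every watershed


def assign_group(station: Station, multi_state_regions: Set[str]) -> Tuple[str, str]:
    """Return a (group_kind, group_name) pair for one station.

    Precedence follows the text: watershed first, then region, then state.
    """
    if station.watershed is not None:
        return ("watershed", station.watershed)
    if station.region is not None:
        if station.region in multi_state_regions:
            # Assumption: a region spanning several states is subdivided by state.
            return ("region_state", f"{station.region}/{station.state}")
        return ("region", station.region)
    # Neither region nor watershed: fall back to the state.
    return ("state", station.state)


# Hypothetical example data: region "R1" spans several states, "R2" does not.
multi_state = {"R1"}
stations = [
    Station(1, "State A", region="R1", watershed="W1"),
    Station(2, "State A", region="R1"),
    Station(3, "State B", region="R2"),
    Station(4, "State C"),
]
groups = [assign_group(s, multi_state) for s in stations]
```

Each resulting pair would correspond to one group entity in the monitoring station dimension table.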
4 RESULTS OBTAINED
The original database occupied around 600 MB of
space and contained on average 726,000 records per
variable table. The data warehouse contained 5,551
records for the monitoring station dimension, 14,975
records for the time dimension and 27,722,823
records in the fact table. The data warehouse
occupied 6,886 MB in space.
Group information was easier to obtain, SQL
queries were easier to write, and the execution
time was dramatically reduced. For example, a query
to obtain the average rain in a region per month
took about 12 minutes on the old database, while the
same query on the data warehouse took only a few
seconds.
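A query of the kind mentioned above (average rain per region per month) might look as follows on a star schema. This is a minimal sketch using an in-memory SQLite database. The table and column names (`station_dim`, `time_dim`, `weather_fact`, `rain_mm`), the unit, and the sample rows are assumptions made for illustration, not the schema or data of the actual warehouse. The sample also includes missing measurements stored as NULL, which SQL aggregate functions such as `AVG` and `COUNT(column)` skip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE station_dim (
    station_key INTEGER PRIMARY KEY,
    region      TEXT,
    state       TEXT
);
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY,
    year     INTEGER,
    month    INTEGER,
    day      INTEGER
);
CREATE TABLE weather_fact (
    station_key INTEGER REFERENCES station_dim(station_key),
    time_key    INTEGER REFERENCES time_dim(time_key),
    rain_mm     REAL  -- NULL marks a missing measurement
);
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO station_dim VALUES (?, ?, ?)",
    [(1, "R1", "State A"), (2, "R1", "State B"), (3, "R2", "State C")],
)
conn.executemany(
    "INSERT INTO time_dim VALUES (?, ?, ?, ?)",
    [(1, 2000, 6, 1), (2, 2000, 6, 2), (3, 2000, 7, 1)],
)
conn.executemany(
    "INSERT INTO weather_fact VALUES (?, ?, ?)",
    [
        (1, 1, 10.0), (2, 1, 20.0),
        (1, 2, None), (2, 2, 30.0),  # station 1 has no rain value on day 2
        (1, 3, 4.0),
        (3, 1, 6.0), (3, 3, None),   # station 3 has no rain value in July
    ],
)

# Average rain per region and month: one join per dimension, one GROUP BY.
query = """
SELECT s.region, t.year, t.month,
       AVG(f.rain_mm)   AS avg_rain,
       COUNT(f.rain_mm) AS n_values
FROM weather_fact AS f
JOIN station_dim AS s ON s.station_key = f.station_key
JOIN time_dim    AS t ON t.time_key    = f.time_key
GROUP BY s.region, t.year, t.month
ORDER BY s.region, t.year, t.month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In this sketch, changing the grouping level (for example from region to state) only requires replacing `s.region` with another column of the station dimension. A group whose values are all NULL yields a NULL average with a count of zero.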
Regarding the information analysis, a data
visualization tool allowed us to see and identify
some trends in several regions and watersheds. For
example, in the Lerma-Santiago watershed the main
rainy season occurs from May to September, and
the average temperature at 8 a.m. shows a nearly
constant curve over a decade.
5 CONCLUSIONS
In this paper, a data warehouse to store Mexico's
climatic variables was presented. Example graphs
showed that, using the data warehouse paradigm, it
is easier to visualize information (through graphs)
and to navigate over data (using the dimension
hierarchies) than with the traditional database
approach.
One of the big advantages observed in exploiting
this data warehouse is that the complex
grouping queries required under a traditional
database paradigm reduce to moving through the
directed acyclic graphs of the dimension hierarchies.
In this way the data
warehouse information can be visualized in multiple
ways, enabling users to identify and confirm:
a) trends, b) patterns, and c) hypotheses about
Mexico’s weather variables. The knowledge derived
is a key resource for decision-making processes for
every season, watershed, region or state.
The main findings enable the identification of
dominant weather conditions, which could be useful
for planning agricultural strategies according to
climate conditions.
The dimensional model proved able to
hold the data warehouse information for a
query-intensive application while remaining capable
of growth with data from other time periods or
different climatic variables without any major
further development.
REFERENCES
Agrawal, R., Gupta, A., Sarawagi, S., 1995. Modeling
Multidimensional Databases. In Proc. 13th Int. Conf.
Data Engineering, (ICDE). IEEE Computer Society.
Kimball, R., Reeves, L., Ross, M., Thornthwaite, W., 1998.
The Data Warehouse Lifecycle Toolkit: expert
methods for designing, developing, and deploying data
warehouses, John Wiley & Sons Inc. USA.
Vassiliadis, P., 1998. Modeling Multidimensional
Databases, Cubes and Cube Operations.
In Proc. 10th Int. Conference on
Statistical and Scientific Database Management
(SSDBM).
A DATA WAREHOUSE FOR WEATHER INFORMATION: A Pattern recognition solution for climatic conditions in
México