TOWARDS DATA WAREHOUSES FOR NATURAL

HAZARDS

Hicham Hajji, Mohand-Said Hacid, Hassan Badir

Claude Bernard University,LIRIS - UFR d'Informatique

43, boulevard du 11 Novembre 1918

69622 Villeurbanne cedex

France

Keywords: Data warehouse, OLAP, natural hazards management, Geographical Information Systems

Abstract: Data warehousing has emerged as an effective technique for converting data into useful information. It is an

improved approach to integrate data from multiple, often very large, distributed, heterogeneous databases

and other information sources. This paper examines the possibility of using data warehousing techniques in

the natural hazards management framework to integrate various functional and operational data which are

usually scattered across multiple, dispersed and fragmented systems. We present a conceptual data model

for the data warehouse in the presence of various data formats such as geographic and multimedia data. We

propose OLAP operations for browsing information in the data warehouse.

1 INTRODUCTION

Data warehousing is one of the most important

recent developments in the field of information

system (IS). Conceptually, a data warehouse is a

database that collects and stores data from multiple

remote and heterogeneous information sources. The

data warehouse approach presents some advantages

over the traditional on-demand approach

(Theodoratos, 1999):

- The queries can be answered locally without

accessing the

original information sources.

- On-line Analytical Processing (OLAP

) is

decoupled (separated) as much as possible from On-

line Transaction Processing (OLTP).

Data warehousing technologies have already been

success

fully deployed in many industries (in

manufacturing for order shipment and customer

support, in financial services for claims analysis, risk

analysis, in transportation for fleet management, in

telecommunications for call analysis, and in

healthcare for outcomes analysis (Adriaans, 1996).

In this paper, we propose a multidimensional data

odel for natural hazard management, and we

propose a query tool that will assist to extract

information and perform analytical operations in

such warehouse environment. To motivate our study

of spatial data warehousing, we examine the

following examples:

Example 1: par

cel-based analysis of damage

assessment. A user may like to view damage

assessment on maps by building, by parcel, or even

like to dynamically drill-down or roll-up along any

dimension to explore desired patterns, such as risky

regions regarding a specific hazards.

Example 2: o

verlay of multiple thematic maps.

There often exist multiple thematic maps in spatial

database, such as altitude map, population map,

flood intensity map etc. By overlapping thematic

maps; one may find some interesting relationships

among altitude and flood intensity.

Paper outline: The paper is organized as follows.

Sect

ion 2 will present data model sources and

proposes a model for the data warehouse. Then, we

present some interesting OLAP operations for data

analysis and exploration of the data cube. We

conclude our work in section 4 by anticipating on

necessary extensions.

415

Hajji H., Hacid M. and Badir H. (2005).

TOWARDS DATA WAREHOUSES FOR NATURAL HAZARDS.

In Proceedings of the Seventh International Conference on Enterprise Information Systems, pages 415-419

DOI: 10.5220/0002519704150419

 SciTePress

Material s

Signification

Floors materials

Signification

Maintenance Sta

Signification

2 A MULTIDIMENSIONAL

MODEL FOR ANALYSIS

Built Prop

PB_ID

Local Nature

Particular Construction

Cadastral Section

Occupation Nature

Level number

Floor level

Parcel Number

N° Local

Local Type

Total Locative Value

Address

Date of achievement

Habitat Description

Maintenance state

Mater ials

Floors materials

Number of annexes beds

Numbers of chambers

Number of beds

Number of principal beds

Beds a reas

Elevators

Central heating

Eau

Gas

PB_ID

Local Nature

Signification

Parti cular Construct ion

Signification

Nature Occupation

Signification

Local Type

Signification

Parcel

Parcel_ID

Cadastral section

SIRENE Data

Company_ID

Identification (Adress)

Compan y Id

Name

Road Type

Road Name

Geographical

Localization

Region

Departme nt

Arro ndisseme nt

Canton

Commune

postal Code

Economic characteristic

Activity(700 classes)

Turno ver ( Euros)

Effective employees

Date of creation

Seasonal character

Princip al act ivity

Sec

ndar y Activit y

Non Built Prop

PNB_ID

Cadastra l Section

Parcel cap acity

Parcel

Parcel reference

Road number

Surface Description

PNB_ID

Culture nature

Special cu lture nature

Reference parcel

Constructible terrain

Culture Nature

Signification

Commune

Commune _ID

P1(X, Y)

P2(X, Y)

P3(X, Y)

…

Pn(X, Y)

Cadastral section

Section_ID

P1(X, Y)

P2(X, Y)

P3(X, Y)

…

Pn(X, Y)

Commune_ID

Parcel

Parcel_ ID

P1(X, Y)

P2(X, Y)

P3(X, Y)

…

Pn(X, Y)

Section_ID

Non built Prop

PNB_ID

P1(X, Y)

P2(X, Y)

P3(X, Y)

…

Pn(X, Y)

Parcel _ID

Built Prop

PB_ID

P1(X, Y)

P2(X, Y)

P3(X, Y)

…

Pn(X, Y)

Parcel_ID

IRIS

IRIS_ ID

P1(X , Y)

P2(X , Y)

P3(X , Y)

…

Pn(X , Y)

IRIS Attributes

IRI S_ ID

Attr1

Attr2

Attrn

Following the trend of the development of data

warehousing and data mining techniques, we

propose to construct spatial data warehouses to

facilitate online spatial data analysis for natural

hazard management.

Spatial data can be analysed taking into account

different aspects, for example, whether the type of

the predicates and the results is spatial or non-

spatial. Moreover, another classification criterion

can be used taking into account the fact that

topological or non-topological relationships can be

the main focus of analysis. This work will focus on

representing spatial and non-spatial data in a

multidimensional model. Moreover, this work will

show that multidimensional model is more suitable

for doing analysis in hazards management.

2.1 Data Sources

According to their nature, data can fall into three

different categories: non-spatial data, spatial data,

and hazards data.

2.1.1 Non-Spatial Data

It’s mainly composed of two different sources (see

figure 1). Data for describing exposed elements such

as population and buildings data: These data are

mainly used for describing the vulnerability of

elements present on the territory. Some of these

information are: building area, date of completion,

building material, presence or not of heating, etc.

The other part of data is those describing companies

activities. In our case this is represented by the

SIRENE Data, that gives geographical and economic

information of companies

2.1.2 Spatial Data

Spatial data used in the data warehouse are

principally composed of spatial elements such as

communes, parcels, building (see figure 1). The

other components of spatial data are geological

maps, Ortho-photos,DEM (digital elevation model),

Geological Maps, Ortho-photo

Figure 1: Spartial and Non-spatial data involved in the

data warehouse

2.1.3 Hazards Data

The data describing hazards behaviors are stored nowhere,

but must be computed according to a specific hazard with

mathematical models using data sources. For example, in

the case of floods hazard (figure 2), the data needed are

digital elevation model, hydrological and hydraulic model.

DEM

Hydrologic

parameters

Hydrologic

modeling

Water elevation

Intensity

velocity

Figure 2 : Steps for computing floods hazards data

=+−

∂

⎟

⎠

⎞

⎜

⎝

⎛

∂

gAgAS

Equation 1: Flood Estimation by Saint Venant Equations

Each natural hazard can be represented by different

parameters. In the case of floods for example, the

parameters as seen above (figure 2) are water

elevation, intensity, and velocity. These parameters

differ from a hazard to another. The sources

containing those parameters have a field-based

structure, i.e. where space can be seen as continuous

field, and the information of interest (flood

parameters) is obtained at each point of space (see

equation 1 for an example for such an equation). We

can use these parameters individually or compute a

new synthesizing indicator.

ICEIS 2005 - DATABASES AND INFORMATION SYSTEMS INTEGRATION

416

2.2 Data warehouse multidimensional

model

There is a huge number of information related to

hazard analysis that are either stored or computed

from the data sources such as risk indicator, parcel

information: area, date of completion, building

material, presence or not of heating, population

information, company information, etc. A user may

like to view risk patterns on a map by building, by

parcel, by cadastral section, by elevation and by

different combinations of population and company

activities information, or even like to dynamically

drill-down or roll-up along any dimension to explore

desired patterns, such as high or low risky regions.

Spatial data can be analysed by taking into account

different aspects, for example, whether the type of

the predicates and the results is spatial or non-

spatial. Moreover, another classification criterion

can be used by taking into account the fact that

topological or non-topological relationships are or

not the main focus of analysis. In this paper, we use

the terminology usually used in multidimensional

modelling such as dimension type, fact relationship,

hierarchy, level.

2.2.1 Measures

Measures are attributes representing the specific

elements of analysis, such as Locative Value, area.

In general, they can be summed or averaged in order

to understand the particular aspects in consideration.

We distinguish two types of measures in a spatial

data warehouse (Han, 1997):

Numerical measure: a numerical measure is a

measure containing only numerical data. For

example, one measure could be total revenue of a

building, and a roll-up may give the total revenue by

parcel, by cadastral region, and so forth.

A simple example of numerical measure computing

is:

Select Sum(TotalLocativeValue)

From Built Prop

Group By ParcelId.

Numerical measures can be further classified into

distributive, algebraic, and holistic (Kimbal, 1996).

Spatial measure: A spatial measure is a measure

which contains a collection of pointers to spatial

objects. For example, during the generalization

procedure, the parcels with the same range of risk

indicator are grouped into the same cell, and the

measure so formed contains a collection of pointers

to those parcels.

2.2.2 Dimensions

Dimension is an object that includes attributes

allowing the user to explore the measures from

different perspectives of analysis. In the context of

spatial data warehouse, we distinguish three types of

dimensions according to whether or not it has spatial

references.

Non-Spatial dimension: A non-spatial dimension is a

dimension containing only non-spatial data. In our

case, risk indicator can be considered as a non-

spatial dimension. It contains non-spatial data

corresponding to risk value, whose generalization is

also non-spatial, such as low-risky, and high-risky.

Spatial-to-non-spatial dimension: A spatial-to-non-

spatial dimension is a dimension whose primitive

level data is spatial but whose generalization,

starting high level, becomes non-spatial. We will not

handle this type of dimension in our case.

Spatial Dimension: A spatial-to-spatial is a

dimension whose primitive level and all of its high

level generalized data are spatial. For example, in

our case, building, parcel, cadastral section, and

Commune are all spatial elements of the location

spatial dimension.

2.2.3 Star Model

The most known logical model used for the DW

design is called star scheme having his center

represented by a fact table surrounded with several

dimension tables forming star-like appearance. This

model is also called multidimensional model due to

the fact that several dimensions (multiple

dimensions) are used to analyze the measures. Since

the data warehouse is subject-oriented, and in

natural hazard framework, we focus our analysis on

vulnerability aspect; we will build our model with an

emphasis on vulnerability measures that can be

obtained or computed from data sources such as

total revenue, effective employees, and turnover

sales. Hence, the model (see figure 3 and 4) contains

one fact table (with vulnerability measures),

surrounded by (spatial and non spatial) dimensions

tables.

2.3 OLAP operations

With the above specified dimensions, OLAP

operations can be performed by stepping up and

down along any dimension shown in Figure 4, we

will use popular OLAP operations and analyze how

they are performed on a spatial data cube:

Slicing and dicing: each of which selects a portion of

the cube based on the constant(s) in one or a few

dimensions. For example, one may be interested

TOWARDS DATA WAREHOUSES FOR NATURAL HAZARDS

417

only in risky regions located in a particular region or

in regions of risk indicators. This can be realized by

transforming the selection criteria into a query

against the spatial data warehouse and be processed

by query processing methods (Gray, 1997).

Hazard

Natural Hazards Technological Hazards

ALL HAZARDS

floods Fires Land-slide Avalanche Chemical Nuclear

[0,25]

VM VM VM VM VM VM VM

Low Risk

(%)

[25,30]

VM VM VM VM VM VM VM

[31,45]

VM VM VM VM VM VM VM

[46,65]

VM VM VM VM VM VM VM

[66,70]

VM VM VM VM VM VM VM

Medium

Risk

(%)

[71,75]

VM VM VM VM VM VM VM

[76,85]

VM VM VM VM VM VM VM

High Risk

(%)

[86,100]

VM VM VM VM VM VM VM

Pivoting: which presents the measures in different

cross_tabular layouts. This can be implemented in a

similar way as in non-spatial data cubes. For

example, a spreadsheet table containing Risk

Indicator and Hazard category as row and columns

respectively may be presented to a user (see figure

4). The values (e.g cells) in the table may contain the

number of the vulnerable population of the

corresponding region(s)

Figure 3: Spreadsheet table presenting results of pivoting

Operations on Data Cube. Where, VM is a vulnerability

measure, such as population number

Roll-up: which generalizes one or a few dimensions

(including the removal of some dimensions when

desired) and performs appropriate aggregations in

the corresponding measure(s). For example, one

may roll-up

on hazard dimension to get summarized information.

For non-spatial measures, aggregation is

implemented in the same way as in non-spatial data

cubes (Stefanovic, 2000). It is challenging to

efficiently implement such operations since it is both

time and space consuming to compute spatial merge

or overlay and save the merged or overlaid spatial

objects. For further discussions, we refer reader to

(Bédard, 2001).

Drill-down: which specializes one or a few

dimensions and presents low-level objects,

collections, or aggregations. This can be viewed as a

reverse operation of roll-up and can often be

implemented by saving a low level cuboid and

performing appropriate generalization from it when

necessary.

The figure 4 gives an overview of possible OLAP

operations. By Providing online operations, flexible

summarizing and tabulating, the data cube can be an

essential part of the decision support process in

natural hazard management. The multidimensional

views offered by the data warehouse allow decision

makers to have multiple insights on their data

[75, 100]

Parcel 1

Parcel N

[0, 25]

Flood Landslide

Parcel 1

Parcel N

0, 25 75, 100

Pivot

0 , 25

75 , 100

Parcel 1 Parcel N

[26, 50]

[0, 25]

Landslide Flood

Location

Risk

Indicator

Natural

Hazard

Vulnerability

Measure

Roll-up on location

(from Building to

Cadastral

arcels

)

Dice for

(location = ‘Parcel A’ or ‘Parcel B’)

and

(hazard = ‘flood’ or ‘landslide’) and

(

risk indicator = ‘0-25’ or ’26-50’

Slice

For Hazard = ‘Flood’

Drill-down

on location

(From Cadastral sections to Parcels)

[75, 100]

Section 1

Section N

[0, 25]

Flood Landslide

Parcel A

Parcel B

Vulnerability Measure

Figure 4: Overview of Olap operations on data cube 4

3 CONCLUSION

In this paper, we have studied the construction of a

spatial data warehouse in the framework of natural

hazard management. The data warehouse is based on

a data cube model which consists of both spatial and

non-spatial

dimensions and measures, the cube model handles

spatial information in addition to non-spatial

information. Then, we have surveyed the main

OLAP operations to facilitate the browsing and

exploration tasks on data cube.

Natural hazards are time-occurring phenomena.

Therefore temporal dimension need to be handled in

such a framework. In addition, spatial as well as

non-spatial data can change their values over time.

Particularly, in the DWs where the data is stored for

long periods of time and where the changes to these

data cannot overwrite the already existing values,

the important consideration is how to represent the

time during which these values are valid. DWs

consider the temporal aspect in a very limited way

ICEIS 2005 - DATABASES AND INFORMATION SYSTEMS INTEGRATION

418

by including the time dimension and offering to

represent the changes in time referring them only to

the measures.

REFERENCES

Theodoratos, D., and Sellis, T. 1999. Design data

warehouse In Data and Knowledge Engineering, Vol.

31, pp. 279-301.

Adriaans P. and D. Zantinge. 1996. Data Mining, Addison

Wesley Longman Limited, Reading, Massachusetts.

J. Widom, 1995. Research problems in data warehousing,

In Proc the Int. Conf. on Information and Knowledge

Management pages 25- 26 Baltimore, Maryland.

J. Han, K. Koperski and N. Stefanovi, 1997. GeoMiner: A

system prototype for spatial data mining. In Proc.

1997 ACM SIGMOD Int Conf Management of Data

pages 553 – 556 Tucson Arizona.

Han J., Stefanovic N., and Koperski K., 1998. Selective

Materialization: An Efficient Method for Spatial Data

Cube Construction. In Pacific-Asia Conf. on

Knowledge Discovery and Data mining, PAKDD.

Stefanovic N., Han J., and Koperski K., 2000. Object-

Based Selective Materialization for Efficient

Implementation of Spatial Data Cubes. In IEEE

Transactions on Knowledge and Data Engineering

(TKDE), 12(6).

Bédard Y., Merrett T., Han J., 2001, Fundaments of

Spatial Data Warehousing for Geographic Knowledge

Discovery. In Miller H. and Han J. (eds.) Geographic

Data Mining and knowledge discovery. Taylor &

Francis.

Shekhar S., Lu C.T., Tan S., Chawla S. and Vatsavai R.R.,

2001, Map cube: a visualization tool for spatial data

warehouses. In Miller H., Han J. (eds.) Geographic

Data Mining and Knowledge Discovery. Taylor &

Francis.

Kimbal R, 1996, The Data Warehouse Toolkit. John Wiley

& Sons.

J. Gray. S. Chaudhuri, A Bosworth, 1997, Data cube: A

relational aggregation operator generalizing groupby

crosstab and subtotals Data Mining and Knowledge

Discovery, 29-54.

TOWARDS DATA WAREHOUSES FOR NATURAL HAZARDS

419