posal. The semantic and performance evaluations are
detailed in Sec. 5.1 and 5.2, respectively.
The refinement tool implements our algorithm in
Matlab. It allows a graph to be defined through a simple
visual interface, as shown in Figure 7. The multidimensional
graph under consideration is displayed in the top part of
the visual interface. In the bottom part, the algorithm
prompts the user for inputs in a command window.
5.1 Semantic Evaluation
In this section, we describe the added value of our
methodology from a design point of view (i.e., does
the refinement methodology correspond to decision-makers'
needs?). To that end, we have investigated two aspects:
1) Do the dimensions and facts created using our
methodology correspond to decision-makers' analysis needs?
2) Do the hierarchies created using our methodology
improve analysis capabilities?
We have therefore decided to compare the result of
our methodology with the one proposed in (Miquel
et al., 2002a). Indeed, (Miquel et al., 2002a) propose
a manual method to obtain a multi-version
multidimensional schema, and when the time dimension is
chosen as the context dimension, our approach also results
in a multi-version multidimensional schema. This
validation shows that the multidimensional schemas
produced with the manual methodology and with
our automatic methodology are identical.
Moreover, in order to validate the semantic correctness
of using AHC for hierarchy definition, we
asked the ecologists of the project to choose between
a spatial dimension with only one level and
a spatial dimension with a hierarchy created using
AHC. When the number of created levels does not exceed 5,
decision-makers prefer having hierarchies,
since they can reveal interesting patterns such as
agricultural profiles of census points. For example,
the “Environments” fact table contains data that
describe agricultural policies around each census point
for each year. Clustering on these data classifies
census points and allows decision-makers
to analyze the impact of agricultural practices on
bird biodiversity. For example, decision-makers can
analyze biodiversity according to agricultural forest
and grassland parameters of census points by using
this simple OLAP query: “What is the biodiversity
value per group of census points (first level of the
hierarchy obtained with clustering) in 2002 and 2003?”.
This query can reveal that, for the same year (for example
2002), biodiversity is strongly affected by agricultural
parameters, since the aggregated biodiversity value
differs from one group of census points to another.
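The roll-up behind this query can be illustrated with a minimal Python sketch. It is not the authors' Matlab tool: the census point names, the group assignment, the biodiversity values and the use of the arithmetic mean as aggregation function are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy facts: (census_point, year, biodiversity).
# Values are invented for illustration, not taken from the case study.
FACTS = [
    ("p1", 2002, 8.0), ("p2", 2002, 6.0),
    ("p3", 2002, 3.0), ("p4", 2002, 5.0),
    ("p1", 2003, 9.0), ("p2", 2003, 7.0),
    ("p3", 2003, 2.0), ("p4", 2003, 4.0),
]

# Hypothetical first hierarchy level (census point -> group),
# standing in for the output of a clustering step.
GROUP_OF = {"p1": "g1", "p2": "g1", "p3": "g2", "p4": "g2"}


def biodiversity_per_group(facts, group_of, years):
    """Roll biodiversity up from census points to groups, per year.

    The aggregation function (arithmetic mean) is an assumption.
    """
    values = defaultdict(list)
    for point, year, biodiversity in facts:
        if year in years:
            values[(group_of[point], year)].append(biodiversity)
    return {key: sum(v) / len(v) for key, v in values.items()}


result = biodiversity_per_group(FACTS, GROUP_OF, years={2002, 2003})
for (group, year), value in sorted(result.items()):
    print(f"{group} {year}: {value:.1f}")
```

In this toy data set the two groups show different aggregated values within the same year, which is the kind of contrast the query is meant to expose.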
5.2 Performance Evaluation
In this section, we test the time performance of our
methodology in order to validate its feasibility from
a project deployment point of view.
In particular, we study the time performance of:
1) the refinement algorithm for fact and dimension
design, and 2) hierarchy creation using AHC.
In order to test the first point, we have created a
set of 200 simulated constellation schemas with
2 to 100 dimensions, since real usable multidimensional
schemas have at most between 3 and 10
dimensions (Kimball, 1996). The worst execution time
is 15.23 s. The average execution time is
11.7 s, with a standard deviation of 1.17 s.
This performance is satisfactory for
an off-line design phase.
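The statistics reported above (worst case, mean and standard deviation) can be collected with a small timing harness. The Python sketch below is an illustration only: `placeholder_refinement` is a hypothetical stand-in workload, not the refinement algorithm; the even spread of the 200 inputs over 2 to 100 dimensions and the use of the sample standard deviation are assumptions.

```python
import statistics
import time


def time_runs(func, inputs):
    """Return the wall-clock time, in seconds, of func(x) for each input."""
    timings = []
    for x in inputs:
        start = time.perf_counter()
        func(x)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Worst case, mean, and sample standard deviation of the timings."""
    return {
        "worst": max(timings),
        "mean": statistics.mean(timings),
        "stdev": statistics.stdev(timings),
    }


def placeholder_refinement(n_dimensions):
    """Stand-in workload; NOT the refinement algorithm of the paper."""
    return sum(i * j for i in range(n_dimensions) for j in range(n_dimensions))


# 200 simulated inputs with 2 to 100 dimensions (even spread is assumed).
dimension_counts = [2 + (i * 98) // 199 for i in range(200)]
report = summarize(time_runs(placeholder_refinement, dimension_counts))
print(report)
```

The absolute numbers printed by this sketch depend on the machine and on the placeholder workload, so they are not comparable with the figures reported here.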
In this paragraph, we study the time performance of
the AHC algorithm. Here, “classified
items” are census points (which are members of the
“census points” dimension, the target dimension) and
“attributes” are aggregated facts from the “Environments”
fact node (which is the source fact node). The
AHC algorithm has also been implemented in Matlab
and its performance has also been tested. Using
our case study data, we performed 2090 tests, with
a number of classified items (source node instances,
i.e. Environments facts) between 10 and 190 and a number
of attributes (source node attributes, i.e. Environments
fact measures) between 10 and 100; the average
calculation time is 0.072 s, with a standard
deviation of 0.002 s. To complete our evaluation,
we simulated a data set with 10,000 classified
items and 150 attributes. In this case, the AHC calculates
a hierarchy in 147.36 s on average, with a standard deviation
of 4.03 s and a maximal calculation time of
214 s. All time performances are shown in Figure
6. This calculation time (under four minutes, even in the
worst case) is acceptable for an off-line design phase.
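To make the measured operation concrete, the following Python sketch clusters simulated items with a naive agglomerative procedure and times one run. It is a sketch under stated assumptions, not the Matlab implementation evaluated here: the Euclidean distance and the average-linkage criterion are assumptions, and the simulated data are uniform random attribute values chosen only for illustration.

```python
import math
import random
import time


def euclidean(a, b):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ahc(items, n_clusters):
    """Naive agglomerative clustering (average linkage is an assumption).

    items: list of attribute vectors, one per classified item.
    Returns a list of clusters, each a sorted list of item indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(items))]
    # Pairwise distances between items, computed once.
    dist = [[euclidean(a, b) for b in items] for a in items]

    def linkage(c1, c2):
        # Mean distance over all cross-cluster item pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge it.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe: b > a, so index a is unaffected
    return clusters


# Simulated data: 40 classified items with 10 attributes each (assumed sizes).
random.seed(0)
simulated = [[random.random() for _ in range(10)] for _ in range(40)]
start = time.perf_counter()
groups = ahc(simulated, n_clusters=5)
elapsed = time.perf_counter() - start
print(f"{len(groups)} groups in {elapsed:.4f} s")
```

Because the merge sequence is deterministic, calling `ahc` with a smaller `n_clusters` continues the same merges, so successive cuts yield nested partitions that can serve as hierarchy levels. This naive version is far slower than optimized implementations and is kept small on purpose.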
Figure 6: Execution times according to the number of
attributes and classified items.