2 RELATED WORK
In the context of data warehouse evolution, two categories
of research have emerged: the first recommends extending
the multidimensional algebra with a set of schema evolution
operators, while the second proposes temporal
multidimensional data models.
Schema Evolution Operators. Hurtado et al.
proposed a formal model of dimension updates in a
multidimensional model, covering updates to the do-
mains of the dimensions and structural updates to the
dimension hierarchies with a collection of primitive
operators to perform these updates (Hurtado et al., ;
Hurtado et al., 1999). For example, they propose the
generalize operator, which creates a new level l_new to
which a pre-existing level l_n rolls up. Blaschka et al.
improve on the work of Hurtado et al. by proposing a set
of operators that is independent of any logical or physical
model of the data warehouse (Blaschka et al., 1999).
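
As an illustration of the generalize operator mentioned above, the following is a minimal sketch and not the formal definition given by Hurtado et al. It assumes a dimension is stored as a dictionary of levels and rollup mappings; the function name `generalize`, the dictionary layout, and the example instances (`s1`, `store_group`, and so on) are hypothetical.

```python
def generalize(dimension, lower, new_level, rollup):
    """Return a copy of `dimension` with `new_level` added above `lower`.

    `rollup` maps every instance of `lower` to an instance of `new_level`.
    The input dimension is left unchanged.
    """
    levels = dimension["levels"]
    if lower not in levels:
        raise ValueError(f"unknown level: {lower}")
    if new_level in levels:
        raise ValueError(f"level already exists: {new_level}")
    unmapped = set(levels[lower]) - set(rollup)
    if unmapped:
        raise ValueError(f"instances without a parent: {sorted(unmapped)}")

    # The instances of the new level are the distinct parents used in the mapping.
    new_levels = dict(levels)
    new_levels[new_level] = sorted({rollup[i] for i in levels[lower]})

    # Record the new aggregation link lower -> new_level.
    new_rollups = dict(dimension["rollups"])
    new_rollups[(lower, new_level)] = {i: rollup[i] for i in levels[lower]}
    return {"levels": new_levels, "rollups": new_rollups}


# Hypothetical Region dimension with two levels and one rollup link.
region = {
    "levels": {"store": ["s1", "s2", "s3"], "city": ["Lyon", "Paris"]},
    "rollups": {("store", "city"): {"s1": "Lyon", "s2": "Lyon", "s3": "Paris"}},
}
evolved = generalize(
    region, "store", "store_group", {"s1": "small", "s2": "large", "s3": "small"}
)
```

In this sketch the mapping from l_n to l_new has to be supplied explicitly by the caller.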
Temporal Multidimensional Models. In temporal
multidimensional databases, the idea is to keep the
evolution history by using timestamps. Vaisman and
Mendelzon proposed TOLAP (Temporal OLAP) (Vaisman
and Mendelzon, 2000). In TOLAP, a dimension is modeled
as a DAG (directed acyclic graph) in which a node
represents a level and an edge represents a relation
between two adjacent levels. The edges of the TOLAP
graph are stamped with a time interval representing the
validity period of the aggregation link. A similar
approach is proposed by Bliujute et al. with
the “Temporal Star Schema” (Bliujute et al., 1998).
Morzy and Wrembel proposed a multiversion data warehouse
(Morzy and Wrembel, ; Morzy and Wrembel, 2004).
With this versioning approach, a new version of the
data warehouse is physically created when changes
occur. Timestamps are then used to identify the
appropriate versions to satisfy each analysis request.
3 K-MEANS BASED APPROACH FOR DIMENSION UPDATES
3.1 K-means
K-means is a partitional clustering method that classifies
a given data set X into k clusters, where k is fixed
a priori (Forgy, 1965; Bradley and Fayyad, 1998; Likas
et al., 2003). The main idea is to define k centroids,
one for each cluster, and then assign each point to one
of the k clusters so as to minimize a measure of
dispersion within the clusters.
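
The idea described above can be sketched as follows. This is a minimal illustration and not the implementation used in this work: it assumes the common Lloyd-style iteration, squared Euclidean distance as the dispersion measure, and initial centroids drawn at random from the data. The function name `kmeans`, its parameters, and the sample points are hypothetical.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Partition `points` (equal-length numeric tuples) into k clusters.

    Returns (labels, centroids), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Assumption: the k initial centroids are sampled from the data set itself.
    centroids = rng.sample(points, k)
    labels = [None] * len(points)

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # The partition is stable, so the iteration has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Six one-dimensional points forming two well-separated groups.
points = [(1.0,), (1.2,), (0.8,), (10.0,), (10.5,), (9.5,)]
labels, centroids = kmeans(points, k=2)
```

Each pass costs on the order of n × k distance computations for n points, which is linear in the size of the data set for a fixed k.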
Among existing clustering methods, we chose k-
means for its low and linear algorithmic complexity
and for its result format (a partition). Indeed, we think
that these two characteristics are important for OLAP
analysis and dimension updates in data warehouses.
3.2 Illustrative Example
Let us consider a sales data warehouse (Figure 1).
This data warehouse contains two measures: sales
income and sold quantity. These measures can be
analyzed along three dimensions: “Time”, “Product” and
“Region”. The hierarchy of the Region dimension has
three levels: store, city and country. In the same way,
the Product dimension consists of three levels: product,
product category and product family. In addition, the
Time dimension is organized into four levels: week,
month, quarter and year.
[Figure 1 not reproduced. Labels legible in the schema: WEEK, MONTH, CITY, COUNTRY, STORE, AREA, PRODUCT, CATEGORY, PRICE, SALESINCOME, SOLDQUANTITY.]
Figure 1: Schema of the sales data warehouse.
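
The example schema can be written down as a small data structure. The sketch below is illustrative only: the names `SALES_SCHEMA` and `coarser_levels` and the identifier spellings are assumptions, and each hierarchy is listed from its finest to its coarsest level as described in the text.

```python
# Measures and dimension hierarchies of the example sales data warehouse.
# Each hierarchy is ordered from the finest level to the coarsest one.
SALES_SCHEMA = {
    "measures": ["sales_income", "sold_quantity"],
    "dimensions": {
        "Time": ["week", "month", "quarter", "year"],
        "Product": ["product", "product_category", "product_family"],
        "Region": ["store", "city", "country"],
    },
}


def coarser_levels(schema, dimension, level):
    """Return the levels that `level` can roll up to within `dimension`."""
    levels = schema["dimensions"][dimension]
    return levels[levels.index(level) + 1:]


region_rollups = coarser_levels(SALES_SCHEMA, "Region", "store")
```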
3.3 Principle of Our Approach
Our approach differs from existing ones in its use of data
mining techniques to perform data warehouse schema
evolution. Generally, to carry out OLAP analyses, the user
generates a data cube by selecting the dimension level(s)
and measure(s) that satisfy their needs. Then, the user
explores the resulting cube to detect similarities among
facts and dimension instances, exploiting the different
levels within a dimension. To help the user in this step,
we propose a schema evolution operator, RollupWithKmeans,
which creates a new hierarchy level by using a clustering
algorithm. Our idea is to add a new level, l_new, to which
a pre-existing level, l_n, rolls up. To achieve this, the
operator first classifies the instances of level l_n by
using the k-means clustering algorithm. RollupWithKmeans
then creates the new level l_new, composed of the k
instances corresponding to the k clusters obtained.
Finally, RollupWithKmeans defines a rollup function
between level l_n and level l_new by relating the
instances of l_n and l_new according to the k-means
clustering result. The originality
of our schema evolution approach is that our rollup
ICEIS 2008 - International Conference on Enterprise Information Systems