SCHEMA EVOLUTION FOR STARS AND SNOWFLAKES
Christian E. Kaas Torben Bach Pedersen Bjørn D. Rasmussen
Aalborg University
Fr. Bajers Vej 7E, DK-9220 Aalborg Ø, Denmark
Keywords:
Star schemas, snowflake schemas, schema evolution
Abstract:
The most common implementation platform for multidimensional data warehouses is RDBMSs storing data in
relational star and snowflake schemas. DW schemas evolve over time, which may invalidate existing analysis
queries used for reporting purposes. However, the evolution properties of star and snowflake schemas have
not previously been investigated systematically. This paper systematically investigates the evolution properties
of star and snowflake schemas. Eight evolution operations are considered, covering insertion and deletion of
dimensions, levels, dimension attributes, and measure attributes. For each operation, the formal semantics of
the changes for star and snowflake schemas are given, and instance adaption and impact on existing queries
are described. Finally, we compare the evolution properties of star and snowflake schemas, concluding that
the star schema is considerably more robust towards schema changes than the snowflake schema.
1 INTRODUCTION
Data warehouses (DWs) based on a multidimen-
sional (MD) data model are used in almost all busi-
nesses (Kimball, 1996; Pedersen et al., 2001b). Mul-
tidimensional models view data as multidimensional
cubes where data is captured as facts with associated
numeric measures. The facts are characterized by a
number of dimensions organized in a hierarchy of lev-
els. DW queries fall in two categories, browse queries
that inspect the dimension hierarchies, and aggregate
queries that aggregate measure values up to a given
combination of dimension levels (Kimball, 1996).
Multidimensional DWs can be implemented us-
ing specialized multidimensional OLAP (MOLAP)
DBMSs (Pedersen et al., 2001b), but the most com-
mon implementation platform is relational OLAP
(ROLAP), i.e., RDBMSs storing data in relational
star and snowflake schemas (Kimball, 1996). Star
schemas have one denormalized table per dimension,
while snowflake schemas normalize the dimension ta-
bles with one table per dimension level. DW schemas
almost always evolve over time (Kimball, 1996; Kim-
ball et al., 1998), which may invalidate existing SQL
analysis queries. ROLAP data warehouses are usu-
ally queried by two different sets of tools. On-Line
Analytical Processing (OLAP) tools are used for on-
line, ad-hoc analyses by knowledge workers. Here,
the SQL queries are generated dynamically based on
a conceptual multidimensional schema stored in the
tool. Thus, schema changes in the DW can be handled
“easily” in the sense that queries will return the same
results. However, materialized aggregates will have
to be partly recomputed before the performance is the
same. In contrast, relational reporting tools are used
for off-line, periodic reporting and analysis. Here, the
SQL queries are typically fixed and embedded in re-
porting programs. Thus, schema evolution in the DW
means that the programs must be modified manually,
a very expensive, cumbersome, and error-prone pro-
cess. The main motivation for this paper is thus to
provide details on when and how DW schema evolu-
tion affects existing SQL queries in reporting tools.
A survey of data warehouse projects and research
issues (Vassiliadis, 2000) confirms that DW schema
evolution is important. However, to our knowl-
edge, the evolution properties of star and snowflake
schemas and the impact on existing DW queries have
not previously been systematically investigated.
This paper remedies the situation by systemati-
cally investigating the evolution properties of star and
snowflake schemas, and, by implication, of fact con-
stellation schemas. We start by formally defining star
and snowflake schemas, and browse and aggregate
425
E. Kaas C., Bach Pedersen T. and D. Rasmussen B. (2004).
SCHEMA EVOLUTION FOR STARS AND SNOWFLAKES.
In Proceedings of the Sixth International Conference on Enterprise Information Systems, pages 425-433
DOI: 10.5220/0002619704250433
Copyright
c
SciTePress
queries over them. We then consider eight evolu-
tion operations, covering insertion and deletion of di-
mensions, levels, dimension attributes, and measure
attributes. For each operation, the formal seman-
tics of the changes for star and snowflake schemas
are given, and instance adaption and impact on ex-
isting browse and aggregate queries are described.
As previous work has already dealt with evolution in
the data instances, we do not cover this aspect. Fi-
nally, we compare the evolution properties of star and
snowflake schemas, looking both at which changes
cause existing browse or aggregate queries to fail, as
well as the ease of implementation and instance adap-
tion. In all cases, the star schema is either superior
to the snowflake schema or has the same problems.
We thus conclude that the star schema is consider-
ably more robust towards schema changes than the
snowflake schema.
A lot of previous work on general schema evolu-
tion has been performed (Roddick, 1992). However,
this work does not take the special characteristics of
multidimensional schemas and queries, e.g., dimen-
sions with hierarchies, into account. Several mul-
tidimensional models based on, e.g., UML (Lujan-
Mora et al., 2002) and E/R (Sapia et al., 1998) have
been proposed for conceptual and logical DW design
Some models (Pedersen et al., 2001a; Body et al.,
2002; Eder et al., 2001; Letz et al., 2002; Morzy
et al., 2003) also consider temporal aspects of the
data instances such as multiple version of dimensions
and facts, but not schema evolution in the DW. How-
ever, none of these models consider schema evolu-
tion. The paper (Levene et al., 2003) investigates
other properties of star and snowflake schemas such
as redundancy, but does not consider schema evo-
lution. Several papers (Bouzheghoub et al., 2000;
Vassiliadis et al, 2000) consider DW schema evolu-
tion at the coarse-grained level of materialized views,
either focusing on DW design (Bouzheghoub et al.,
2000) or DW quality (Vassiliadis et al, 2000), but do
not consider the special characteristics of multidimen-
sional schemas and queries. Another paper (Mendel-
zohn et al., 2000) considers querying across schema
evolution, but in a “pure” multidimensional model,
and thus do not investigate evolution in star and
snowflake schemas. The same is true for the FI-
ESTA framework (Blaschka et al., 1999; Blaschka,
2000). Another paper (Hurtado et al., 1999) considers
both schema evolution in a “pure” multidimensional
model, and the mapping to relational schemas, but
considers mainly instance rather than schema updates
(which are only treated briefly), and considers only
dimension updates. In comparison, the present paper
provides a detailed treatment of schema evolution, in-
cluding a formal semantics of the changes, provides
details on the impact on existing DW queries, and
considers both dimensions and fact/measures. Kim-
ball (Kimball, 1996; Kimball et al., 1998) infor-
mally mentions that schema changes should be added
“gracefully” to allow existing queries to work, but
neither investigates this systematically and formally,
nor compares the evolution properties of star and
snowflake schemas.
We believe this paper to be the first to systemati-
cally investigate the evolution properties of star and
snowflake schemas, including giving a formal seman-
tics for evolution operations on these schemas, and
investigating the impact on existing DW queries.
The rest of the paper is structured as follows. Sec-
tion 2 describes the multidimensional schemas and
queries considered in the paper. Section 3 examines
effects of the eight evolution operations, while Sec-
tion 4 summarizes the evolution properties for star
and snowflake schemas. Finally, Section 5 concludes
and points to future work.
2 MULTIDIMENSIONAL
SCHEMAS
We now formally define star and snowflake schemas.
In the following, we refer to the intuitive “MD-
schema” that corresponds to our conceptual under-
standing of the cube that the star or snowflake schema
implements, i.e., a schema with notions like cube,
measures, dimensions, levels, and attributes.
Definition 1 (Star schema) A star schema is a 6-
tuple: <D, fact, A, attrib, PKD, PKF> where D is
a set of dimension tables, fact is a fact table, A is a
set of attributes each with a name and a type, attrib:
A D {fact} is a function mapping an attribute
to either a fact table or a dimension table, PKD A
is the set of dimension table primary keys where the
primary key of the dimension table d
i
D is denoted
PKd
i
: PKd
i
PKD, and PKF A is the (compos-
ite) primary key of the fact table. Each K PKF is a
foreign key to exactly one dimension table d
i
D. ¤
The star schema uses denormalized dimensions
which means that the dimension levels in the MD
schema do not appear in the star schema. Instead a
dimension table d
i
in the star schema definition con-
sists of one of the base dimension levels defined in the
MD schema and all the dimension levels above it in
the MD schema. The attributes a
j
, j=1..n of d
i
cor-
responds to the attributes of the base dimension level
and all its preceding dimension levels just described.
Definition 2 (Snowflake schema) A snowflake
schema is an 8-tuple: <L, B, fact, A, attrib, PKL,
FKL, PKF>, where L is a set of dimension level
tables, B L is a set of base dimension level tables,
fact is a fact table, A is a set of attributes, attrib: A
L {fact} is a function mapping an attribute to
ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION
426
either a fact table or a dimension level table, PKL
A is a set of primary keys where the primary key
of the dimension level table l
i
L is denoted PKl
i
:
PKl
i
PKL, FKL A is a set of foreign keys where
the set of foreign keys for the dimension level table
l
i
L is denoted FKl
i
: FKl
i
FKL A, PKF
A is the (composite) primary key of the fact table.
Each K PKF is a foreign key to exactly one base
level dimension table d
i
B. The snowflake schema
can be viewed as a directed acyclic graph where the
tables are the nodes and the foreign key dependencies
represent the edges. The root of the graph will be the
fact table and the root’s directly connected nodes are
the base dimension level tables. The edges emanating
from the root connects to the base dimension level
tables and each base dimension level table is the root
of a subgraph. No two subgraphs can be connected.¤
The base dimension levels B of the snowflake
schema are the dimension levels that are associated
with facts in the MD-schema. However, this rela-
tion is represented as a foreign key dependency in the
snowflake schema. The dimension level hierarchy of
the MD-schema is also represented as foreign key de-
pendencies where the table representing an underly-
ing dimension level references the primary key of its
upper dimension levels.
DW Queries Generally, only two query types
are posed to data warehouses, namely 80% browse
and 20% aggregate queries (Kimball, 1996). Browse
queries only concern a single dimension and are used
to investigate the dimension hierarchy. Aggregate
queries aggregate the information in the fact table
to a level specified using one or more dimensions.
Prototypical star (top) and snowflake (bottom) browse
queries are shown below.
SELECT DISTINCT(d
i
.a
m
) FROM d
i
WHERE P ;
SELECT DISTINCT(l
j
.a
m
) FROM l
i
, .., l
j
WHERE P ;
The star schema browse query returns all distinct
values of the attribute a
m
in the dimension table d
i
for rows that match the predicate P . The snowflake
schema browse query returns all distinct values of
attribute a
m
in dimension level table l
j
for rows that
match the predicate P . The FROM-clause of the
browse query includes several dimension level tables.
In addition the predicate P also describes how the
dimension level tables l
i
, .., l
j
shall be joined. Proto-
typical star (top) and snowflake (bottom) aggregate
queries are shown below.
SELECT d
i
.a
i1
, .., d
i
.a
ik
, .., d
j
.a
j1
, .., d
j
.a
jl
AGG(fact.m
u
), .., AGG(f act.m
v
)
FROM d
i
, .., d
j
, fact WHERE P
GROUP BY d
i
.a
i1
, .., d
i
.a
ik
, .., d
j
.a
j1
, .., d
j
.a
jl
;
SELECT l
i
.a
i1
, .., l
i
.a
ik
, .., l
j
.a
j1
, .., l
j
.a
jl
,
AGG(fact.m
u
), .., AGG(f act.m
v
)
FROM l
i
, .., l
j
, fact WHERE P
GROUP BY l
i
.a
i1
, .., l
i
.a
ik
, .., l
j
.a
j1
, .., l
j
.a
jl
;
The select list of the star schema aggregate query
includes the dimension attributes a
1
, .., a
k
of dimen-
sion table d
i
and a
1
, .., a
l
of dimension table d
j
along
with measures m
u
, .., m
v
from the fact table fact.
AGG can be one of the aggregate functions MIN,
MAX, AVG, COUNT and SUM. Constraints that ap-
ply for rows being included is defined in the predi-
cate P along with rules for joining dimension tables
with the fact table. The snowflake query is similar,
but includes several tables per dimension. Aggregate
queries can use the aggregate functions MIN, MAX,
COUNT, AVG and SUM. Of these, MIN, MAX,
COUNT and AVG require the granularity of the fact
table (Kimball, 1996) to remain constant in order to
return correct results. This is not the case for SUM
(SUM is split-resistant, see Section 3.7).
3 EVOLUTION OPERATIONS
We now go through each of the eight evolution op-
erations and show their impact on star and snowflake
schemas, and existing DW queries. We give a for-
mal semantics for the change on star and snowflake
schemas, in terms of the input (In), output (Out), pre-
conditions (PCs), and schema changes (SCs). For star
schema changes, we give as input a star schema: St
= <D, fact, A, attrib, PKD, PKF> and for snowflake
schemas the input: Sn = <L, B, fact, A, attrib, PKL,
FKL, PKF>.
3.1 Insert attribute into level
Star changes Inserting an attribute a
new
into a di-
mension level in an MD-schema corresponds to in-
serting an attribute a
new
into a dimension table d
i
in
a star schema St = <D, fact, A, attrib, PKD, PKF>.
Adaption: All rows in the dimension table being al-
tered in the star schema must be updated such that all
rows will either be assigned a default value for the
new attribute or they will have to be updated sepa-
rately. Impact: Inserting a new attribute into a di-
mension table cannot cause existing queries browse
and aggregate queries to fail since no queries use this
new attribute yet.
Semantics
In St, a
new
, d
i
PCs a
new
/ A, d
i
D
SCs St
0
= <D, fact, A
0
, attrib
0
,PKD,PKF>
A
0
= A {a
new
}
attrib
0
(a
new
) = d
i
Out St
0
SCHEMA EVOLUTION FOR STARS AND SNOWFLAKES
427
Snowflake changes Inserting an attribute a
new
into
a dimension level in an MD-schema corresponds to
inserting an attribute a
new
into a dimension level ta-
ble l
i
in a snowflake schema Sn = <L, B, fact, A,
attrib, PKL, FKL, PKF>. Adaption: Much in the
same way as the star schema the values for the new
attribute is either set a default value or updated sep-
arately. If the table being altered is higher in the di-
mension level hierarchy (with coarser granularity) the
updating of the rows will be faster than for the star
schema since fewer rows needs to be updated. E.g.
there are fewer rows in the “Category” table than in
the Item table of the snowflake schema. Impact: In-
serting a new attribute into a dimension level table
cannot affect existing browse and aggregate queries
using the schema since none of these includes the new
attribute.
Semantics
In Sn, a
new
, l
i
PCs a
new
/ A, l
i
L
SCs Sn
0
= <L, B, fact, A
0
, attrib
0
,PKL,FKL,PKF>
A
0
= A {a
new
}
attrib
0
(a
new
) = l
i
Out St
0
3.2 Delete attribute from level
Star changes Deleting an attribute a
del
from a dimen-
sion level in an MD-schema corresponds to deleting
an attribute a
del
from a dimension table d
i
in a star
schema St = <D, fact, A, attrib, PKD, PKF>. Be-
fore deleting the attribute the precondition has to be
strengthened such that it is not the last attribute (be-
sides the primary key) that is being deleted. Adap-
tion: There is no need to update any of the rows when
removing an attribute from a dimension table. Im-
pact: Existing browse and aggregate queries which
include this attribute will fail after performing this
evolution operation on the star schema.
Semantics
In St, a
del
, d
i
PCs a
del
A, attrib(a
del
) = d
i
D
Q = attrib
1
(d
i
) |Q| > 2
SCs St
0
= <D, fact, A
0
, attrib
0
,PKD,PKF>
attrib
0
(a
del
) =
A
0
= A \ {a
del
}
Out St
0
Snowflake changes Deleting an attribute a
del
from
a dimension level in an MD-schema corresponds to
deleting an attribute a
del
from a dimension level table
l
i
in a snowflake schema Sn = <L, B, fact, attrib,
PKL, FKL, PKF>. The precondition ensures that no
last non-key attribute of dimension level table l
i
can
be deleted. Adaption: Dropping an attribute from
a dimension level table does not require any rows to
be updated. Impact: The outcome of deleting an at-
tribute from a dimension level table will be that exist-
ing browse and aggregate queries which include this
attribute will fail.
Semantics
In Sn, a
del
, l
i
PCs a
del
A, attrib(a
del
) = l
i
L
Q = attrib
1
(l
i
) |Q| > |FKl
i
| + 2
SCs Sn
0
= <L, B, fact, A
0
, attrib
0
,PKL,FKL,PKF>
attrib
0
(a
del
) =
A
0
= A \ {a
del
}
Out Sn
0
3.3 Insert dimension level
There are three ways a new dimension level can be
added to an MD-schema. Either a) as a new base di-
mension level, b) as a new top dimension level (not
being directly connected to the fact) or c) between two
existing dimension level.
Star changes Inserting a dimension level with a set
of attributes attached in an MD-schema corresponds
to inserting a set of attributes Q in a dimension level
table d
i
in a star schema St = <D, fact, A, attrib,
PKD, PKF>. Additionally, in an MD-schema a clas-
sification is inserted but since the star schema is con-
strained in this matter no further than inserting a set
of attributes is done. Adaption: Due to the denor-
malized dimensions of the star schema it makes no
difference if the new dimension level is inserted as a
new top dimension level or between two dimension
levels. If, however, the new classification is inserted
as a new base dimension level and causes a change in
granularity then the fact table will have to be updated
along with the dimension table. This is only possible
if the facts can be “split” in a meaningful way, and
the measure data allocated at the new, finer granular-
ity. Impact: The new attributes which form the new
classification will not cause existing browse queries
to fail. Even if the new dimension level is inserted
as a new base dimension level in the MD-schema and
causes a change in granularity and thereby creating
more rows in the dimension table it still will not affect
browse queries because they include the DISTINCT
keyword when querying a dimension table. Aggre-
gate queries, on the other hand, can be affected by this
operation when a new base dimension level is inserted
causing change in granularity of the fact table. It is,
however, only aggregate queries using MIN, MAX,
COUNT and AVG that will fail while SUM will not
because it is split-resistant. When a new base dimen-
sion level is inserted the impact is similar to inserting
a new dimension into fact, see Section 3.7.
Semantics
In St, Q, d
i
PCs d
i
D, Q A = , Q 6=
SCs St
0
= <D, fact, A
0
, attrib
0
,PKD,PKF>
A
0
= AQ
q Q: attrib
0
(q) = d
i
Out St
0
Snowflake changes The semantics for inserting a
new dimension level table depends on where it is be-
ing inserted. Inserting a new dimension level table
l
newtop
on top of an existing top dimension level ta-
ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION
428
ble l
i
with a set of attributes Q in a snowflake schema
Sn = <L, B, fact, A, attrib, PKL, FKL, PKF> has
the following semantics.
Semantics
In Sn, l
newtop
, l
i
, Q
PCs l
newtop
6∈ L, l
i
L, Q A = ,
{PKl
newtop
} Q, Q 6=
SCs Sn
0
= <L
0
, B, fact, A
0
, attrib
0
,PKL
0
,FKL
00
,PKF>
L
0
= L {l
newtop
}
PKL
0
= PKL {PKl
newtop
}
FKL
0
= FKL \ FK
i
F K
i
0
i
= FKl
i
{PKl
newtop
}
FKL
00
= FKL
0
FKl
0
i
A
0
= A Q
q Q: attrib
0
(q) = l
newtop
Out Sn
0
Inserting a new dimension level table l
new
with a
set of attributes Q between existing dimension level
tables l
i
and the set of tables M where l
i
is classi-
fied according to the dimension level tables M in a
snowflake schema Sn = <L, B, fact, A, attrib, PKL,
FKL, PKF>:
Semantics
In Sn, l
new
, l
i
, M, Q
PCs l
new
6∈ L, M = {m
1
, .., m
n
}, {l
i
} M L
Q = {PKl
new
} {att
1
, .., att
n
}, Q A = , Q 6=
SCs Sn
0
= <L
0
, B, fact, A
0
, attrib
0
, PKL
0
, FKL
00
, PKF>
L
0
= L {l
new
}
PKL
0
= PKL {PKl
new
}
FKl
0
new
= FKl
new
FKl
i
FKL
0
= FKL \ FKl
i
FKl
0
i
= FKl
i
\ {PKm
1
, .., PKm
n
}
FKl
00
i
= FKl
0
i
{PKl
new
}
FKL
00
= FKL
0
FKl
new
FKl
00
i
A
0
= A Q
q Q: attrib
0
(q) = l
new
Out Sn
0
Inserting a new dimension level l
newbase
with a
set of attributes Q as new base dimension level re-
placing an existing base dimension level l
oldbase
in a
snowflake schema Sn = <L, B, fact, A, attrib, PKL,
FKL, PKF>:
Semantics
In Sn, l
newbase
, l
oldbase
, Q
PCs l
newbase
6∈ L, l
oldbase
L
Q = {PKl
newbase
} {att
1
, .., att
n
}, Q A = , Q 6=
SCs Sn
0
= <L
0
, B
0
, fact, A
0
, attrib
0
, PKL
0
, FKL
0
, PKF
00
>
L
0
= L {l
newbase
}
FKl
0
newbase
= FKl
newbase
{PKl
oldbase
}
FKL
0
= FKL FKl
0
newbase
PKF
0
= PKF \ {PKl
oldbase
}
PKF
00
= PKF
0
{PKl
newbase
}
A
0
= A Q
q Q: attrib
0
(q) = l
newbase
Out Sn
0
Adaption: If the new dimension level table be-
comes a new top dimension level table all the rows in
the former top dimension level table will need to have
their new foreign key updated such that every row is
classified according to its new top level. If instead
the new dimension level table is inserted between two
existing dimension level tables the lower table (with
finer granularity) will have to be classified according
to the new dimension level table and the new dimen-
sion level table will have to be classified according to
the upper table (with coarser granularity). Lastly, if
the new dimension level table becomes the new base
table the fact table will have to be updated such that
the existing fact references the new base table. Again,
this is only possible if the facts can be “split” in a
meaningful way. Impact: If the new dimension level
table becomes a new top level table existing browse
and aggregate queries will not fail since no existing
queries will include this table. If, however, the new
dimension level table is inserted between two dimen-
sion level tables both existing browse and aggregate
queries will fail if they require that the two existing
tables are able to join. A new base dimension level
table will not cause existing browse queries to fail but
all existing aggregate queries will fail because they try
to join the former base level table with the fact table.
3.4 Delete dimension level
Just as for “Insert dimension level and insert into clas-
sification” there are three ways of deleting an exist-
ing dimension level and classification from an MD-
schema. The dimension level being deleted along
with its classification relationships can either be a) a
top dimension level (not being directly connected to
the fact), b) between dimension levels, or c) a base
dimension level.
Star changes Deleting a dimension level l
MD
in an
MD-schema simply corresponds to deleting a set of
attributes Q in a dimension table d
i
in a star schema
St. The attributes in Q must correspond to the at-
tributes attached to l
MD
and l
MD
must not be a base
level with no additional dimension levels in the di-
mension, if this is the case, the whole dimension
should be deleted from the fact instead. Adaption: If
the dimension level being deleted in the MD-schema
is a top dimension level or is between two existing
dimension levels there is no need for updating any
instances in the dimension table because the denor-
malized dimension table in these cases has the same
granularity as the base dimension level from the MD-
schema. Therefore updating the instances of the di-
mension table containing the dimension level being
deleted will only be required when it is the base di-
mension level in the MD-schema that is the one being
deleted. When this is the case the fact table has to be
updated to adapt to the new granularity. This is done
by aggregating the fact table, see Section 3.7. Im-
pact: Browse queries that use deleted dimension table
attributes will of course no longer be able to execute.
The same goes for aggregate queries but in addition if
it was the base dimension level that was deleted from
the MD-schema and has caused a change in granu-
larity then these aggregate queries using MIN, MAX,
COUNT and AVG will return incorrect results while
SUM-queries continue to operate. Deleting the base
dimension level in an MD-schema is similar to delet-
ing a dimension level from a fact, see Section 3.8.
SCHEMA EVOLUTION FOR STARS AND SNOWFLAKES
429
Semantics
In St, Q, d
i
PCs d
i
D, Q A, |Q| 6= 0
SCs St
0
= <D, fact, A
0
, attrib
0
, PKD, PKF>
q Q: attrib
0
(q) =
A
0
= A \ Q
Out St
0
Snowflake changes Deleting a top dimension level
l
del
replacing it with l
i
where l
i
is classified to l
del
in
a snowflake schema Sn:
Semantics
In Sn, l
del
, l
i
PCs {l
del
, l
i
} L
Q = attrib
1
(l
del
)
SCs Sn
0
= <L
0
, B, fact, A
0
, attrib
0
, PKL
0
, FKL
00
, PKF>
FKL
0
= FKL \ FKl
i
FKl
0
i
= FKl
i
\ {PKl
del
}
FKL
00
= FKL
0
FKl
i
0
q Q: attrib
0
(q) =
A
0
= A \ Q
L
0
= L \ {l
del
}
Out Sn
0
Deleting a dimension level table l
del
between exist-
ing level tables l
i
and a set of dimension level tables
M where l
i
is classified according to the set of dimen-
sion levels M in a snowflake schema Sn:
Semantics
In Sn, l
del
, l
i
, M
PCs M L, {l
del
, l
i
} L
Q = attrib
1
(l
del
)
SCs Sn
0
= <L
0
, B, fact, A
0
, attrib
0
, PKL
0
, FKL
00
, PKF>
FKL
0
= FKL \ FKl
i
FKl
i
0
= FKl
i
\ {PKl
del
}
FKl
i
0
0 = FKl
i
0
FKl
d
el
FKL
00
= FKL
0
FKl
i
00
q Q: attrib
0
(q) =
A
0
= A \ Q
L
0
= L \ {l
del
}
Out Sn
0
Deleting a base dimension level l
del
and replacing
it with a dimension level l
i
where l
i
is classified ac-
cording to l
del
in a snowflake schema Sn:
Semantics
In Sn, l
del
, l
i
PCs {l
del
, l
i
} L,
Q = attrib
1
(l
del
)
SCs Sn
0
= <L
0
, B, fact, A
0
, attrib
0
, PKL
0
, FKL
0
, PKF
0
>
PKF
0
= PKF \ PKl
del
PKF
00
= PKF
0
PKl
i
A
0
= A \ Q
q Q: attrib
0
(q) =
L
0
= L \ {l
del
}
Out Sn
0
Notice the constraint of the operation above requir-
ing a dimension level l
i
to exist to replace a deleted
dimension level l
del
. The constraint can also be ex-
pressed as |FKldel| = 0. The reason for this constraint
is due to the existence of the operation “Delete dimen-
sion from fact”.
Adaption If the top dimension level table is
dropped, no updates are necessary. Deleting a dimen-
sion level table between two other dimension level ta-
bles will require that the lower table (with finer gran-
ularity) is updated according to the deleted table’s up-
per dimension level table. Deleting the base dimen-
sion level table requires that the fact table is updated
such that the fact correspond to the new granularity set
by what becomes the new base dimension level table.
Impact: If the top dimension level table is deleted ex-
isting browse and aggregate queries that include this
table will fail. Deleting a dimension level table be-
tween two tables will make queries that are joining
over the deleted tables fail. Deleting a base dimen-
sion level table will invalidate browse and aggregate
queries that use the deleted table.
3.5 Insert measure attribute
This changes the MD-schema by connecting a new
measure to the fact.
Star changes For a star schema, the new attribute
a
new
must be added to the fact table in star schema
St. Adaption: The existing instances of the fact table
are either set a default value or updated from an out-
side source like an OLTP system. Impact: Adding
a new measure attribute will, of course, not affect
browse queries since browse queries only include a
single dimension table. Nor will aggregate queries be
affected since the new attribute is not included in any
existing queries.
Semantics
In St, a
new
PCs a
new
/ A
SCs St
0
= <D, fact, A
0
, attrib
0
, PKD, PKF>
A
0
= A {a
new
}
attrib
0
(a
new
) = fact
Out St
0
Snowflake changes Adding a new measure at-
tribute into the fact table of a snowflake schema is
quite the same semantics as for the star schema and
hence we choose not to show the operation semantics.
3.6 Delete measure attribute
Star changes Deleting a measure attribute from the
MD-schema will change the star schema St with a
measure attribute a
del
as seen below. Adaption:
When an attribute is deleted from the fact table, no
updating is required. When the attribute being deleted
is the last measure attribute the fact table becomes
a “factless fact table” (Kimball, 1996). Impact:
Browse queries only include a single dimension ta-
ble and are therefore not affected by this operation.
Aggregate queries which include the deleted attribute
will, however, fail after this operation has been per-
formed.
Semantics
In St, a
del
PCs a
del
A, attrib(a
del
) = fact
SCs St
0
= <D, fact, A
0
, attrib
0
, PKD, PKF>
attrib
0
(a
del
) =
A
0
= A \ {a
del
}
Out St
0
Snowflake changes Deleting a measure attribute
ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION
430
from the fact table of a snowflake schema is the same
operation as for star schemas.
3.7 Insert dimension into fact
Star changes The star schema St will create a new
dimension table d
new
with the set of attributes Q as
seen below. Adaption: In most cases adding a new
dimension to a fact will change the granularity of the
fact table and all rows in the fact table will have to be
deleted and reinserted. Impact: If the new dimension
changes the granularity of the fact table it can cause
existing aggregate queries to fail. More specifically,
when the granularity is made finer, one fact tuple is
spread across several tuples, which will cause exist-
ing aggregate queries using MIN, MAX, AVG and
COUNT to fail. Aggregate queries using SUM, how-
ever, will continue to return correct results.
Semantics
In St, d
new
, Q
PCs d
new
6∈D, Q A = , {PKd
new
}∈ Q
SCs St
0
= <D
0
, fact, A
0
, attrib
0
, PKD
0
, PKF
0
>
D
0
= D {d
new
}
PKF
0
= PKF {PKd
new
}
A
0
= A Q
q Q: attrib
0
(q) = {d
new
}
Out St
0
Snowflake changes The snowflake schema Sn is
extended with a new dimension consisting of only one
dimension level table l
new
and attributes Q, see be-
low. l
new
is therefore a base dimension level table.
Associating the fact table of a snowflake schema with
a new base dimension level table has the same conse-
quences for browse and aggregate queries as the star
schema.
Semantics
In Sn, l
new
, Q
PCs l
new
6∈ L, Q A = , Q 6= , PKl
new
Q
SCs Sn
0
= <L
0
, B
0
, fact, A
0
, attrib
0
, PKL, FKL, PKF
0
>
L
0
= L {l
new
}
PKF
0
= PKF {PKl
new
}
A
0
= A Q
q Q: attrib
0
(q) = l
new
Out Sn
0
3.8 Delete dimension from fact
Star changes The star schema does not have corre-
sponding tables for every dimension level in the MD-
schema, as in a snowflake schema. Thus, if the base
dimension level is deleted from the fact it means that
all the other dimension levels that are in the same di-
mension table in the star schema are also deleted. The
star schema St will remove the dimension table d
del
as seen below. Adaption: If deleting the dimension
causes a change in granularity, all existing rows in the
fact table will have to be recalculated by aggregat-
ing over all of the deleted dimension. Impact: Exist-
ing browse queries that include the deleted dimension
table will fail. Aggregate queries, in the same way
as for browse queries, will also fail if they include
the deleted dimension table. Additionally if delet-
ing the dimension table from the fact table causes a
change in granularity existing aggregate queries using
MIN, MAX, AVG and COUNT will fail, while SUM
queries not referring the deleted level will continue to
return correct results.
Semantics
In St, d
del
PCs d
del
D, Q = attrib
1
(d
del
), {PK
del
} Q
SCs St
0
= <D
0
, fact, A
0
, attrib
0
, PKD
0
, PKF
0
>
PKF
0
= PKF \ {PKd
del
}
q Q: attrib
0
(q) =
A
0
= A \ Q
D
0
= D \ {d
del
}
Out St
0
Snowflake changes When the base dimension
level table l
del
is removed in the snowflake schema
Sn, the fact table drops its foreign key reference to
l
del
. Deleting a base dimension level table affects ex-
isting browse and aggregate queries in the same way
as for the star schema.
Semantics
In Sn, l
del
PCs l
new
L, |FKl
del
| = 0,
Q = attrib
1
(l
del
, {PKl
del
} Q
SCs Sn
0
= <L
0
, B
0
, fact, A
0
, attrib
0
, PKL
0
, FKL, PKF
0
>
PKF
0
= PKF \ PKl
del
q Q: attrib
0
(q) =
A
0
= A \ Q
L
0
= L \ l
del
Out Sn
0
4 COMPARISON
In the following, the results of Section 3 are sum-
marized. For each evolution operation, we indicate
the “ease of implementation” and “ease of updating”
of the instances of the schemas. If the adaption task
can be accomplished with a few SQL statements, it is
“simple, otherwise it is “hard. We also summarize
which evolution operations can result in invalidating
browse and aggregate queries. The summarization is
seen in the tables below.
Ease of implementation Star Snowflake
schema schema
Insert attribute into level Simple Simple
Delete attribute from level Simple Simple
Insert dimension level Simple Hard
Delete dimension level Simple Hard
Insert measure attribute Simple Simple
Delete measure attribute Simple Simple
Insert dimension into fact Hard Hard
Delete dimension from fact Hard Hard
We see that in all cases the star schema is either
simpler or just as hard to update as its snowflake coun-
terpart. Especially the operations “Insert dimension
level” and “Delete dimension level” are far simpler
SCHEMA EVOLUTION FOR STARS AND SNOWFLAKES
431
Ease of updating Star Snowflake
schema schema
Insert attribute into level Simple Simple
Delete attribute from level Simple Simple
Insert dimension level Simple Hard
Delete dimension level Simple Hard
Insert measure attribute into fact Simple Simple
Delete measure attribute from fact Simple Simple
Insert dimension into fact Hard Hard
Delete dimension from fact Hard Hard
Invalidates browse queries Star Snowflake
schema schema
Insert attribute into level No No
Delete attribute from level Yes Yes
Insert dimension level
- New top dimension level No No
- Between dimension levels No Yes
- New base dimension level No No
Delete dimension level
- Delete top dimension level Yes Yes
- Between dimension levels Yes Yes
- Delete base dimension level Yes Yes
Insert measure attribute No No
Delete measure attribute No No
Insert dimension into fact No No
Delete dimension from fact Yes Yes
Invalidates MIN, MAX, COUNT Star Snowflake
and AVG aggregate queries schema schema
Insert attribute into level No No
Delete attribute from level Yes Yes
Insert dimension level
- New top dimension level No No
- Between dimension levels No Yes
- New base dimension level Yes Yes
Delete dimension level
- Delete top dimension level Yes Yes
- Between dimension levels Yes Yes
- Delete base dimension level Yes Yes
Insert measure attribute No No
Delete measure attribute Yes Yes
Insert dimension into fact Yes Yes
Delete dimension from fact Yes Yes
to implement and update in the star schema due to
its denormalized dimensions. When all attributes of
a dimension are in the same table there are no pri-
Invalidates SUM aggregate Star Snowflake
queries schema schema
Insert attribute into level No No
Delete attribute from level Yes Yes
Insert dimension level
- New top dimension level No No
- Between dimension levels No Yes
- New base dimension level No Yes
Delete dimension level
- Delete top dimension level Yes Yes
- Between dimension levels Yes Yes
- Delete base dimension level Yes Yes
Insert measure attribute No No
Delete measure attribute Yes Yes
Insert dimension level into fact No No
Delete dimension level from fact Yes Yes
mary and foreign key dependencies that have to be up-
dated as in snowflake schemas. If we compare the two
schema types’ ability to let existing browse and aggre-
gate queries continue to operate, the star schema again
proves to be superior to the snowflake schema. Again
it is the two operations mentioned above that cause
the star schema to prevail. When performing the op-
eration “Insert dimension level” aggregate queries us-
ing MIN, MAX, AVG and COUNT will only fail on
the star schema if the new dimension level is inserted
as a new base dimension and also causing a change
in granularity. The snowflake schema, on the other
hand, will fail both for new base dimension levels and
new dimension levels between other dimension levels
because the tables above and below in the dimension
hierarchy (also including the fact table) will no longer
be possible to join (their join-path is broken). When
aggregate queries only use the SUM aggregate func-
tion, the star schema is much more robust than the
snowflake schema as it will not matter if the granu-
larity changes when inserting a new base dimension
level since SUM is split-resistant.
5 CONCLUSION AND FUTURE
WORK
Motivated by the popularity of star and snowflake
schemas for DW implementation, the lack of detailed
information on schema evolution for such schemas,
and the problems with schema evolution in DW re-
porting environments, this paper systematically inves-
tigated the evolution properties of star and snowflake
schemas. We formally defined star and snowflake
ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION
432
schemas, and considered eight evolution operations,
covering insertion and deletion of dimensions, levels,
dimension attributes, and measure attributes. For each
operation, the formal semantics of the changes for
star and snowflake schemas were given, and instance
adaption and impact on existing browse and aggregate
queries was described. Finally, the evolution prop-
erties of star and snowflake schemas was compared,
looking at the ease of implementation and instance
adaption and which changes cause existing browse
or aggregate queries to fail. In all cases, the star
schema was either superior to the snowflake schema
or had the same problems. Thus, the star schema
is considerably more robust towards schema changes
than the snowflake schema. This challenges the tradi-
tional relational assumption that normalized schemas
are more flexible. Interesting aspects of future work
include building a tool for supporting implementation
and instance adaption for schema evolution in rela-
tional DWs, as well as a framework/tool for rewriting
queries against old version of the schema to the new
schema, thus easing the problems of DW evolution.
REFERENCES
M. Blaschka, C. Sapia and G. H
¨
ofling. On Schema Evo-
lution in Multidimensional Databases. In Proc. of
DaWaK, pp. 153–164, 1999.
M. Blaschka. FIESTA, A Framework for Schema Evolution
in Multidimensional Databases. Ph. D. Thesis, TU
Mnchen, 2000.
M. Body, M. Miquel., Y. Bdard., and A. Tchounikine.
A Multidimensional and Multiversion Structure for
OLAP Applications. In Proc. of DOLAP, 2002.
M. Bouzheghoub and Z. Kedad. A Logical Model for Data
Warehouse Design and Evolution. In Proc. of DaWaK,
pp. 178–188, 2000
J. Eder and C. Koncilia. Changes of Dimension Data in
Temporal Data Warehouses. In Proc. of DaWaK,
2001.
C. A. Hurtado, A. O. Mendelzon, and A. A. Vaisman. Up-
dating OLAP Dimensions. In Proc. of DOLAP, 1999.
R. Kimball. The Data Warehouse Toolkit. Wiley, 1996.
R. Kimball et al. The Data Warehouse Lifecycle Toolkit.
Wiley, 1998.
C. Letz, E. Henn, and G. Vossen. Consistency in Data Ware-
house Dimensions. In Proc. of IDEAS, 2002.
M. Levene and G. Loizou. Why is the Snowflake Schema a
Good Data Warehouse Design?. Information Systems,
28(3):225–240, 2003.
S. Lujan-Mora, J. Trujillo, and I-Y. Song. Multidimensional
Modeling With UML Package Diagrams. In Proc. of
ER, pp. 199-213, 2002.
A. Mendelzohn and A. A. Vaisman. Temporal Queries in
OLAP. In Proc. of VLDB, pp. 242–253, 2000
T. Morzy and R. Wrembel. Modeling a Multiversion Data
Warehouse: a Formal Approach. In Proc. of ICEIS,
2003.
T. B. Pedersen, C. S. Jensen, and C. E. Dyreson. A Foun-
dation for Capturing and Querying Complex Multidi-
mensional Data. Information Systems 26(5):383–423,
2001.
T. B. Pedersen and C. S. Jensen. Multidimensional
Database Technology. IEEE Computer34(12):40–46,
2001.
J. Roddick. Schema Evolution in Database Systems - An
Annotated Bibliography. SIGMOD Record 21(4):35–
40, 1992.
C. Sapia, M. Blaschka, G. H
¨
ofling and B. Dinter. Extending
the E/R model for the Multidimensional Paradigm. In
Proc. of ER Workshops, pp. 105–116, 1998.
P. Vassiliadis. Gulliver in the land of data warehousing:
practical experiences and observations of a researcher.
In Proc. of DMDW, pp. 12-1–12-16, 2000.
P. Vassiliadis, M. Bouzheghoub, and C. Quix. Towards
Quality-Oriented Data Warehouse Usage and Evolu-
tion. Information Systems 25(2):89–115, 2000.
SCHEMA EVOLUTION FOR STARS AND SNOWFLAKES
433