SCHEMA EVOLUTION FOR STARS AND SNOWFLAKES

Christian E. Kaas Torben Bach Pedersen Bjørn D. Rasmussen

Aalborg University

Fr. Bajers Vej 7E, DK-9220 Aalborg Ø, Denmark

Keywords:

Star schemas, snowﬂake schemas, schema evolution

Abstract:

The most common implementation platform for multidimensional data warehouses is RDBMSs storing data in

relational star and snowﬂake schemas. DW schemas evolve over time, which may invalidate existing analysis

queries used for reporting purposes. However, the evolution properties of star and snowﬂake schemas have

not previously been investigated systematically. This paper systematically investigates the evolution properties

of star and snowﬂake schemas. Eight evolution operations are considered, covering insertion and deletion of

dimensions, levels, dimension attributes, and measure attributes. For each operation, the formal semantics of

the changes for star and snowﬂake schemas are given, and instance adaption and impact on existing queries

are described. Finally, we compare the evolution properties of star and snowﬂake schemas, concluding that

the star schema is considerably more robust towards schema changes than the snowﬂake schema.

1 INTRODUCTION

Data warehouses (DWs) based on a multidimen-

sional (MD) data model are used in almost all busi-

nesses (Kimball, 1996; Pedersen et al., 2001b). Mul-

tidimensional models view data as multidimensional

cubes where data is captured as facts with associated

numeric measures. The facts are characterized by a

number of dimensions organized in a hierarchy of lev-

els. DW queries fall in two categories, browse queries

that inspect the dimension hierarchies, and aggregate

queries that aggregate measure values up to a given

combination of dimension levels (Kimball, 1996).

Multidimensional DWs can be implemented us-

ing specialized multidimensional OLAP (MOLAP)

DBMSs (Pedersen et al., 2001b), but the most com-

mon implementation platform is relational OLAP

(ROLAP), i.e., RDBMSs storing data in relational

star and snowﬂake schemas (Kimball, 1996). Star

schemas have one denormalized table per dimension,

while snowﬂake schemas normalize the dimension ta-

bles with one table per dimension level. DW schemas

almost always evolve over time (Kimball, 1996; Kim-

ball et al., 1998), which may invalidate existing SQL

analysis queries. ROLAP data warehouses are usu-

ally queried by two different sets of tools. On-Line

Analytical Processing (OLAP) tools are used for on-

line, ad-hoc analyses by knowledge workers. Here,

the SQL queries are generated dynamically based on

a conceptual multidimensional schema stored in the

tool. Thus, schema changes in the DW can be handled

“easily” in the sense that queries will return the same

results. However, materialized aggregates will have

to be partly recomputed before the performance is the

same. In contrast, relational reporting tools are used

for off-line, periodic reporting and analysis. Here, the

SQL queries are typically ﬁxed and embedded in re-

porting programs. Thus, schema evolution in the DW

means that the programs must be modiﬁed manually,

a very expensive, cumbersome, and error-prone pro-

cess. The main motivation for this paper is thus to

provide details on when and how DW schema evolu-

tion affects existing SQL queries in reporting tools.

A survey of data warehouse projects and research

issues (Vassiliadis, 2000) conﬁrms that DW schema

evolution is important. However, to our knowl-

edge, the evolution properties of star and snowﬂake

schemas and the impact on existing DW queries have

not previously been systematically investigated.

This paper remedies the situation by systemati-

cally investigating the evolution properties of star and

snowﬂake schemas, and, by implication, of fact con-

stellation schemas. We start by formally deﬁning star

and snowﬂake schemas, and browse and aggregate

425

E. Kaas C., Bach Pedersen T. and D. Rasmussen B. (2004).

SCHEMA EVOLUTION FOR STARS AND SNOWFLAKES.

In Proceedings of the Sixth International Conference on Enterprise Information Systems, pages 425-433

DOI: 10.5220/0002619704250433

 SciTePress

queries over them. We then consider eight evolu-

tion operations, covering insertion and deletion of di-

mensions, levels, dimension attributes, and measure

attributes. For each operation, the formal seman-

tics of the changes for star and snowﬂake schemas

are given, and instance adaption and impact on ex-

isting browse and aggregate queries are described.

As previous work has already dealt with evolution in

the data instances, we do not cover this aspect. Fi-

nally, we compare the evolution properties of star and

snowﬂake schemas, looking both at which changes

cause existing browse or aggregate queries to fail, as

well as the ease of implementation and instance adap-

tion. In all cases, the star schema is either superior

to the snowﬂake schema or has the same problems.

We thus conclude that the star schema is consider-

ably more robust towards schema changes than the

snowﬂake schema.

A lot of previous work on general schema evolu-

tion has been performed (Roddick, 1992). However,

this work does not take the special characteristics of

multidimensional schemas and queries, e.g., dimen-

sions with hierarchies, into account. Several mul-

tidimensional models based on, e.g., UML (Lujan-

Mora et al., 2002) and E/R (Sapia et al., 1998) have

been proposed for conceptual and logical DW design

Some models (Pedersen et al., 2001a; Body et al.,

2002; Eder et al., 2001; Letz et al., 2002; Morzy

et al., 2003) also consider temporal aspects of the

data instances such as multiple version of dimensions

and facts, but not schema evolution in the DW. How-

ever, none of these models consider schema evolu-

tion. The paper (Levene et al., 2003) investigates

other properties of star and snowﬂake schemas such

as redundancy, but does not consider schema evo-

lution. Several papers (Bouzheghoub et al., 2000;

Vassiliadis et al, 2000) consider DW schema evolu-

tion at the coarse-grained level of materialized views,

either focusing on DW design (Bouzheghoub et al.,

2000) or DW quality (Vassiliadis et al, 2000), but do

not consider the special characteristics of multidimen-

sional schemas and queries. Another paper (Mendel-

zohn et al., 2000) considers querying across schema

evolution, but in a “pure” multidimensional model,

and thus do not investigate evolution in star and

snowﬂake schemas. The same is true for the FI-

ESTA framework (Blaschka et al., 1999; Blaschka,

2000). Another paper (Hurtado et al., 1999) considers

both schema evolution in a “pure” multidimensional

model, and the mapping to relational schemas, but

considers mainly instance rather than schema updates

(which are only treated brieﬂy), and considers only

dimension updates. In comparison, the present paper

provides a detailed treatment of schema evolution, in-

cluding a formal semantics of the changes, provides

details on the impact on existing DW queries, and

considers both dimensions and fact/measures. Kim-

ball (Kimball, 1996; Kimball et al., 1998) infor-

mally mentions that schema changes should be added

“gracefully” to allow existing queries to work, but

neither investigates this systematically and formally,

nor compares the evolution properties of star and

snowﬂake schemas.

We believe this paper to be the ﬁrst to systemati-

cally investigate the evolution properties of star and

snowﬂake schemas, including giving a formal seman-

tics for evolution operations on these schemas, and

investigating the impact on existing DW queries.

The rest of the paper is structured as follows. Sec-

tion 2 describes the multidimensional schemas and

queries considered in the paper. Section 3 examines

effects of the eight evolution operations, while Sec-

tion 4 summarizes the evolution properties for star

and snowﬂake schemas. Finally, Section 5 concludes

and points to future work.

2 MULTIDIMENSIONAL

SCHEMAS

We now formally deﬁne star and snowﬂake schemas.

In the following, we refer to the intuitive “MD-

schema” that corresponds to our conceptual under-

standing of the cube that the star or snowﬂake schema

implements, i.e., a schema with notions like cube,

measures, dimensions, levels, and attributes.

Deﬁnition 1 (Star schema) A star schema is a 6-

tuple: <D, fact, A, attrib, PKD, PKF> where D is

a set of dimension tables, fact is a fact table, A is a

set of attributes each with a name and a type, attrib:

A → D ∪ {fact} is a function mapping an attribute

to either a fact table or a dimension table, PKD ⊂ A

is the set of dimension table primary keys where the

primary key of the dimension table d

∈ D is denoted

PKd

: PKd

∈ PKD, and PKF ⊂ A is the (compos-

ite) primary key of the fact table. Each K ∈ PKF is a

foreign key to exactly one dimension table d

∈ D. ¤

The star schema uses denormalized dimensions

which means that the dimension levels in the MD

schema do not appear in the star schema. Instead a

dimension table d

in the star schema deﬁnition con-

sists of one of the base dimension levels deﬁned in the

MD schema and all the dimension levels above it in

the MD schema. The attributes a

, j=1..n of d

cor-

responds to the attributes of the base dimension level

and all its preceding dimension levels just described.

Deﬁnition 2 (Snowﬂake schema) A snowﬂake

schema is an 8-tuple: <L, B, fact, A, attrib, PKL,

FKL, PKF>, where L is a set of dimension level

tables, B ⊂ L is a set of base dimension level tables,

fact is a fact table, A is a set of attributes, attrib: A

→ L ∪ {fact} is a function mapping an attribute to

ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION

426

either a fact table or a dimension level table, PKL

⊂ A is a set of primary keys where the primary key

of the dimension level table l

∈ L is denoted PKl

PKl

∈ PKL, FKL ⊂ A is a set of foreign keys where

the set of foreign keys for the dimension level table

∈ L is denoted FKl

: FKl

⊂ FKL ⊂ A, PKF ⊂

A is the (composite) primary key of the fact table.

Each K ∈ PKF is a foreign key to exactly one base

level dimension table d

∈ B. The snowﬂake schema

can be viewed as a directed acyclic graph where the

tables are the nodes and the foreign key dependencies

represent the edges. The root of the graph will be the

fact table and the root’s directly connected nodes are

the base dimension level tables. The edges emanating

from the root connects to the base dimension level

tables and each base dimension level table is the root

of a subgraph. No two subgraphs can be connected.¤

The base dimension levels B of the snowﬂake

schema are the dimension levels that are associated

with facts in the MD-schema. However, this rela-

tion is represented as a foreign key dependency in the

snowﬂake schema. The dimension level hierarchy of

the MD-schema is also represented as foreign key de-

pendencies where the table representing an underly-

ing dimension level references the primary key of its

upper dimension levels.

DW Queries Generally, only two query types

are posed to data warehouses, namely 80% browse

and 20% aggregate queries (Kimball, 1996). Browse

queries only concern a single dimension and are used

to investigate the dimension hierarchy. Aggregate

queries aggregate the information in the fact table

to a level speciﬁed using one or more dimensions.

Prototypical star (top) and snowﬂake (bottom) browse

queries are shown below.

SELECT DISTINCT(d

) FROM d

WHERE P ;

SELECT DISTINCT(l

) FROM l

, .., l

WHERE P ;

The star schema browse query returns all distinct

values of the attribute a

in the dimension table d

for rows that match the predicate P . The snowﬂake

schema browse query returns all distinct values of

attribute a

in dimension level table l

for rows that

match the predicate P . The FROM-clause of the

browse query includes several dimension level tables.

In addition the predicate P also describes how the

dimension level tables l

, .., l

shall be joined. Proto-

typical star (top) and snowﬂake (bottom) aggregate

queries are shown below.

SELECT d

, .., d

AGG(fact.m

), .., AGG(f act.m

)

FROM d

, .., d

, fact WHERE P

GROUP BY d

, .., d

;

SELECT l

, .., l

AGG(fact.m

), .., AGG(f act.m

)

FROM l

, .., l

, fact WHERE P

GROUP BY l

, .., l

;

The select list of the star schema aggregate query

includes the dimension attributes a

, .., a

of dimen-

sion table d

and a

, .., a

of dimension table d

along

with measures m

, .., m

from the fact table fact.

AGG can be one of the aggregate functions MIN,

MAX, AVG, COUNT and SUM. Constraints that ap-

ply for rows being included is deﬁned in the predi-

cate P along with rules for joining dimension tables

with the fact table. The snowﬂake query is similar,

but includes several tables per dimension. Aggregate

queries can use the aggregate functions MIN, MAX,

COUNT, AVG and SUM. Of these, MIN, MAX,

COUNT and AVG require the granularity of the fact

table (Kimball, 1996) to remain constant in order to

return correct results. This is not the case for SUM

(SUM is split-resistant, see Section 3.7).

3 EVOLUTION OPERATIONS

We now go through each of the eight evolution op-

erations and show their impact on star and snowﬂake

schemas, and existing DW queries. We give a for-

mal semantics for the change on star and snowﬂake

schemas, in terms of the input (In), output (Out), pre-

conditions (PCs), and schema changes (SCs). For star

schema changes, we give as input a star schema: St

= <D, fact, A, attrib, PKD, PKF> and for snowﬂake

schemas the input: Sn = <L, B, fact, A, attrib, PKL,

FKL, PKF>.

3.1 Insert attribute into level

Star changes Inserting an attribute a

new

into a di-

mension level in an MD-schema corresponds to in-

serting an attribute a

new

into a dimension table d

a star schema St = <D, fact, A, attrib, PKD, PKF>.

Adaption: All rows in the dimension table being al-

tered in the star schema must be updated such that all

rows will either be assigned a default value for the

new attribute or they will have to be updated sepa-

rately. Impact: Inserting a new attribute into a di-

mension table cannot cause existing queries browse

and aggregate queries to fail since no queries use this

new attribute yet.

Semantics

In St, a

new

, d

PCs a

new

/∈ A, d

∈ D

SCs St

= <D, fact, A

, attrib

,PKD,PKF>

= A ∪ {a

new

}

attrib

new

) = d

Out St

SCHEMA EVOLUTION FOR STARS AND SNOWFLAKES

427

Snowﬂake changes Inserting an attribute a

new

into

a dimension level in an MD-schema corresponds to

inserting an attribute a

new

into a dimension level ta-

ble l

in a snowﬂake schema Sn = <L, B, fact, A,

attrib, PKL, FKL, PKF>. Adaption: Much in the

same way as the star schema the values for the new

attribute is either set a default value or updated sep-

arately. If the table being altered is higher in the di-

mension level hierarchy (with coarser granularity) the

updating of the rows will be faster than for the star

schema since fewer rows needs to be updated. E.g.

there are fewer rows in the “Category” table than in

the Item table of the snowﬂake schema. Impact: In-

serting a new attribute into a dimension level table

cannot affect existing browse and aggregate queries

using the schema since none of these includes the new

attribute.

Semantics

In Sn, a

new

, l

PCs a

new

/∈ A, l

∈ L

SCs Sn

= <L, B, fact, A

, attrib

,PKL,FKL,PKF>

= A ∪ {a

new

}

attrib

new

) = l

Out St

3.2 Delete attribute from level

Star changes Deleting an attribute a

del

from a dimen-

sion level in an MD-schema corresponds to deleting

an attribute a

del

from a dimension table d

in a star

schema St = <D, fact, A, attrib, PKD, PKF>. Be-

fore deleting the attribute the precondition has to be

strengthened such that it is not the last attribute (be-

sides the primary key) that is being deleted. Adap-

tion: There is no need to update any of the rows when

removing an attribute from a dimension table. Im-

pact: Existing browse and aggregate queries which

include this attribute will fail after performing this

evolution operation on the star schema.

Semantics

In St, a

del

, d

PCs a

del

∈ A, attrib(a

del

) = d

∈ D

Q = attrib

−1

) ∧ |Q| > 2

SCs St

= <D, fact, A

, attrib

,PKD,PKF>

attrib

del

) = ∅

= A \ {a

del

}

Out St

Snowﬂake changes Deleting an attribute a

del

from

a dimension level in an MD-schema corresponds to

deleting an attribute a

del

from a dimension level table

in a snowﬂake schema Sn = <L, B, fact, attrib,

PKL, FKL, PKF>. The precondition ensures that no

last non-key attribute of dimension level table l

can

be deleted. Adaption: Dropping an attribute from

a dimension level table does not require any rows to

be updated. Impact: The outcome of deleting an at-

tribute from a dimension level table will be that exist-

ing browse and aggregate queries which include this

attribute will fail.

Semantics

In Sn, a

del

, l

PCs a

del

∈ A, attrib(a

del

) = l

∈ L

Q = attrib

−1

)∧ |Q| > |FKl

| + 2

SCs Sn

= <L, B, fact, A

, attrib

,PKL,FKL,PKF>

attrib

del

) = ∅

= A \ {a

del

}

Out Sn

3.3 Insert dimension level

There are three ways a new dimension level can be

added to an MD-schema. Either a) as a new base di-

mension level, b) as a new top dimension level (not

being directly connected to the fact) or c) between two

existing dimension level.

Star changes Inserting a dimension level with a set

of attributes attached in an MD-schema corresponds

to inserting a set of attributes Q in a dimension level

table d

in a star schema St = <D, fact, A, attrib,

PKD, PKF>. Additionally, in an MD-schema a clas-

siﬁcation is inserted but since the star schema is con-

strained in this matter no further than inserting a set

of attributes is done. Adaption: Due to the denor-

malized dimensions of the star schema it makes no

difference if the new dimension level is inserted as a

new top dimension level or between two dimension

levels. If, however, the new classiﬁcation is inserted

as a new base dimension level and causes a change in

granularity then the fact table will have to be updated

along with the dimension table. This is only possible

if the facts can be “split” in a meaningful way, and

the measure data allocated at the new, ﬁner granular-

ity. Impact: The new attributes which form the new

classiﬁcation will not cause existing browse queries

to fail. Even if the new dimension level is inserted

as a new base dimension level in the MD-schema and

causes a change in granularity and thereby creating

more rows in the dimension table it still will not affect

browse queries because they include the DISTINCT

keyword when querying a dimension table. Aggre-

gate queries, on the other hand, can be affected by this

operation when a new base dimension level is inserted

causing change in granularity of the fact table. It is,

however, only aggregate queries using MIN, MAX,

COUNT and AVG that will fail while SUM will not

because it is split-resistant. When a new base dimen-

sion level is inserted the impact is similar to inserting

a new dimension into fact, see Section 3.7.

Semantics

In St, Q, d

PCs d

∈ D, Q ∩ A = ∅, Q 6= ∅

SCs St

= <D, fact, A

, attrib

,PKD,PKF>

= A∪Q

∀q ∈ Q: attrib

(q) = d

Out St

Snowﬂake changes The semantics for inserting a

new dimension level table depends on where it is be-

ing inserted. Inserting a new dimension level table

newtop

on top of an existing top dimension level ta-

ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION

428

ble l

with a set of attributes Q in a snowﬂake schema

Sn = <L, B, fact, A, attrib, PKL, FKL, PKF> has

the following semantics.

Semantics

In Sn, l

newtop

, l

, Q

PCs l

newtop

6∈ L, l

∈ L, Q ∩ A = ∅,

{PKl

newtop

} ∈ Q, Q 6= ∅

SCs Sn

= <L

, B, fact, A

, attrib

,PKL

,FKL

,PKF>

= L ∪ {l

newtop

}

PKL

= PKL ∪ {PKl

newtop

}

FKL

= FKL \ FK

F K

= FKl

∪ {PKl

newtop

}

FKL

= FKL

∪ FKl

= A ∪ Q

∀q ∈ Q: attrib

(q) = l

newtop

Out Sn

Inserting a new dimension level table l

new

with a

set of attributes Q between existing dimension level

tables l

and the set of tables M where l

is classi-

ﬁed according to the dimension level tables M in a

snowﬂake schema Sn = <L, B, fact, A, attrib, PKL,

FKL, PKF>:

Semantics

In Sn, l

new

, l

, M, Q

PCs l

new

6∈ L, M = {m

, .., m

}, {l

} ∪ M ⊆ L

Q = {PKl

new

} ∪ {att

, .., att

}, Q ∩ A = ∅, Q 6= ∅

SCs Sn

= <L

, B, fact, A

, attrib

, PKL

, FKL

, PKF>

= L ∪ {l

new

}

PKL

= PKL ∪ {PKl

new

}

FKl

new

= FKl

new

∪ FKl

FKL

= FKL \ FKl

FKl

= FKl

\ {PKm

, .., PKm

}

FKl

= FKl

∪ {PKl

new

}

FKL

= FKL

∪ FKl

new

∪ FKl

= A ∪ Q

∀q ∈ Q: attrib

(q) = l

new

Out Sn

Inserting a new dimension level l

newbase

with a

set of attributes Q as new base dimension level re-

placing an existing base dimension level l

oldbase

in a

snowﬂake schema Sn = <L, B, fact, A, attrib, PKL,

FKL, PKF>:

Semantics

In Sn, l

newbase

, l

oldbase

, Q

PCs l

newbase

6∈ L, l

oldbase

∈ L

Q = {PKl

newbase

} ∪ {att

, .., att

}, Q ∩ A = ∅, Q 6= ∅

SCs Sn

= <L

, B

, fact, A

, attrib

, PKL

, FKL

, PKF

= L ∪ {l

newbase

}

FKl

newbase

= FKl

newbase

∪ {PKl

oldbase

}

FKL

= FKL ∪ FKl

newbase

PKF

= PKF \ {PKl

oldbase

}

PKF

= PKF

∪ {PKl

newbase

}

= A ∪ Q

∀q ∈ Q: attrib

(q) = l

newbase

Out Sn

Adaption: If the new dimension level table be-

comes a new top dimension level table all the rows in

the former top dimension level table will need to have

their new foreign key updated such that every row is

classiﬁed according to its new top level. If instead

the new dimension level table is inserted between two

existing dimension level tables the lower table (with

ﬁner granularity) will have to be classiﬁed according

to the new dimension level table and the new dimen-

sion level table will have to be classiﬁed according to

the upper table (with coarser granularity). Lastly, if

the new dimension level table becomes the new base

table the fact table will have to be updated such that

the existing fact references the new base table. Again,

this is only possible if the facts can be “split” in a

meaningful way. Impact: If the new dimension level

table becomes a new top level table existing browse

and aggregate queries will not fail since no existing

queries will include this table. If, however, the new

dimension level table is inserted between two dimen-

sion level tables both existing browse and aggregate

queries will fail if they require that the two existing

tables are able to join. A new base dimension level

table will not cause existing browse queries to fail but

all existing aggregate queries will fail because they try

to join the former base level table with the fact table.

3.4 Delete dimension level

Just as for “Insert dimension level and insert into clas-

siﬁcation” there are three ways of deleting an exist-

ing dimension level and classiﬁcation from an MD-

schema. The dimension level being deleted along

with its classiﬁcation relationships can either be a) a

top dimension level (not being directly connected to

the fact), b) between dimension levels, or c) a base

dimension level.

Star changes Deleting a dimension level l

in an

MD-schema simply corresponds to deleting a set of

attributes Q in a dimension table d

in a star schema

St. The attributes in Q must correspond to the at-

tributes attached to l

and l

must not be a base

level with no additional dimension levels in the di-

mension, if this is the case, the whole dimension

should be deleted from the fact instead. Adaption: If

the dimension level being deleted in the MD-schema

is a top dimension level or is between two existing

dimension levels there is no need for updating any

instances in the dimension table because the denor-

malized dimension table in these cases has the same

granularity as the base dimension level from the MD-

schema. Therefore updating the instances of the di-

mension table containing the dimension level being

deleted will only be required when it is the base di-

mension level in the MD-schema that is the one being

deleted. When this is the case the fact table has to be

updated to adapt to the new granularity. This is done

by aggregating the fact table, see Section 3.7. Im-

pact: Browse queries that use deleted dimension table

attributes will of course no longer be able to execute.

The same goes for aggregate queries but in addition if

it was the base dimension level that was deleted from

the MD-schema and has caused a change in granu-

larity then these aggregate queries using MIN, MAX,

COUNT and AVG will return incorrect results while

SUM-queries continue to operate. Deleting the base

dimension level in an MD-schema is similar to delet-

ing a dimension level from a fact, see Section 3.8.

SCHEMA EVOLUTION FOR STARS AND SNOWFLAKES

429

Semantics

In St, Q, d

PCs d

∈ D, Q ⊂ A, |Q| 6= 0

SCs St

= <D, fact, A

, attrib

, PKD, PKF>

∀q ∈ Q: attrib

(q) = ∅

= A \ Q

Out St

Snowﬂake changes Deleting a top dimension level

del

replacing it with l

where l

is classiﬁed to l

del

a snowﬂake schema Sn:

Semantics

In Sn, l

del

, l

PCs {l

del

, l

} ∈ L

Q = attrib

−1

del

)

SCs Sn

= <L

, B, fact, A

, attrib

, PKL

, FKL

, PKF>

FKL

= FKL \ FKl

FKl

= FKl

\ {PKl

del

}

FKL

= FKL

∪ FKl

∀q ∈ Q: attrib

(q) = ∅

= A \ Q

= L \ {l

del

}

Out Sn

Deleting a dimension level table l

del

between exist-

ing level tables l

and a set of dimension level tables

M where l

is classiﬁed according to the set of dimen-

sion levels M in a snowﬂake schema Sn:

Semantics

In Sn, l

del

, l

, M

PCs M ⊂ L, {l

del

, l

} ∈ L

Q = attrib

−1

del

)

SCs Sn

= <L

, B, fact, A

, attrib

, PKL

, FKL

, PKF>

FKL

= FKL \ FKl

FKl

= FKl

\ {PKl

del

}

FKl

0 = FKl

∪ FKl

FKL

= FKL

∪ FKl

∀q ∈ Q: attrib

(q) = ∅

= A \ Q

= L \ {l

del

}

Out Sn

Deleting a base dimension level l

del

and replacing

it with a dimension level l

where l

is classiﬁed ac-

cording to l

del

in a snowﬂake schema Sn:

Semantics

In Sn, l

del

, l

PCs {l

del

, l

} ∈ L,

Q = attrib

−1

del

)

SCs Sn

= <L

, B, fact, A

, attrib

, PKL

, FKL

, PKF

PKF

= PKF \ PKl

del

PKF

= PKF

∪ PKl

= A \ Q

∀q ∈ Q: attrib

(q) = ∅

= L \ {l

del

}

Out Sn

Notice the constraint of the operation above requir-

ing a dimension level l

to exist to replace a deleted

dimension level l

del

. The constraint can also be ex-

pressed as |FKldel| = 0. The reason for this constraint

is due to the existence of the operation “Delete dimen-

sion from fact”.

Adaption If the top dimension level table is

dropped, no updates are necessary. Deleting a dimen-

sion level table between two other dimension level ta-

bles will require that the lower table (with ﬁner gran-

ularity) is updated according to the deleted table’s up-

per dimension level table. Deleting the base dimen-

sion level table requires that the fact table is updated

such that the fact correspond to the new granularity set

by what becomes the new base dimension level table.

Impact: If the top dimension level table is deleted ex-

isting browse and aggregate queries that include this

table will fail. Deleting a dimension level table be-

tween two tables will make queries that are joining

over the deleted tables fail. Deleting a base dimen-

sion level table will invalidate browse and aggregate

queries that use the deleted table.

3.5 Insert measure attribute

This changes the MD-schema by connecting a new

measure to the fact.

Star changes For a star schema, the new attribute

new

must be added to the fact table in star schema

St. Adaption: The existing instances of the fact table

are either set a default value or updated from an out-

side source like an OLTP system. Impact: Adding

a new measure attribute will, of course, not affect

browse queries since browse queries only include a

single dimension table. Nor will aggregate queries be

affected since the new attribute is not included in any

existing queries.

Semantics

In St, a

new

PCs a

new

/∈ A

SCs St

= <D, fact, A

, attrib

, PKD, PKF>

= A ∪ {a

new

}

attrib

new

) = fact

Out St

Snowﬂake changes Adding a new measure at-

tribute into the fact table of a snowﬂake schema is

quite the same semantics as for the star schema and

hence we choose not to show the operation semantics.

3.6 Delete measure attribute

Star changes Deleting a measure attribute from the

MD-schema will change the star schema St with a

measure attribute a

del

as seen below. Adaption:

When an attribute is deleted from the fact table, no

updating is required. When the attribute being deleted

is the last measure attribute the fact table becomes

a “factless fact table” (Kimball, 1996). Impact:

Browse queries only include a single dimension ta-

ble and are therefore not affected by this operation.

Aggregate queries which include the deleted attribute

will, however, fail after this operation has been per-

formed.

Semantics

In St, a

del

PCs a

del

∈ A, attrib(a

del

) = fact

SCs St

= <D, fact, A

, attrib

, PKD, PKF>

attrib

del

) = ∅

= A \ {a

del

}

Out St

Snowﬂake changes Deleting a measure attribute

ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION

430

from the fact table of a snowﬂake schema is the same

operation as for star schemas.

3.7 Insert dimension into fact

Star changes The star schema St will create a new

dimension table d

new

with the set of attributes Q as

seen below. Adaption: In most cases adding a new

dimension to a fact will change the granularity of the

fact table and all rows in the fact table will have to be

deleted and reinserted. Impact: If the new dimension

changes the granularity of the fact table it can cause

existing aggregate queries to fail. More speciﬁcally,

when the granularity is made ﬁner, one fact tuple is

spread across several tuples, which will cause exist-

ing aggregate queries using MIN, MAX, AVG and

COUNT to fail. Aggregate queries using SUM, how-

ever, will continue to return correct results.

Semantics

In St, d

new

, Q

PCs d

new

6∈D, Q ∩ A = ∅, {PKd

new

}∈ Q

SCs St

= <D

, fact, A

, attrib

, PKD

, PKF

= D ∪ {d

new

}

PKF

= PKF ∪ {PKd

new

}

= A ∪ Q

∀q ∈ Q: attrib

(q) = {d

new

}

Out St

Snowﬂake changes The snowﬂake schema Sn is

extended with a new dimension consisting of only one

dimension level table l

new

and attributes Q, see be-

low. l

new

is therefore a base dimension level table.

Associating the fact table of a snowﬂake schema with

a new base dimension level table has the same conse-

quences for browse and aggregate queries as the star

schema.

Semantics

In Sn, l

new

, Q

PCs l

new

6∈ L, Q ∩ A = ∅, Q 6= ∅, PKl

new

∈ Q

SCs Sn

= <L

, B

, fact, A

, attrib

, PKL, FKL, PKF

= L ∪ {l

new

}

PKF

= PKF ∪ {PKl

new

}

= A ∪ Q

∀q ∈ Q: attrib

(q) = l

new

Out Sn

3.8 Delete dimension from fact

Star changes The star schema does not have corre-

sponding tables for every dimension level in the MD-

schema, as in a snowﬂake schema. Thus, if the base

dimension level is deleted from the fact it means that

all the other dimension levels that are in the same di-

mension table in the star schema are also deleted. The

star schema St will remove the dimension table d

del

as seen below. Adaption: If deleting the dimension

causes a change in granularity, all existing rows in the

fact table will have to be recalculated by aggregat-

ing over all of the deleted dimension. Impact: Exist-

ing browse queries that include the deleted dimension

table will fail. Aggregate queries, in the same way

as for browse queries, will also fail if they include

the deleted dimension table. Additionally if delet-

ing the dimension table from the fact table causes a

change in granularity existing aggregate queries using

MIN, MAX, AVG and COUNT will fail, while SUM

queries not referring the deleted level will continue to

return correct results.

Semantics

In St, d

del

PCs d

del

∈ D, Q = attrib

−1

del

), {PK

del

} ∈ Q

SCs St

= <D

, fact, A

, attrib

, PKD

, PKF

PKF

= PKF \ {PKd

del

}

∀q ∈ Q: attrib

(q) = ∅

= A \ Q

= D \ {d

del

}

Out St

Snowﬂake changes When the base dimension

level table l

del

is removed in the snowﬂake schema

Sn, the fact table drops its foreign key reference to

del

. Deleting a base dimension level table affects ex-

isting browse and aggregate queries in the same way

as for the star schema.

Semantics

In Sn, l

del

PCs l

new

∈ L, |FKl

del

| = 0,

Q = attrib

−1

del

, {PKl

del

} ∈ Q

SCs Sn

= <L

, B

, fact, A

, attrib

, PKL

, FKL, PKF

PKF

= PKF \ PKl

del

∀q ∈ Q: attrib

(q) = ∅

= A \ Q

= L \ l

del

Out Sn

4 COMPARISON

In the following, the results of Section 3 are sum-

marized. For each evolution operation, we indicate

the “ease of implementation” and “ease of updating”

of the instances of the schemas. If the adaption task

can be accomplished with a few SQL statements, it is

“simple,” otherwise it is “hard.” We also summarize

which evolution operations can result in invalidating

browse and aggregate queries. The summarization is

seen in the tables below.

Ease of implementation Star Snowﬂake

schema schema

Insert attribute into level Simple Simple

Delete attribute from level Simple Simple

Insert dimension level Simple Hard

Delete dimension level Simple Hard

Insert measure attribute Simple Simple

Delete measure attribute Simple Simple

Insert dimension into fact Hard Hard

Delete dimension from fact Hard Hard

We see that in all cases the star schema is either

simpler or just as hard to update as its snowﬂake coun-

terpart. Especially the operations “Insert dimension

level” and “Delete dimension level” are far simpler

SCHEMA EVOLUTION FOR STARS AND SNOWFLAKES

431

Ease of updating Star Snowﬂake

schema schema

Insert attribute into level Simple Simple

Delete attribute from level Simple Simple

Insert dimension level Simple Hard

Delete dimension level Simple Hard

Insert measure attribute into fact Simple Simple

Delete measure attribute from fact Simple Simple

Insert dimension into fact Hard Hard

Delete dimension from fact Hard Hard

Invalidates browse queries Star Snowﬂake

schema schema

Insert attribute into level No No

Delete attribute from level Yes Yes

Insert dimension level

- New top dimension level No No

- Between dimension levels No Yes

- New base dimension level No No

Delete dimension level

- Delete top dimension level Yes Yes

- Between dimension levels Yes Yes

- Delete base dimension level Yes Yes

Insert measure attribute No No

Delete measure attribute No No

Insert dimension into fact No No

Delete dimension from fact Yes Yes

Invalidates MIN, MAX, COUNT Star Snowﬂake

and AVG aggregate queries schema schema

Insert attribute into level No No

Delete attribute from level Yes Yes

Insert dimension level

- New top dimension level No No

- Between dimension levels No Yes

- New base dimension level Yes Yes

Delete dimension level

- Delete top dimension level Yes Yes

- Between dimension levels Yes Yes

- Delete base dimension level Yes Yes

Insert measure attribute No No

Delete measure attribute Yes Yes

Insert dimension into fact Yes Yes

Delete dimension from fact Yes Yes

to implement and update in the star schema due to

its denormalized dimensions. When all attributes of

a dimension are in the same table there are no pri-

Invalidates SUM aggregate Star Snowﬂake

queries schema schema

Insert attribute into level No No

Delete attribute from level Yes Yes

Insert dimension level

- New top dimension level No No

- Between dimension levels No Yes

- New base dimension level No Yes

Delete dimension level

- Delete top dimension level Yes Yes

- Between dimension levels Yes Yes

- Delete base dimension level Yes Yes

Insert measure attribute No No

Delete measure attribute Yes Yes

Insert dimension level into fact No No

Delete dimension level from fact Yes Yes

mary and foreign key dependencies that have to be up-

dated as in snowﬂake schemas. If we compare the two

schema types’ ability to let existing browse and aggre-

gate queries continue to operate, the star schema again

proves to be superior to the snowﬂake schema. Again

it is the two operations mentioned above that cause

the star schema to prevail. When performing the op-

eration “Insert dimension level” aggregate queries us-

ing MIN, MAX, AVG and COUNT will only fail on

the star schema if the new dimension level is inserted

as a new base dimension and also causing a change

in granularity. The snowﬂake schema, on the other

hand, will fail both for new base dimension levels and

new dimension levels between other dimension levels

because the tables above and below in the dimension

hierarchy (also including the fact table) will no longer

be possible to join (their join-path is broken). When

aggregate queries only use the SUM aggregate func-

tion, the star schema is much more robust than the

snowﬂake schema as it will not matter if the granu-

larity changes when inserting a new base dimension

level since SUM is split-resistant.

5 CONCLUSION AND FUTURE

WORK

Motivated by the popularity of star and snowﬂake

schemas for DW implementation, the lack of detailed

information on schema evolution for such schemas,

and the problems with schema evolution in DW re-

porting environments, this paper systematically inves-

tigated the evolution properties of star and snowﬂake

schemas. We formally deﬁned star and snowﬂake

ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION

432

schemas, and considered eight evolution operations,

covering insertion and deletion of dimensions, levels,

dimension attributes, and measure attributes. For each

operation, the formal semantics of the changes for

star and snowﬂake schemas were given, and instance

adaption and impact on existing browse and aggregate

queries was described. Finally, the evolution prop-

erties of star and snowﬂake schemas was compared,

looking at the ease of implementation and instance

adaption and which changes cause existing browse

or aggregate queries to fail. In all cases, the star

schema was either superior to the snowﬂake schema

or had the same problems. Thus, the star schema

is considerably more robust towards schema changes

than the snowﬂake schema. This challenges the tradi-

tional relational assumption that normalized schemas

are more ﬂexible. Interesting aspects of future work

include building a tool for supporting implementation

and instance adaption for schema evolution in rela-

tional DWs, as well as a framework/tool for rewriting

queries against old version of the schema to the new

schema, thus easing the problems of DW evolution.

REFERENCES

M. Blaschka, C. Sapia and G. H

oﬂing. On Schema Evo-

lution in Multidimensional Databases. In Proc. of

DaWaK, pp. 153–164, 1999.

M. Blaschka. FIESTA, A Framework for Schema Evolution

in Multidimensional Databases. Ph. D. Thesis, TU

Mnchen, 2000.

M. Body, M. Miquel., Y. Bdard., and A. Tchounikine.

A Multidimensional and Multiversion Structure for

OLAP Applications. In Proc. of DOLAP, 2002.

M. Bouzheghoub and Z. Kedad. A Logical Model for Data

Warehouse Design and Evolution. In Proc. of DaWaK,

pp. 178–188, 2000

J. Eder and C. Koncilia. Changes of Dimension Data in

Temporal Data Warehouses. In Proc. of DaWaK,

2001.

C. A. Hurtado, A. O. Mendelzon, and A. A. Vaisman. Up-

dating OLAP Dimensions. In Proc. of DOLAP, 1999.

R. Kimball. The Data Warehouse Toolkit. Wiley, 1996.

R. Kimball et al. The Data Warehouse Lifecycle Toolkit.

Wiley, 1998.

C. Letz, E. Henn, and G. Vossen. Consistency in Data Ware-

house Dimensions. In Proc. of IDEAS, 2002.

M. Levene and G. Loizou. Why is the Snowﬂake Schema a

Good Data Warehouse Design?. Information Systems,

28(3):225–240, 2003.

S. Lujan-Mora, J. Trujillo, and I-Y. Song. Multidimensional

Modeling With UML Package Diagrams. In Proc. of

ER, pp. 199-213, 2002.

A. Mendelzohn and A. A. Vaisman. Temporal Queries in

OLAP. In Proc. of VLDB, pp. 242–253, 2000

T. Morzy and R. Wrembel. Modeling a Multiversion Data

Warehouse: a Formal Approach. In Proc. of ICEIS,

2003.

T. B. Pedersen, C. S. Jensen, and C. E. Dyreson. A Foun-

dation for Capturing and Querying Complex Multidi-

mensional Data. Information Systems 26(5):383–423,

2001.

T. B. Pedersen and C. S. Jensen. Multidimensional

Database Technology. IEEE Computer34(12):40–46,

2001.

J. Roddick. Schema Evolution in Database Systems - An

Annotated Bibliography. SIGMOD Record 21(4):35–

40, 1992.

C. Sapia, M. Blaschka, G. H

oﬂing and B. Dinter. Extending

the E/R model for the Multidimensional Paradigm. In

Proc. of ER Workshops, pp. 105–116, 1998.

P. Vassiliadis. Gulliver in the land of data warehousing:

practical experiences and observations of a researcher.

In Proc. of DMDW, pp. 12-1–12-16, 2000.

P. Vassiliadis, M. Bouzheghoub, and C. Quix. Towards

Quality-Oriented Data Warehouse Usage and Evolu-

tion. Information Systems 25(2):89–115, 2000.

SCHEMA EVOLUTION FOR STARS AND SNOWFLAKES

433