
dimension structures when EDSs' data reflected in
dimensions change.
Paper organization. The rest of this paper is
organized as follows. Section 2 overviews related
work in the area of DW maintenance under schema
and data changes in EDSs. Section 3 briefly presents
our concept of a multiversion data warehouse. Section
4 discusses our concept of transactions in a DW.
Section 5 presents the metamodel of our
multiversion data warehouse. Finally, Section 6
concludes the paper.
2 RELATED WORK
The existing approaches to propagating changes
from EDSs to a DW can be classified into two
categories: (1) data refreshing and (2) handling
changes in a DW schema.
The solutions in the first category incrementally
refresh a DW fact table, using different mechanisms
for avoiding the duplication anomaly. The ECA
algorithm (Zhuge, Garcia-Molina, Hammer, Widom,
1995) removes an error term by applying so-called
compensating queries. Two extensions of the basic
ECA algorithm, namely ECA^K and ECA^L, are able to
process some data source modifications locally at
the DW, i.e. without sending maintenance queries to
EDSs. The same idea is used in the Sweep algorithm
(Agrawal, El Abbadi, Singh, Yurek, 1997). Another
solution, the Strobe algorithm (Zhuge,
Garcia-Molina, Wiener, 1996), stores the list of
EDSs' updates reported to the DW during maintenance
query execution. This list, called an action list, is
used for compensating an error term. (Mostefaoui, Raynal,
Roy, Agrawal, 2002) propose an architecture where
EDSs form a ring. The process of finding the delta
caused by EDSs' updates is based on exchanging a
token among EDSs. None of the above solutions
refreshes a DW transactionally. As a consequence,
the atomicity and isolation of the refreshing process
cannot be guaranteed. Moreover, the above
approaches focus only on those changes in EDSs'
data that do not have any impact on a DW schema.
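The duplication anomaly and the idea of a compensating
query can be illustrated with a small sketch. It is a
simplified illustration only, not the ECA algorithm as
published: the relations, the toy rows and the helper
join() are assumptions made for the example, and both
source updates are assumed to reach the source before
the first maintenance query is answered.

```python
from collections import Counter

def join(r1, r2):
    """Bag join of r1(a, b) and r2(b, c) on the shared attribute b."""
    out = Counter()
    for (a, b), n1 in r1.items():
        for (b2, c), n2 in r2.items():
            if b == b2:
                out[(a, b, c)] += n1 * n2
    return out

# Hypothetical source state before any update.
r1 = Counter({(1, 2): 1})
r2 = Counter()
view = join(r1, r2)                 # materialized view at the DW (empty)

# Two source updates occur before the first maintenance query is answered.
d1 = Counter({(4, 2): 1})           # U1: insert (4, 2) into r1
d2 = Counter({(2, 3): 1})           # U2: insert (2, 3) into r2
r1_now, r2_now = r1 + d1, r2 + d2   # state on which the source evaluates queries

# Naive refresh: every maintenance query sees the current source state.
a1 = join(d1, r2_now)               # answer to Q1 already reflects U2
a2 = join(r1_now, d2)               # answer to Q2 reflects U1 once more
naive = view + a1 + a2

# Compensated refresh: Q2 subtracts the effect of U2 on the pending Q1.
a2_comp = join(r1_now, d2) - join(d1, d2)
compensated = view + a1 + a2_comp

expected = join(r1_now, r2_now)     # view recomputed from scratch
```

In this sketch the naive refresh counts the row (4, 2, 3)
twice, whereas the compensated refresh equals the view
recomputed from scratch.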
The only transactional solution to the problem
of incremental DW refresh is, to the best of our
knowledge, that of (Chen, Chen, Rundensteiner,
2000). The authors propose a special-purpose
transaction, called DWMS_Transaction, which covers
the whole process of refreshing a DW fact table. The
DWMS_Transaction is defined as a sequence
of two transactions, namely a local EDS update
transaction and its corresponding DW maintenance
transaction. The main contribution of the reported
work is the observation that the anomaly arising
during incremental refreshing can be mapped
onto the problem of guaranteeing the serializability
of DWMS_Transactions. The authors point out that
DWMS_Transaction is a conceptual rather than a real
transaction mechanism, which is a potential
weakness of the solution. However, even such a
conceptual model of a transaction allows the
maintenance anomaly problem to be reformulated as
the well-known "read dirty data" problem.
Compensation techniques are then no longer
required. The solution also deals with schema
changes, but it tackles neither changes in DW
dimension structures nor concurrent DW users'
sessions.
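The effect of treating a source update and its DW
maintenance as one unit can be sketched as follows.
This is a minimal illustration under our own
assumptions, not the mechanism of (Chen, Chen,
Rundensteiner, 2000): the class DWMSTransaction, the
functions run_serially() and recompute(), and the toy
rows are hypothetical. Each pair runs to completion
before the next one starts, so a maintenance step
never observes a later update and no compensation is
needed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DWMSTransaction:
    """Hypothetical pairing of one source insert with its DW maintenance step."""
    relation: str   # "r1" holds rows (a, b); "r2" holds rows (b, c)
    row: tuple

def run_serially(transactions):
    """Execute each (update, maintenance) pair to completion, one at a time."""
    source = {"r1": [], "r2": []}
    view = []                       # materialized join of r1 and r2 on b
    for t in transactions:
        # Part 1: the local source update.
        source[t.relation].append(t.row)
        # Part 2: the corresponding DW maintenance step. It joins the new
        # row with the other relation, which no later update has touched yet.
        if t.relation == "r1":
            a, b = t.row
            delta = [(a, b, c) for (b2, c) in source["r2"] if b2 == b]
        else:
            b, c = t.row
            delta = [(a, b, c) for (a, b1) in source["r1"] if b1 == b]
        view.extend(delta)
    return source, view

def recompute(source):
    """Recompute the join from scratch, for comparison."""
    return [(a, b, c) for (a, b) in source["r1"]
            for (b2, c) in source["r2"] if b == b2]

txns = [DWMSTransaction("r1", (1, 2)),
        DWMSTransaction("r1", (4, 2)),
        DWMSTransaction("r2", (2, 3))]
source, view = run_serially(txns)
```

Under this serial schedule the incrementally maintained
view matches the result of recompute(source).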
The support for handling changes in a DW
schema has been studied in the following two
categories: (1) schema and data evolution, (2)
temporal and versioning extensions. The approaches
in the first category (Koeller, 1998), (Blaschka, 1999),
(Hurtado, 1999a), (Hurtado, 1999b) support only
one DW schema and its instance. When a change is
applied to a schema, all data described by the schema
must be converted, which incurs high maintenance
costs.
In the approaches from the second category,
(Eder, Koncilia, 2001), (Eder, Koncilia, Morzy,
2002), (Chamoni, Stock, 1999), (Mendelzon,
Vaisman, 2000), changes are timestamped in order
to create temporal versions. However, the last two
approaches are unable to express and
process queries that span or compare several
temporal versions of data. In contrast, the
model and prototype of a temporal DW presented in
(Eder, Koncilia, 2001), (Eder, Koncilia, Morzy,
2002) support queries for a particular temporal
version of a DW as well as queries that span several
versions. In the latter case, conversion functions
must be applied, as data in temporal versions are
virtual.
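The role of a conversion function in a query spanning
two temporal versions can be sketched as follows. The
sketch is illustrative only: the department names, the
sales figures, the weighted mapping v2_to_v1 and the
function convert() are assumptions made for the
example and are not taken from the cited work.

```python
# Hypothetical sales per department in two temporal versions.
# In version v2 the department "D1" has been split into "D1a" and "D1b".
facts_v1 = {"D1": 100.0, "D2": 50.0}
facts_v2 = {"D1a": 70.0, "D1b": 40.0, "D2": 55.0}

# Assumed conversion function from v2 to v1: every v2 member is mapped
# to a v1 member together with a weighting factor.
v2_to_v1 = {"D1a": ("D1", 1.0), "D1b": ("D1", 1.0), "D2": ("D2", 1.0)}

def convert(facts, mapping):
    """Express facts of one version in the dimension structure of another."""
    out = {}
    for member, value in facts.items():
        target, weight = mapping[member]
        out[target] = out.get(target, 0.0) + weight * value
    return out

# A query comparing both versions first converts v2 data into the v1 structure.
v2_in_v1 = convert(facts_v2, v2_to_v1)
growth = {member: v2_in_v1[member] - facts_v1[member] for member in facts_v1}
```

Here the two v2 departments are summed back into "D1"
before the comparison, so growth is computed per v1
department.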
In (Kang, Chung, 2002), (Kulkarni, Mohania,
1999), (Quass, Widom, 1997), (Rundensteiner,
Koeller, Zhang, 2000) implicit versioning in a DW
was proposed. In all four approaches, versions
are used for avoiding conflicts and mutual locking
between OLAP queries and transactions refreshing a
DW. Versions are implicitly created and removed by
the system, which is a drawback of these
approaches. In contrast, (Bellahsene, 1998)
proposes permanent user-defined versions of views
in order to simulate changes in a DW schema.
However, the approach supports only simple
changes in source tables and it deals neither
with typical multidimensional schemas nor with the
evolution of facts or dimensions. Also (Body et al.,
2002) supports permanent timestamped versions of
data. The proposed mechanism, however, uses one
central fact table for storing all versions of data. As a
consequence, the set of schema changes that may be