critical issue. Additionally, the real-time enterprise
requires data to be always up to date.
DW refreshment (integration of new data) is
traditionally performed in an off-line fashion, implying
that while processes for updating the data area are
executed, OLAP users and applications cannot
access any data. This set of activities takes place in a
loading time window, usually during the night, on a
daily, weekly or even monthly basis, to avoid
overloading the operational OLTP source systems
with the extra workload of this workflow. Active
Data Warehousing refers to a new trend where DWs
are updated as frequently as possible, due to high
user demand for fresh data. The term Real-Time Data
Warehousing (RTDW) is also used for that
reason in (White, 2002). The conclusions of
a knowledge exchange network formed by
major technological partners in Denmark (Pederson,
2004) report that all partners consider the real-time
enterprise and continuous data availability
a short-term priority for all business and
general data-based endeavours.
In a nutshell, accomplishing near-zero latency
between OLTP and OLAP systems consists in
ensuring continuous data integration from the former
type of system to the latter. To make this feasible,
several issues need to be taken into consideration:
(1) Operational OLTP systems are designed to meet
well-specified (short) response time requirements,
meaning that an RTDW scenario would have to cope
with the overhead it imposes on those OLTP systems;
(2) The DW tables directly related to transactional
records (commonly named fact tables) are usually
huge in size, and therefore the addition of new data and
consequent operations such as index updating would
certainly have an impact on OLAP systems’
performance and data availability. Our work focuses
on the DW perspective, presenting an efficient
methodology for the ETL loading process under
continuous data integration, along with techniques for
adapting the DW’s schemas to support continuous data
integration and for adapting OLAP queries to use all
the integrated data.
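As a rough illustration of issue (2), and of what adapting schemas and queries might look like in practice, the sketch below uses Python's built-in sqlite3 module. It rests on assumptions made purely for illustration: the table names sales_fact and sales_fact_rt, their columns, the helper functions, and the idea of an index-free companion table combined through UNION ALL are hypothetical and are not taken from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical indexed fact table holding previously loaded data.
cur.execute("CREATE TABLE sales_fact (day TEXT, store_id INTEGER, amount REAL)")
cur.execute("CREATE INDEX idx_sales_day ON sales_fact (day)")

# Hypothetical companion table for continuously arriving rows.
# It carries no indexes, so inserts avoid index maintenance.
cur.execute("CREATE TABLE sales_fact_rt (day TEXT, store_id INTEGER, amount REAL)")

# Data assumed to have been loaded by an earlier bulk refresh.
cur.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [("2007-01-01", 1, 100.0), ("2007-01-01", 2, 50.0)],
)
conn.commit()


def integrate(rows):
    """Append freshly extracted rows to the index-free table."""
    cur.executemany("INSERT INTO sales_fact_rt VALUES (?, ?, ?)", rows)
    conn.commit()


def total_by_day():
    """OLAP-style aggregate adapted to read both tables."""
    cur.execute(
        """
        SELECT day, SUM(amount)
        FROM (SELECT day, amount FROM sales_fact
              UNION ALL
              SELECT day, amount FROM sales_fact_rt)
        GROUP BY day
        ORDER BY day
        """
    )
    return cur.fetchall()


integrate([("2007-01-01", 1, 25.0), ("2007-01-02", 2, 10.0)])
totals = total_by_day()
print(totals)
```

In this sketch the query, rather than the loader, bears the cost of combining older and newly integrated rows; whether that trade-off is acceptable would depend on the actual workload.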
The remainder of this paper is organized as follows. In
section 2, we present background and related work in
real-time data warehousing. Section 3 explains our
methodology, and in section 4 we present an
experimental evaluation and demonstrate its
functionality. The final section contains concluding
remarks and future work.
2 RELATED WORK
The DW needs to be updated continuously to reflect
source data updates. DW users are often not only
interested in monitoring current information, but
also in analyzing the history to predict future trends.
Therefore, real-world DWs are often temporal, but
their temporal support is implemented in an ad hoc
manner that is difficult to automate. In practice,
many operational source systems are nontemporal,
i.e., they store only the current state of their data, not
the complete history. So far, research has mostly
focused on the problem of maintaining the
warehouse in its traditional periodic update setup
(Yang, 2001B) (Labio, 2000). In a different line of
research, data streams (Abadi, 2003) (Babu, 2001)
(Lomet, 2003) (Srivastava, 2004) appear as a
potential solution. Nevertheless, research in data
streams has focused on topics concerning the
front-end, such as on-the-fly computation of queries,
without a systematic treatment of the issues raised at
the back-end of a DW (Karakasidis, 2005). Much of
the recent work dedicated to RTDW is focused on
conceptual ETL modelling (Vassiliadis, 2001)
(Bruckner, 2002A) (Bouzeghoub, 1999) (Simitsis,
2005), lacking the presentation of specific
extraction, transformation and loading algorithms
along with their consequent OLTP and OLAP
performance issues. Our contribution is the
presentation of a methodology which efficiently
enables continuous data integration in the DW and
aims to minimize its negative impact on OLAP end
user query workload execution. The issues addressed
in this paper concern the DW end of the system,
namely how to perform the loading processes of
ETL procedures and how to use the DW’s data area to
efficiently support continuous data integration.
The extraction and transformation of operational (OLTP)
source system data are not the focus of this paper.
In (Bouzeghoub, 1999) the authors describe an
approach which clearly separates the DW
refreshment process from its traditional handling as
a view maintenance or bulk loading process. They
provide a conceptual model of the process, treated as
a composite workflow, but they do not describe how
to efficiently propagate the data. In (Vassiliadis,
2001), the authors describe the ARKTOS ETL tool, capable
of modeling and executing practical ETL scenarios
by providing explicit primitives for capturing
common tasks (such as data cleaning, scheduling
and data transformations). ARKTOS uses a
declarative language, offering graphical and
declarative features for defining DW
transformations, and optimizes execution of complex
ICEIS 2007 - International Conference on Enterprise Information Systems