
a way that would allow efficient and modular integration of various legacy systems into a “virtual data warehouse”. This modular integration would also allow the legacy systems to be replaced independently of each other, making it possible to optimise each component in its local environment without limiting its relevance as a source of information for strategic decision making.
Some recent developments in design and prototyping tools address these problems by making it possible to centralize information about the corporate data sources and to prototype the system that would result from the planned ETL (Extraction, Transformation, Loading) process. The proposed approach pushes these ideas even further by making it possible to iterate such prototypes in quick succession against the real transactional data, and then to put the final, “accepted” version into production. This is done by leveraging Grid Data access protocols, which hide the details of the local storage systems from the ETL process.
Another advantage of Grid technology is its ability to create “Virtual Organizations” from groups of entities that belong to different security domains. By mapping local identities (user accounts, server certificates and other entities managed by the corporate security infrastructure) to Grid certificates, it is possible to integrate data from sources that are not controlled by a single entity. These data access and security features make the proposed approach especially interesting in cases such as managing the information systems of recently merged companies, or establishing an extended-enterprise collaboration with strategic partners.
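The identity mapping can be sketched as follows, assuming a mapping file loosely modelled on the Globus grid-mapfile convention (a quoted certificate distinguished name followed by comma-separated local account names). The `VirtualOrganization` class, the domain names and the DNs are hypothetical, and the sketch performs no certificate validation; it only shows how one Grid identity can span accounts in two security domains.

```python
import shlex
from typing import Dict, List, Optional


def parse_gridmap(text: str) -> Dict[str, List[str]]:
    """Parse grid-mapfile-style lines: '"<certificate DN>" acct1,acct2'."""
    mapping: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        dn, accounts = shlex.split(line)  # shlex keeps the quoted DN intact
        mapping[dn] = accounts.split(",")
    return mapping


class VirtualOrganization:
    """Hypothetical VO spanning several security domains, each of which
    maps Grid certificate DNs to its own local accounts."""

    def __init__(self) -> None:
        self._domains: Dict[str, Dict[str, List[str]]] = {}

    def add_domain(self, domain: str, gridmap_text: str) -> None:
        self._domains[domain] = parse_gridmap(gridmap_text)

    def grid_identity(self, domain: str, account: str) -> Optional[str]:
        """Which Grid identity does a local account in `domain` map to?"""
        for dn, accounts in self._domains.get(domain, {}).items():
            if account in accounts:
                return dn
        return None

    def authorized_domains(self, dn: str) -> List[str]:
        """Security domains in which this Grid identity has a local account."""
        return sorted(d for d, mapping in self._domains.items() if dn in mapping)


if __name__ == "__main__":
    vo = VirtualOrganization()
    vo.add_domain("acme", '"/O=Grid/OU=acme.example/CN=Jane Doe" jdoe\n')
    vo.add_domain("globex", '# merged company\n'
                            '"/O=Grid/OU=acme.example/CN=Jane Doe" jane.d,analyst\n')
    dn = vo.grid_identity("acme", "jdoe")
    print(dn, vo.authorized_domains(dn))
```

In a merger scenario, each company keeps administering its own mapping, while the shared Grid identity is what lets a query span both domains.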
The basic idea behind the virtual data warehouse is to construct the analysis database (i.e. an OLAP cube) on demand, including only the data needed for the analysis at hand. Since the data can be gathered directly from the operational data sources, they are always up to date and suitable for near-real-time analysis of emerging trends in the company and its environment. The most difficult problem in the on-demand construction of OLAP cubes is the extraction of data from operational databases. We offer the following three approaches to this problem.
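A minimal sketch of on-demand cube construction, assuming the cube is represented as a Python dict from a tuple of dimension values to an aggregated measure (a simplification of a real OLAP engine). The names `build_cube` and `keep` are hypothetical; the point is that only the rows and dimensions the current analysis needs are ever materialised.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Mapping, Sequence, Tuple

Row = Mapping[str, object]


def build_cube(
    rows: Iterable[Row],
    dimensions: Sequence[str],
    measure: str,
    keep: Callable[[Row], bool] = lambda row: True,
) -> Dict[Tuple[object, ...], float]:
    """Build a small cube on demand: only rows passing `keep` and only the
    named dimensions are used; each cell holds the summed measure."""
    cube: Dict[Tuple[object, ...], float] = defaultdict(float)
    for row in rows:
        if keep(row):
            cell = tuple(row[d] for d in dimensions)
            cube[cell] += float(row[measure])
    return dict(cube)


if __name__ == "__main__":
    orders = [  # illustrative records, not real data
        {"region": "north", "product": "A", "year": 2004, "amount": 100},
        {"region": "north", "product": "B", "year": 2004, "amount": 50},
        {"region": "south", "product": "A", "year": 2003, "amount": 70},
        {"region": "south", "product": "A", "year": 2004, "amount": 30},
    ]
    # Only the 2004 slice, by region and product, is materialised.
    print(build_cube(orders, ["region", "product"], "amount",
                     keep=lambda r: r["year"] == 2004))
    # {('north', 'A'): 100.0, ('north', 'B'): 50.0, ('south', 'A'): 30.0}
```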
1. In most cases, only a small subset of the data stored in OLTP systems is fetched into the OLAP cube. This is possible because we aim to build the OLAP cube to solve a specific problem, whereas in the traditional data warehouse approach the OLAP cube is constructed for general-purpose analyses.
2. Grid technologies facilitate the dynamic allocation of computing power from a larger resource pool for ETL processing. While OLTP systems themselves are likely to be more efficient now than a couple of years ago, this development does not in itself help the ETL phase, since OLTP systems are often legacy systems that are rarely updated. Furthermore, the amount of data and the number of users of an OLTP system usually grow in step with the system capacity.
3. The data relevant to the analysis can be selected remotely in the operational databases before being shipped for analysis. Agent technology can take over part of this local data selection and aggregation, reducing the processing load on the local database servers.
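Approaches 1 to 3 can be combined in a small sketch: selection and aggregation are pushed down to each source as SQL, so only a few summary rows leave each database, and the sources are queried concurrently. In-memory SQLite databases stand in for the operational systems and a thread pool stands in for Grid resource allocation; the agent layer is not modelled, and all names and records are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Selection (WHERE) and aggregation (GROUP BY) run inside each source,
# so only one summary row per region is shipped, not every order.
PUSHDOWN_QUERY = (
    "SELECT region, SUM(amount) FROM orders "
    "WHERE year = ? GROUP BY region"
)


def make_source(rows: List[Tuple[str, int, int]]) -> sqlite3.Connection:
    """Hypothetical stand-in for one operational (OLTP) database."""
    # check_same_thread=False lets a pool thread use this connection;
    # each connection is used by one task at a time.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE orders (region TEXT, year INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def extract_remote(conn: sqlite3.Connection, year: int) -> List[Tuple[str, int]]:
    """Select and aggregate locally at the source; return only the summary."""
    return conn.execute(PUSHDOWN_QUERY, (year,)).fetchall()


def parallel_extract(conns: List[sqlite3.Connection], year: int) -> Dict[str, int]:
    """Query all sources concurrently and merge their partial sums."""
    totals: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(conns))) as pool:
        for partial in pool.map(lambda c: extract_remote(c, year), conns):
            for region, amount in partial:
                totals[region] = totals.get(region, 0) + amount
    return totals


if __name__ == "__main__":
    plant_a = make_source([("north", 2004, 100), ("north", 2004, 50), ("south", 2003, 70)])
    plant_b = make_source([("north", 2004, 25), ("south", 2004, 30)])
    print(parallel_extract([plant_a, plant_b], 2004))  # {'north': 175, 'south': 30}
```

Merging partial results this way is valid because a sum is distributive; a measure such as a median could not be combined from per-source summaries alone.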
Business intelligence systems, like OLAP, have traditionally been limited to the data stored in data warehouses or other well-defined, structured databases. In some cases this predefined set of data sources is not enough, since the phenomenon under analysis can depend on something outside the scope of the company. For example, the oil price or the weather can have a significant effect on business through a complex chain of cause and effect. If the rules used to test scenarios are limited to the data in the corporate data warehouse, the analysis cannot find all the possible explanations for a phenomenon. The virtual data warehouse methodology enables the user to include external data in the OLAP cube through Grid Data access.
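The following sketch shows, under stated assumptions, one way an external indicator (for instance a price series) could enter the cube as an extra dimension: an external series keyed by month is joined to internal sales records and discretised into bands. The series, the threshold and the function names are hypothetical, and the numbers are illustrative only, not real prices.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Optional


def price_band(price: Optional[float], threshold: float) -> str:
    """Discretise an external numeric indicator so it can act as a dimension."""
    if price is None:
        return "unknown"  # no external observation for this period
    return "high" if price >= threshold else "low"


def cube_with_external(
    sales: List[Dict[str, object]],
    external_by_month: Mapping[str, float],
    threshold: float,
) -> Dict[str, float]:
    """Join internal sales to an external series on month, then sum sales
    per band of the external indicator."""
    cube: Dict[str, float] = defaultdict(float)
    for row in sales:
        band = price_band(external_by_month.get(str(row["month"])), threshold)
        cube[band] += float(row["amount"])
    return dict(cube)


if __name__ == "__main__":
    sales = [  # illustrative internal records
        {"month": "2004-01", "amount": 120},
        {"month": "2004-02", "amount": 90},
        {"month": "2004-03", "amount": 60},
    ]
    external = {"2004-01": 31.0, "2004-02": 38.5}  # made-up values; March missing
    print(cube_with_external(sales, external, threshold=35.0))
    # {'low': 120.0, 'high': 90.0, 'unknown': 60.0}
```

Keeping an explicit "unknown" band makes gaps in the external source visible in the analysis instead of silently dropping those sales.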
The motivation for using Grid technologies in the implementation is the ability of Grid frameworks to provide, on demand, enough secure computing and storage capacity to handle much larger datasets than traditional systems at similar cost. This kind of use of parallel processing and shared computing resources requires a strong, universally accepted security infrastructure for accessing the computing resources of external service providers (this type of Grid can be seen as an advanced “Utility Computing” solution). In addition to Grid technologies, we use XML with XSL transformations for data source integration (The World Wide Web Consortium, 1999).
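The text names XSL transformations for source integration. Python's standard library has no XSLT processor, so the sketch below substitutes hand-written `xml.etree.ElementTree` mapping rules that play the role of one stylesheet per source: two hypothetical legacy export formats are mapped onto a common `<facts>` schema. The element and attribute names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, List, Tuple

# One mapping rule per source root element: the role an XSL stylesheet
# would play. Both source formats are hypothetical.
Extractor = Callable[[ET.Element], Iterable[Tuple[str, str]]]
MAPPINGS: Dict[str, Extractor] = {
    # <orders><order region=".." total=".."/></orders>
    "orders": lambda root: (
        (o.get("region"), o.get("total")) for o in root.iter("order")
    ),
    # <Sales><Row><Area>..</Area><Value>..</Value></Row></Sales>
    "Sales": lambda root: (
        (r.findtext("Area"), r.findtext("Value")) for r in root.iter("Row")
    ),
}


def integrate(documents: List[str]) -> ET.Element:
    """Map every source document onto a common <facts> schema."""
    facts = ET.Element("facts")
    for text in documents:
        root = ET.fromstring(text)
        for region, amount in MAPPINGS[root.tag](root):
            ET.SubElement(facts, "fact", region=region, amount=amount)
    return facts


if __name__ == "__main__":
    a = '<orders><order region="north" total="120"/></orders>'
    b = "<Sales><Row><Area>south</Area><Value>80</Value></Row></Sales>"
    print([fact.attrib for fact in integrate([a, b])])
    # [{'region': 'north', 'amount': '120'}, {'region': 'south', 'amount': '80'}]
```

In the approach described above, each entry of `MAPPINGS` would instead be an XSL stylesheet, so adding a source means adding a stylesheet rather than changing the integration code.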
2 RELATED WORK
Zurek and Sinnwell (Zurek and Sinnwell, 1999) have studied how changes in companies should be reflected in data warehouses. They observed that companies tend to change their organization often, necessitating quite frequent realignment of the data warehouse schema. In particular, the required dimensions of data and their hierarchies can change even more frequently than the organization itself.
Sypherlink is one of the companies advertising
more flexible prototyping and ETL tools for data
warehousing systems (Sypherlink, 2003). This solu-
A FRAMEWORK FOR ON-DEMAND INTEGRATION OF ENTERPRISE DATA SOURCES