sequence...). Processing the data thus amounts to processing their descriptors. The original data are stored, for instance as binary large objects (BLOBs), and can also be exploited to extract information that enriches their own characteristics (descriptors and metadata).
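As a simple illustration, a complex object stored as a BLOB can be represented by an XML descriptor on which all further processing operates. The following minimal Python sketch builds such a descriptor; all element names, attributes, and feature values are purely illustrative and are not part of our proposal.

import xml.etree.ElementTree as ET

def build_descriptor(object_id, media_type, blob_ref, features):
    # Descriptor for a complex object whose raw content lies in a BLOB.
    doc = ET.Element("complex_object", id=object_id, type=media_type)
    ET.SubElement(doc, "blob_reference").text = blob_ref
    descriptors = ET.SubElement(doc, "descriptors")
    for name, value in features.items():
        ET.SubElement(descriptors, "feature", name=name).text = str(value)
    return ET.tostring(doc, encoding="unicode")

# Processing then operates on the descriptor, not on the binary content.
print(build_descriptor("img-042", "image/jpeg", "blobs/img-042.jpg",
                       {"width": 1024, "height": 768, "dominant_color": "grey"}))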
The architecture framework we propose for com-
plex data warehousing (Figure 2) exploits the XML
language. Using XML indeed facilitates the inte-
gration of heterogeneous data from various sources
into the warehouse; the exploitation of metadata
and knowledge (namely regarding the application do-
main) within the warehouse; and data modeling and
storage. The presence of metadata and knowledge
in the data warehouse is aimed at improving global
performance, even if their actual integration is still
the subject of several research projects (McBrien and Poulovassilis, 2001; Baril and Bellahsène, 2003; Shah and Chirkova, 2003).
This architecture framework essentially comprises: the data warehouse kernel, which may be either materialized as an XML warehouse or virtual (where cubes are computed at run time); data sources; source type drivers that notably include mapping specifications between the sources and XML; and a metadata and knowledge base layer that includes three submodules related to three management processes.
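To illustrate how a source type driver and its mapping specification might be expressed, the following Python sketch defines a generic driver interface and a driver for flat tabular sources. The class and method names are ours, for illustration only, and do not constitute a specification of the framework.

from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET

class SourceTypeDriver(ABC):
    # Common interface shared by all source type drivers (ST).
    @abstractmethod
    def extract(self):
        """Yield raw records from the operational source."""

    @abstractmethod
    def to_xml(self, record):
        """Map one source record to an XML document for the warehouse."""

class TabularDriver(SourceTypeDriver):
    # Driver for flat tabular sources; the mapping specification relates
    # source fields to XML element names.
    def __init__(self, rows, mapping):
        self.rows = rows
        self.mapping = mapping

    def extract(self):
        yield from self.rows

    def to_xml(self, record):
        doc = ET.Element("record")
        for field, element in self.mapping.items():
            ET.SubElement(doc, element).text = str(record[field])
        return ET.tostring(doc, encoding="unicode")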
The three processes for managing a data warehouse
are: the ETL and integration process that feeds the
warehouse with source data from operational data-
bases (DS Op) by using drivers that are specific to
each source type (ST); the administration and mon-
itoring process (MD&KR) that manages metadata
and knowledge (the administrator interacts with the
data warehouse through this process); and the analy-
sis and usage process that runs user queries, produces
reports, builds data cubes, supports OLAP, etc. Each
of these processes exploits and updates the metadata
and the knowledge base. There are four types of
flows: the external flow, which includes the ETL and
integration flow and the exploitation (analysis and us-
age) flow (the warehouse may thus be considered a black box); the internal flow, which runs between the warehouse kernel and the metadata and knowledge base layer on the one hand, and between that layer and the source type drivers on the other; the metadata and
knowledge management and maintenance flow, which
acquires new knowledge and enriches existing knowl-
edge; and the reference flow, which illustrates the fact
that the external flow always refers to the metadata
and knowledge base layer for integration, ETL, and
analysis and usage in general.
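Reusing the driver interface sketched above, the ETL and integration process can be outlined in a simplified form as follows; the warehouse and metadata classes are hypothetical placeholders rather than a specification of our framework.

class MetadataBase:
    # Placeholder for the metadata and knowledge base layer.
    def __init__(self):
        self.load_history = []

    def record_load(self, source_name, count):
        self.load_history.append({"source": source_name, "documents": count})

class XMLWarehouse:
    # Placeholder for a materialized XML warehouse kernel.
    def __init__(self):
        self.documents = []

    def store(self, xml_doc):
        self.documents.append(xml_doc)

def etl_and_integration(drivers, warehouse, metadata):
    # Feed the warehouse with XML documents produced by each source type
    # driver, and update the metadata base, since every process both
    # exploits and updates it.
    for source_name, driver in drivers.items():
        count = 0
        for record in driver.extract():
            warehouse.store(driver.to_xml(record))
            count += 1
        metadata.record_load(source_name, count)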
Note that analysis results, in the form of cubes, reports, queries, or any other intermediate results, may constitute new data sources (DS Res) that can be reintegrated into the warehouse.
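Continuing the sketch above, reintegrating an analysis result amounts to treating it as yet another source fed back through a driver; the figures below are, of course, purely illustrative.

# A summary produced by the analysis process becomes a new data source
# (DS Res) and is reintegrated through a tabular driver.
warehouse, metadata = XMLWarehouse(), MetadataBase()
result_rows = [{"region": "Europe", "year": 2004, "total": 1250}]
result_driver = TabularDriver(result_rows,
                              {"region": "region", "year": "year", "total": "total"})
etl_and_integration({"DS_Res": result_driver}, warehouse, metadata)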
Though our proposal is only an architecture framework, it helps us formalize the warehousing process of complex data as a whole. Thus, we are able to identify the issues to be solved, and to underline the great importance of metadata in managing and analyzing complex data. Furthermore, piloting and synchronizing the data warehouse processes we identify in this framework is a substantial problem in itself. Optimization techniques will be necessary to achieve efficient management of data and metadata. Communication techniques, presumably based on standard protocols, will also be needed to build efficient data exchange solutions.
4 CONCLUSION AND
PERSPECTIVES
We addressed in this paper the problem of warehous-
ing complex data. We first clarified the concept of complex data by providing a precise, though open, definition. Then we presented a
general architecture framework for warehousing com-
plex data. It heavily relies on metadata and domain-
specific knowledge, which we identify as a key element in complex data warehousing, and rests on the XML language, which helps store data, metadata, and
knowledge, and facilitates communication between
the various warehousing processes. This proposal
takes into account the two main possible families of
architectures for complex data warehousing (namely
virtual data warehousing and centralized, XML ware-
housing). Finally, we briefly presented the main is-
sues in complex data warehousing, especially regard-
ing data integration, the modeling of complex data
cubes, and performance.
This study opens many research perspectives. Up to now, our work has mainly focused on the integration of complex data in an ODS. Though we have also worked on the multidimensional modeling of complex data, this study is our first significant step toward the actual warehousing of complex data. In order to test and refine our hypotheses in the field, we plan to apply our proposals to three different application domains we currently work on (medicine, banking, and geography). Such practical applications should help us devise solutions to the many issues regarding metadata management and performance, and experiment with both the virtual and XML warehousing solutions.
One of our important perspectives deals with the
selection of a representation mode for metadata and
domain-specific knowledge. Knowledge related to the application domain is actually operational information about complex data, and may thus be considered as metadata. In order to remain in the XML-based,
homogeneous environment of our architecture frame-
work, the formalisms that seem best-fitted to repre-