heterogeneity problems; structural heterogeneity is
resolved at the wrapper level and semantic
heterogeneity is homogenized by the mediators.
Examples of such approaches are the TSIMMIS
(Garcia-Molina et al, 1997) and Florid (Ludäscher et
al, 1998) integration systems. TSIMMIS implements
virtual (or logical) integration, meaning that the data
stays in the sources and is delivered to the user on
request only. Florid follows the materialized
integration approach: the data from the local sources
is integrated and materialized so that the global
query is directly evaluated on the integrated set of
data.
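The difference between virtual and materialized integration can be illustrated with a minimal sketch. It is not taken from TSIMMIS or Florid; the class names, the `Source` wrapper type, and the sample rows are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
# Hypothetical wrapper type: a callable that returns the source's rows
# in a common format each time it is invoked.
Source = Callable[[], List[Row]]


class VirtualIntegrator:
    """Data stays in the sources and is fetched on request only."""

    def __init__(self, sources: List[Source]) -> None:
        self.sources = sources

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Every query goes back to the sources.
        return [row for src in self.sources for row in src() if predicate(row)]


class MaterializedIntegrator:
    """Data is copied once into an integrated store; queries run on the copy."""

    def __init__(self, sources: List[Source]) -> None:
        self.store: List[Row] = [row for src in sources for row in src()]

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        return [row for row in self.store if predicate(row)]


# Illustrative sample data (assumed, not from the paper).
books_db: List[Row] = [{"title": "A", "year": 1997}]
books_xml: List[Row] = [{"title": "B", "year": 1998}]
sources: List[Source] = [lambda: list(books_db), lambda: list(books_xml)]

virtual = VirtualIntegrator(sources)
materialized = MaterializedIntegrator(sources)

# A source changes after the integrators have been set up.
books_db.append({"title": "C", "year": 1999})


def is_recent(row: Row) -> bool:
    return row["year"] >= 1998


virtual_titles = [r["title"] for r in virtual.query(is_recent)]
materialized_titles = [r["title"] for r in materialized.query(is_recent)]
```

In this sketch the virtual integrator sees the row added after setup, whereas the materialized one answers from its snapshot until it is refreshed. That is the usual trade-off between freshness and query cost.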
The Garlic (Carey et al, 1995) integration
architecture is similar to those described above, with
the difference that no mediators are used. Instead,
the integration and query transformation are
performed centrally by the ‘Query Services and
Runtime System’ component.
Another strategy is to develop a special mapping
language. Such languages, like BRIITY (Härder et
al, 1999), allow the definition of mapping rules
which, in turn, determine the interoperability
between the global schema and the local schemas.
Heterogeneity conflicts can be solved explicitly by
coding appropriate integration rules.
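The idea of explicit mapping rules can be sketched as follows. The sketch does not use BRIITY syntax; the `MappingRule` class, the attribute names, and the unit conversion are hypothetical and only show how a rule can resolve a naming conflict and a scaling conflict between a local and a global schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def identity(value: object) -> object:
    """Default conversion: pass the local value through unchanged."""
    return value


@dataclass(frozen=True)
class MappingRule:
    """Hypothetical rule: global attribute <- convert(local attribute)."""
    global_attr: str
    local_attr: str
    convert: Callable[[object], object] = identity


def to_global(local_row: Dict[str, object], rules: List[MappingRule]) -> Dict[str, object]:
    """Translate one local row into the global schema by applying every rule."""
    return {r.global_attr: r.convert(local_row[r.local_attr]) for r in rules}


# Assumed example schemas: the local source stores prices in cents under a
# different attribute name than the global schema expects.
rules = [
    MappingRule("name", "cust_name"),                                # naming conflict
    MappingRule("price_eur", "price_cents", lambda c: c / 100),     # scaling conflict
]

local_row = {"cust_name": "Smith", "price_cents": 1250}
global_row = to_global(local_row, rules)
```

Each heterogeneity conflict is handled by one explicitly coded rule, which is what makes this strategy flexible but labour-intensive.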
Recently, various data integration strategies have
been developed for the interoperability of XML and
RDBMSs. They fall into two groups: either an RDBMS
is used to store and query XML data, or existing
relational data is presented as an XML view to the
user or application. Commercial object-relational
database management systems, such as Oracle 9i
(Higgins et al, 2002), IBM DB2 (IBM Corporation,
2002), and Microsoft SQL Server 2000 (Microsoft
Corporation, 2004), provide various mechanisms for
mapping between relational tables and XML
fragments, but they do not provide schema
integration: the user still has to know both schema
definitions (no global schema is created), use two
query languages, and combine and clean the query
results manually.
Among the various research approaches for
XML Publishing, we mention SilkRoute, XPeranto
and Agora here. SilkRoute (Fernandez et al, 2000)
and XPeranto (Shanmugasundaram et al, 2001)
focus on defining XML views on relational data and
evaluating XML queries by decomposing the view.
In both approaches, a virtual XML view is created
and then the XML queries (XML-QL in SilkRoute
and XQuery in XPeranto) are evaluated
on this view. These approaches use only a single
local relational data source, and their main task is to
process XML queries on it.
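A minimal sketch of publishing relational data as an XML view is given below. It is not the SilkRoute or XPeranto algorithm: for simplicity the view is built eagerly in memory and queried with the limited XPath subset of Python's `xml.etree.ElementTree`, whereas those systems keep the view virtual and translate the XML query into SQL. The table, its columns, and the sample rows are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed single relational source with illustrative data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book (title, year) VALUES (?, ?)",
    [("Book A", 1999), ("Book B", 1995)],
)


def xml_view(connection: sqlite3.Connection) -> ET.Element:
    """Expose the rows of the book table as a <books> element tree."""
    root = ET.Element("books")
    rows = connection.execute("SELECT id, title, year FROM book ORDER BY id")
    for book_id, title, year in rows:
        book = ET.SubElement(root, "book", id=str(book_id))
        ET.SubElement(book, "title").text = title
        ET.SubElement(book, "year").text = str(year)
    return root


view = xml_view(conn)

# A simple path query evaluated on the XML view:
# titles of all books whose <year> child equals 1999.
titles_1999 = [b.findtext("title") for b in view.findall("book[year='1999']")]
```

The user of such a view works only with XML and never sees the underlying table, which is the essence of XML publishing over a single relational source.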
The Agora (Manolescu et al, 2001) approach
focuses on the problem of translating XQuery
queries into SQL. Unlike SilkRoute and XPeranto, it
can handle relational as well as XML data sources.
In contrast to SQXML, Agora uses the local-as-view
(LAV) approach (Halevy, 2000) and supports only
one language, XQuery.
Integration solutions like TSIMMIS, Garlic, and
BRIITY are more generic with respect to the data
sources that can be integrated, but considerable
effort would be required to adapt these approaches
to support SQL:1999 and XML Schema. Further
programming effort would be required to code
wrappers and mediators or to define the mapping
rules.
In contrast, the SQXML approach resolves the
structural heterogeneity between SQL:1999 and
XML Schema fully automatically: With the SQXML
metamodel (Section 4), SQL:1999 schemas and
XML Schema definitions can be directly
transformed into uniform representations. The
semantic heterogeneity is resolved in a near-
automatic way, only possibly requiring some manual
changes and improvements to the mapping model
during the schema matching process.
The SQXML system is aimed at providing the
user or the application with bilingual access, i.e., it
supports both query languages, SQL and XQuery. In
contrast, related approaches define a new language
(e.g., Lorel in TSIMMIS or F-Logic in Florid) or use
SQL with appropriate extensions (e.g., object-
oriented extensions of SQL in Garlic) to provide
access to the integrated data.
None of the integration systems mentioned above
supports more than one query language, and most of
the approaches require significant user support
during the integration process. The SQXML
Integration System as proposed in this paper is
aimed at simplifying and automating the integration
process as well as providing efficient data access.
8 CONCLUSIONS AND FUTURE WORK
This paper has presented SQXML, a system
designed to implement the integration of XML and
(object-)relational data sources. SQXML provides
new features that have not been available in other
integration systems. It aims at near-automatic
operation; that is, user interaction is
limited to the process of resolving semantic conflicts
between the schemas. Structural heterogeneity
between the schemas is resolved fully automatically.
To unify SQL:1999 and XML Schema, concepts
of the Common Warehouse Metamodel have been
ICEIS 2005 - DATABASES AND INFORMATION SYSTEMS INTEGRATION