into XML, including their contributions and
drawbacks. Our DWG2XML method is presented in
Section 3. Section 4 concludes the paper.
2 OVERVIEW OF THE EXISTING
APPROACHES
2.1 Reverse engineering approach
Alhajj (2003) presents a reverse engineering
approach that extracts the extended
entity-relationship (EER) schema from the relational
schema. The concepts and mechanisms provided
support legacy database maintenance, re-engineering
or migration to another database technology. Based
on the analysis of
the relationships between tables in a legacy
database, a relational intermediate directed (RID)
graph consistent with the EER diagram is derived to
express all possible unary, binary and n-ary
relationships between the given relations. The
approach then applies algorithms to eliminate any
symmetry and transitivity present in the RID graph.
It also identifies is-a links in the RID graph to
deliver an optimized RID graph as the final outcome,
which can be used to derive the XML schema. Such a
conversion has been implemented by Wang et al.
(2004), who translate the RID graph into an XML
schema in a process called forward engineering. A
flat XML schema is automatically derived from the
RID graph. Our DWG2XML approach can be seen as an
extension that completes the translation from the
RID graph to a nested XML schema; it is described in
Section 3.
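The clean-up of the RID graph can be illustrated with a
minimal Python sketch. It assumes the graph is stored as a
set of directed edges between relation names; the function
names, the sample relations and the rule that keeps the
alphabetically first direction of a symmetric pair are
assumptions made for this illustration and are not taken
from Alhajj (2003) or Wang et al. (2004).

```python
def drop_symmetric(edges):
    """Keep only one direction of every symmetric pair (a, b), (b, a)."""
    kept = set()
    for a, b in sorted(edges):
        if (b, a) in kept:
            continue  # the reverse direction was already kept
        kept.add((a, b))
    return kept


def reachable(edges, start, goal, skip):
    """Depth-first search from start to goal that ignores the edge `skip`."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for a, b in edges if a == node and (a, b) != skip)
    return False


def drop_transitive(edges):
    """Remove each edge (a, b) that is implied by another path from a to b."""
    kept = set(edges)
    for edge in sorted(edges):
        a, b = edge
        # Self-loops model unary relationships and are never removed.
        if a != b and reachable(kept, a, b, skip=edge):
            kept.discard(edge)
    return kept


# Hypothetical relations: Dept <-> Emp is symmetric, Dept -> Task is transitive.
rid = {("Dept", "Emp"), ("Emp", "Dept"), ("Emp", "Task"), ("Dept", "Task")}
optimized = drop_transitive(drop_symmetric(rid))
```

In this example the optimized graph retains only the edges
Dept -> Emp and Emp -> Task. On graphs that still contain
longer cycles the result of the transitive step depends on
the order in which edges are visited, so the sketch should
be read as an illustration rather than a full algorithm.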
2.2 CoT and NeT
Lee et al. (2002a; 2002b; 2001) proposed an
approach for creating both flat and nested XML
structures from the relational database schema. The
Flat Translation (FT) converts each table into a flat
element structure. The Nesting-based Translation
(NeT) derives nested structures from a flat relational
model by using the nest operator. The nest
operator is applied to a single table at a time
and it can create nested structures only for
non-normalized tables in normalized databases. NeT is
useful for decreasing data redundancy in relational
databases that are not fully normalized. However, it
works only on one table at a time and depends on the
actual data stored in the database as well as on the
relational schema.
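The effect of a nest operator on a single table can be
sketched as follows. This is a simplified illustration in
Python, not the NeT algorithm of Lee et al.; the function
name nest, the row representation (a list of dictionaries)
and the sample table are assumptions.

```python
from collections import defaultdict


def nest(rows, attr):
    """Nest `rows` (a list of dicts) on the attribute `attr`.

    Rows that agree on every other attribute are merged into one row
    whose `attr` value is the sorted list of the collected values.
    """
    groups = defaultdict(set)
    for row in rows:
        key = tuple(sorted((k, v) for k, v in row.items() if k != attr))
        groups[key].add(row[attr])
    return [dict(key, **{attr: sorted(values)}) for key, values in groups.items()]


# Hypothetical non-normalized table: the student name repeats per course.
enrolment = [
    {"student": "Ann", "course": "DB"},
    {"student": "Ann", "course": "XML"},
    {"student": "Bob", "course": "DB"},
]
nested = nest(enrolment, "course")
```

Here the two rows for Ann collapse into a single row with
a list of courses, which removes the repeated student
value. The outcome depends on the rows actually stored,
which is the data dependence noted above.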
Lee et al. then extended the nesting approach to
multiple tables with the Constraints-based
Translation (CoT) algorithm, one of the first
approaches to deal with relationships. The source
database contains several interconnected tables and,
based on the cardinality of the binary relationships,
two types are identified: one-to-one (1:1) and
one-to-many (1:M). A directed Inclusion Dependency
(IND) graph of tables is created, from which an
empirical way to nest XML structures is identified.
However, a table can have only one child. If a parent
table has more than one child relation, the remaining
relationships are simulated by using reference key
expressions.
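The one-child restriction summarized above can be made
concrete with a small Python sketch. It is a deliberate
simplification and not the CoT algorithm itself: the
function name plan_nesting, the edge tuples and the rule
that the first listed child is the nested one are all
assumptions made for illustration.

```python
def plan_nesting(ind_edges):
    """Split IND edges into nested and reference-based relationships.

    ind_edges is a list of (parent, child, cardinality) tuples, where
    cardinality is "1:1" or "1:M".
    """
    nested, referenced = {}, []
    for parent, child, cardinality in ind_edges:
        if parent not in nested:
            # The first child of a parent becomes a nested element.
            nested[parent] = (child, cardinality)
        else:
            # Any further child is linked through a key reference instead.
            referenced.append((parent, child))
    return nested, referenced


# Hypothetical IND edges between four tables.
ind = [
    ("Dept", "Emp", "1:M"),
    ("Dept", "Project", "1:M"),
    ("Emp", "Address", "1:1"),
]
nested, referenced = plan_nesting(ind)
```

In this example Emp is nested under Dept and Address under
Emp, while the Dept to Project relationship falls back to
a key reference.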
2.3 ConvRel and Conv2XML
Conv2XML and ConvRel are two algorithms
proposed by Duta et al. (2004) for converting a
relational schema to XML Schema, with a focus on
preserving the source relationships and their
structural constraints.
ConvRel analyzes each type of relationship and
determines a set of candidate XML structures
capable of representing it. The candidate XML
structures are classified as Parent-Child nested
structures, Child-Parent nested structures, flat
structures using keyref references, and combinations
of nesting with keyref. These structures are
filtered by criteria such as how nested and compact
the structure is and the size of the XML data file.
ConvRel thus maps each type of relationship in the
database to the best-suited XML structure. However,
this approach works with only a single relationship
at a time; it is not applicable to relationships
involving more than two tables.
The Conv2XML algorithm extends ConvRel to
create a nested structure for the entire database. It
uses a graph representation that combines all
structures discovered previously by ConvRel. In this
graph, the vertices are tables and the edges represent
connections between tables as defined by ConvRel.
Two categories of edges exist in this directed graph:
1) full edges representing nested structures; and 2)
dotted edges representing relationships expressed
through reference keys. The schema conversion is
thereby transformed into the problem of discovering
trees in a directed graph.
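A depth-first discovery of one nesting tree in such a
graph can be sketched in Python as follows. The adjacency
representation, the edge labels "full" and "dotted", the
function name dfs_tree and the sample tables are
assumptions made for illustration; the sketch is not the
Conv2XML implementation of Duta et al.

```python
def dfs_tree(graph, root):
    """Return one nesting tree as {parent: [children]} via depth-first search.

    graph maps a table name to a list of (neighbour, kind) pairs, where
    kind is "full" (nested structure) or "dotted" (key reference). Only
    full edges are followed, and a table is nested at most once.
    """
    tree, visited = {}, {root}

    def visit(table):
        tree[table] = []
        for neighbour, kind in graph.get(table, []):
            if kind == "full" and neighbour not in visited:
                visited.add(neighbour)
                tree[table].append(neighbour)
                visit(neighbour)

    visit(root)
    return tree


# Hypothetical graph: Project could be nested under Dept or under Emp.
graph = {
    "Dept": [("Emp", "full"), ("Project", "full")],
    "Emp": [("Project", "full"), ("Dept", "dotted")],
}
tree = dfs_tree(graph, "Dept")
```

With this input the search nests Project under Emp because
Emp is explored first; the alternative of nesting Project
directly under Dept is never produced.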
Compared to the NeT and CoT approaches, the
ConvRel and Conv2XML approach solves the unary
relationship problem between tables. It can also
present multiple tables as a tree structure. However,
several different nested tree structures can be
derived from the directed graph. The method proposed
by Duta et al. is a depth-first algorithm, which ends
up with only one tree structure. In contrast,
DWG2XML as described in this paper is more
comprehensive: it considers all possible tree
structures.
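To show why a single depth-first pass can miss valid
alternatives, the following Python sketch enumerates every
tree rooted at a chosen table by brute force. It is only
an illustration of the difference between one tree and all
trees; it is not the DWG2XML algorithm, and the function
names and sample edges are assumptions.

```python
from itertools import product


def reaches_root(parent_of, node, root):
    """Follow parent links from node; return False if a cycle is hit first."""
    seen = set()
    while node != root:
        if node in seen:
            return False
        seen.add(node)
        node = parent_of[node]
    return True


def all_nesting_trees(edges, root):
    """Yield every spanning tree rooted at `root` as a {child: parent} dict."""
    nodes = {n for edge in edges for n in edge}
    children = sorted(nodes - {root})
    # Candidate parents of each child: sources of its incoming edges.
    candidates = [
        sorted(p for p, c in edges if c == child and p != child)
        for child in children
    ]
    for choice in product(*candidates):
        parent_of = dict(zip(children, choice))
        if all(reaches_root(parent_of, child, root) for child in children):
            yield parent_of


# Hypothetical nesting edges between three tables.
edges = [("Dept", "Emp"), ("Dept", "Project"), ("Emp", "Project")]
trees = list(all_nesting_trees(edges, "Dept"))
```

For these three tables two trees are found: one with
Project nested directly under Dept and one with Project
nested under Emp. The number of candidate combinations
grows quickly with the number of tables, so the
enumeration is practical only for small graphs.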
ICEIS 2005 - DATABASES AND INFORMATION SYSTEMS INTEGRATION