implicit in the XML hierarchy. We present a
reference relationship scheme in which the XML
data (including all element paths and attributes) are
stored in a more space-efficient way than is provided
by other existing schemes.
The paper also addresses the issue of translation
of data from the RDBMS into equivalent XML
documents without loss of any information. In
particular, translations are achieved without loss of
attributes and element paths so that an XML
document can be translated into relational form, and
then back into XML form with the distinction
between element paths and attributes intact.
The remainder of this paper is organized as
follows. Related work is described in section 2. The
new parent-child relationship structure is presented
in section 3. Analysis of storage requirements, the
translation algorithm for conversion from RDBMS
to XML form, and the estimated query time are
described in section 4. The paper concludes with a
discussion and final remarks in section 5.
2 RELATED WORK
The Object Exchange Model (OEM) defines a way
of representing XML data (Abiteboul 1997). An
instance of OEM can be thought of as a graph, with
objects as the vertices and element paths described
using labels on the edges. Each object has a unique
object identifier (oid). Abiteboul (1997) also
addresses the querying and reconstruction of
semistructured data. An RDBMS
may be used as a basis for storing and querying
XML data (Florescu and Kossman 1999). The XML
document is viewed as an ordered and labelled
directed graph. A node in the graph represents each
XML element; the node is labelled with the oid of
the XML element. Element-sub-element
relationships are represented by edges in the graph
and labelled by the name of the sub-element. The
order of sub-elements is defined by ordering
outgoing edges from nodes in the graph. Values
(e.g. strings) in an XML document are represented
as leaves in the graph. All edges of the graph are
stored in a relational table called the edge table and
all the values (represented as leaves) are stored in
separate value tables. While this format helps the
performance of XML queries, the graph form
does not differentiate between element paths and
attributes, or between element paths and references.
The representation is therefore a simplification in
which some information may be lost, and as a
consequence it may be impossible to reconstruct
the original XML document exactly from the
relational data.
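As a rough illustration of the edge-table idea
described above, the following Python sketch shreds a
small XML document into edge rows and value rows. The
column layout (source, ordinal, name, target), the
helper name `shred`, and the choice to key values by
oid are assumptions made for this example, not the
schema of the cited work. Attributes are stored as
ordinary labelled edges, which shows how the
distinction between attributes and sub-elements can
be lost.

```python
import xml.etree.ElementTree as ET
from itertools import count

def shred(xml_text):
    """Shred an XML string into (edges, values) row lists (illustrative layout)."""
    root = ET.fromstring(xml_text)
    oids = count(1)
    edges = []   # rows: (source_oid, ordinal, name, target_oid)
    values = []  # rows: (oid, text)

    def visit(elem, oid):
        ordinal = 0
        # Attributes become labelled edges to value leaves,
        # exactly like sub-elements: the distinction is not recorded.
        for name, text in elem.attrib.items():
            ordinal += 1
            leaf = next(oids)
            edges.append((oid, ordinal, name, leaf))
            values.append((leaf, text))
        # Sub-elements become labelled edges, ordered by ordinal.
        for child in elem:
            ordinal += 1
            child_oid = next(oids)
            edges.append((oid, ordinal, child.tag, child_oid))
            visit(child, child_oid)
        # Text content is stored as a value row for this node.
        if elem.text and elem.text.strip():
            values.append((oid, elem.text.strip()))

    root_oid = next(oids)
    edges.append((0, 1, root.tag, root_oid))  # oid 0 stands for the document
    visit(root, root_oid)
    return edges, values

edges, values = shred('<book id="b1"><id>42</id></book>')
for row in edges:
    print(row)
```

In this example the attribute `id` and the
sub-element `id` both appear as edges named `id`
leaving the `book` node, so the original form cannot
be recovered from the rows alone.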
XQuery supports XML views of relational data
(Shanmugasadaram, Kiernan et al. 2001), providing
a general framework for processing arbitrarily
complex queries. The framework provides a
view composition mechanism that eliminates the
construction of intermediate XML fragments, and
reduces an XQuery query to SQL so that the
computation is carried out efficiently by the RDBMS.
ShreX (Du, Amer et al. 2004) provides generic
(mapping-independent) functions for loading
shredded documents into relations and for
translating XML queries into SQL. In this approach,
the annotation processor parses a pre-annotated
XML schema, checks the validity of the mappings
and creates the corresponding relational schema.
Storing and querying XML data using
de-normalized relational databases is described in
(Balmin and Papakonstantinou 2005), which
elaborates a formal framework for XML
schema-driven decomposition that encompasses de-normalized
tables and binary-coded XML fragments. The key
performance focus of this approach is the response
time for delivering the first results of a query. At
present this approach does not handle more
complex queries, because the schema model is based
on directed acyclic graphs (DAGs). The technique
is expected to improve once it is extended to
arbitrary graphs.
XML data can be stored as a byte
sequence (BLOB) in columns of tables to support
the XML model (Pal, Cseri et al. 2004). This form
of storage introduces new challenges for query
processing. The ORDPATH labelling scheme is used
to preserve structural fidelity, and to allow insertion
of nodes anywhere in the XML tree without the need
for re-labelling existing nodes.
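To make the insertion property concrete, here is a
minimal Python sketch of ORDPATH-style labels. It
assumes the usual convention that the initial load
assigns odd ordinals and that an even component acts
as a "caret" that does not count as a tree level. The
function names are hypothetical, only insertion
between two siblings with odd final ordinals is
handled, and the compressed binary encoding used in
practice is omitted.

```python
def parse(label):
    """Turn '1.4.1' into (1, 4, 1); tuples compare in document order."""
    return tuple(int(part) for part in label.split("."))

def fmt(components):
    """Turn (1, 4, 1) back into '1.4.1'."""
    return ".".join(str(c) for c in components)

def depth(label):
    """Only odd components count as tree levels; even ones are carets."""
    return sum(1 for c in parse(label) if c % 2 == 1)

def insert_between(left, right):
    """Label for a new sibling between two siblings with odd final ordinals."""
    l, r = parse(left), parse(right)
    if l[:-1] != r[:-1] or not l[-1] < r[-1]:
        raise ValueError("expected two ordered labels with a common prefix")
    if l[-1] % 2 == 0 or r[-1] % 2 == 0:
        raise ValueError("this sketch only handles odd final ordinals")
    prefix, lo, hi = l[:-1], l[-1], r[-1]
    if hi - lo > 2:
        # An unused odd ordinal exists in the gap: take it.
        return fmt(prefix + (lo + 2,))
    # No odd ordinal is free: caret in with an even component,
    # then start a fresh odd ordinal beneath it.
    return fmt(prefix + (lo + 1, 1))

new_label = insert_between("1.3", "1.5")
print(new_label)
```

The new label sorts between its neighbours and sits
at the same depth, while the labels `1.3` and `1.5`
are left unchanged.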
XISS/R is a system based on an extended
pre-order numbering scheme, which captures the nesting
structure of XML data and provides the opportunity
for storage and query processing that is independent
of the particular structure of the data (Harding, Li et
al. 2003). The system includes a web-based user
interface which enables stored documents to be
queried using the XPath query language. The
user interface utilizes the XPath query engine, which
automatically translates XPath queries into more
efficient SQL statements.
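The following Python sketch illustrates one common
form of extended pre-order numbering, in which each
node receives an (order, size) pair and a node x is
an ancestor of y when order(x) < order(y) <=
order(x) + size(x). The `slack` parameter, which
reserves spare numbers for later insertions, and the
function names are assumptions for this example and
are not taken from XISS/R itself.

```python
import xml.etree.ElementTree as ET

def number(root, slack=2):
    """Assign an (order, size) pair to every element of the tree."""
    labels = {}  # element -> (order, size)

    def visit(elem, order):
        next_free = order + 1
        for child in elem:
            next_free = visit(child, next_free)
        # Range covers all descendants plus some spare numbers.
        size = (next_free - order - 1) + slack
        labels[elem] = (order, size)
        return order + size + 1  # first number available to the next sibling

    visit(root, 1)
    return labels

def is_ancestor(a, d):
    """True if the node labelled a is an ancestor of the node labelled d."""
    a_order, a_size = a
    d_order, _ = d
    return a_order < d_order <= a_order + a_size

root = ET.fromstring("<a><b><c/></b><d/></a>")
by_tag = {elem.tag: label for elem, label in number(root).items()}
print(by_tag)
print(is_ancestor(by_tag["a"], by_tag["c"]))
```

Because the test is a pair of integer comparisons, an
ancestor-descendant step in a path query can be
expressed in SQL as a range predicate in a join,
independently of the shape of the document.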
Document Type Definitions (DTDs) may be
used as a tool in converting XML into relational
database form (Shanmugasadaram, Krishnamurthy
et al. 2001). After the desired relational schema for
storing XML documents is defined, an XML
STORING SEMISTRUCTURED DATA INTO RELATIONAL DATABASE USING REFERENCE RELATIONSHIP
SCHEME