<!ELEMENT root (name,publ)*>
<!ELEMENT publ (year,(book|article)+)*>
<!ELEMENT book (title,ISBN, price)>
<!ELEMENT article (title,journal,issue?,page)>
(a) A Simplified DTD example
EN = {root, name, publ, year, book, article, title, ISBN,
price, journal, issue, page}
G = {Str, [name, publ]$^{*}$, [year, [book|article]$^{+}$]$^{*}$, [title, ISBN, price], [title, journal, issue$^{?}$, page]}
β(root) = [name, publ]$^{*}$,
β(publ) = [year, [book|article]$^{+}$]$^{*}$,
β(book) = [title, ISBN, price],
β(article) = [title, journal, issue$^{?}$, page],
β(year) = β(name) = β(title) = β(ISBN) = β(price) = β(journal) = β(issue) = β(page) = Str.
(b) The tuple expression of the DTD example
Figure 2: A DTD example and its tuple expression.
respectively. Operations on two multiplicities $c_1$ and $c_2$ are carried out through operations on their intervals. Thus, $c_1 \oplus c_2$ (written $c_1 c_2$ for short) is the multiplicity whose interval encloses the intervals of $c_1$ and $c_2$, e.g., $+? = *$ and $1? = ?$. Similarly, $c_1 \ominus c_2$ is the multiplicity whose interval equals the interval of $c_1$, minus that of $c_2$, plus that of 1, e.g., $? \ominus ? = 1$ and $* \ominus + = ?$.
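The interval reading of the two multiplicity operations can be sketched in Python as follows. This is only an illustration: the interval endpoints assigned to each multiplicity, the function names `enclose` and `take_away`, and the convention that infinity minus infinity is treated as 0 are assumptions chosen so that the four examples in the text come out as stated; they are not taken from the formal definition.

```python
import math

INF = math.inf

# Assumed intervals [min occurrences, max occurrences] for each multiplicity.
INTERVAL = {"?": (0, 1), "1": (1, 1), "+": (1, INF), "*": (0, INF)}
NAME = {bounds: mult for mult, bounds in INTERVAL.items()}


def enclose(c1, c2):
    """First operation: the multiplicity whose interval encloses both intervals."""
    lo1, hi1 = INTERVAL[c1]
    lo2, hi2 = INTERVAL[c2]
    return NAME[(min(lo1, lo2), max(hi1, hi2))]


def _minus(a, b):
    # Assumption: an unbounded end minus an unbounded end counts as 0.
    if a == INF and b == INF:
        return 0
    return a - b


def take_away(c1, c2):
    """Second operation: interval of c1, minus that of c2, plus that of 1."""
    lo1, hi1 = INTERVAL[c1]
    lo2, hi2 = INTERVAL[c2]
    bounds = (_minus(lo1, lo2) + 1, _minus(hi1, hi2) + 1)
    if bounds not in NAME:
        raise ValueError(f"{c1} and {c2} do not yield one of the four multiplicities")
    return NAME[bounds]
```

With these assumptions, `enclose("+", "?")` gives `"*"` and `take_away("*", "+")` gives `"?"`, matching the examples in the text. Combinations that fall outside the four multiplicities raise an error rather than guessing a result.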
3.2 Document Representation
A well-formed XML document is a textual representation of data and is composed of elements with hierarchically nested structures as defined in its corresponding DTD. In our methodology, a document is represented by a series of trees $T = (e : val\ T_1 T_2 \cdots T_m)$, where $T_1 T_2 \cdots T_m$ are the recursively defined child trees of $T$, $e : val$ is the root of $T$ and the parent of $T_1 T_2 \cdots T_m$, $e \in EN$, and $val$ is a text string if the root node contains a value and is omitted otherwise. Figure 3 shows an example of an XML document represented by such trees.
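As an illustration of this representation, the following Python sketch builds the nested tree for an XML fragment using the standard-library `xml.etree.ElementTree` parser. The tuple layout `(e, val, children)` and the function name `to_tree` are assumptions made for the example, not part of the formal definition.

```python
import xml.etree.ElementTree as ET


def to_tree(elem):
    """Return (e, val, children) for an element.

    val is the element's text, or None when the node carries no value;
    children is the list of recursively built child trees.
    """
    text = (elem.text or "").strip()
    children = [to_tree(child) for child in elem]
    return (elem.tag, text or None, children)


# A fragment taken from the first book entry of the document example.
fragment = "<book><title>ABC</title><ISBN>-345</ISBN><price>50</price></book>"
book_tree = to_tree(ET.fromstring(fragment))
```

Here `book_tree` has the root `book` with no value and three child trees, one each for `title`, `ISBN` and `price`.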
3.3 Hedge and Hedge Conformation
A hedge $H$ is a sequence of trees under one node in a document. For instance, in the document shown in Figure 3, $T$, $T_1 T_2$, $T_3 T_4 T_5$, and (title:ABC)(ISBN:-345)(price:50) are four hedges. A hedge may contain smaller hedges. Our interest here is in which child trees of a node belong to a hedge conforming to a specific type constructor. For example, let $g^{+} = [A, [B, C^{?}]^{*}, D^{?}]^{+}$, $\beta(e) = g^{+}$ and $T = (e((A)(B)(B)(C)(A)(B)(C)(D)))$. Then hedge (A) conforms to $[A]$, hedge (B)(B)(C) conforms to $[B, C^{?}]^{*}$, hedge (A)(B)(B)(C) conforms to $g$, and hedge (A)(B)(B)(C)(A)(B)(C)(D) conforms to $g^{+}$. That a hedge $H$ conforms to $g$ is denoted by $H^{g}$.
By using the hedge notation, the child trees of a
node can be logically split and thus, the cardinality
constraints of a structure can be checked.
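One way to check conformance for the example above is to translate a type constructor into a regular expression over the root labels of the child trees. The Python sketch below does this under stated assumptions: the tuple encoding `(kind, parts, mult)` and the names `to_regex`, `conforms` and `opt` are hypothetical, and only the labels of the child trees are compared, not the structure beneath them.

```python
import re

# Assumed encoding: a constructor is an element name (str) or a tuple
# (kind, parts, mult), kind in {"seq", "alt"}, mult in {"1", "?", "*", "+"}.
MULT = {"1": "", "?": "?", "*": "*", "+": "+"}


def to_regex(g):
    """Translate a constructor into a regex over space-terminated labels."""
    if isinstance(g, str):
        return "(?:" + re.escape(g) + " )"
    kind, parts, mult = g
    sep = "|" if kind == "alt" else ""
    return "(?:" + sep.join(to_regex(p) for p in parts) + ")" + MULT[mult]


def conforms(labels, g):
    """True if the sequence of child-tree labels conforms to constructor g."""
    text = "".join(label + " " for label in labels)
    return re.fullmatch(to_regex(g), text) is not None


def opt(name):
    """An optional element, e.g. C? in the example."""
    return ("seq", (name,), "?")


G_INNER = ("seq", ("B", opt("C")), "*")       # [B, C?]*
G = ("seq", ("A", G_INNER, opt("D")), "1")    # g
G_PLUS = ("seq", G[1], "+")                   # g+
CHILDREN = "A B B C A B C D".split()          # children of e in the example
```

Under this encoding, `conforms(CHILDREN[:4], G)` and `conforms(CHILDREN, G_PLUS)` both hold, while the full child sequence does not conform to `G` alone, which is what allows the children to be split into two occurrences of `g`.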
<root>
<name>M. Fox</name>
<publ>
<year>2006</year>
<book><title>ABC</title><ISBN>-345</ISBN> <price>50</price></book>
<book><title>DEF</title><ISBN>-302 </ISBN><price>120</price></book>
<year>2005</year>
<book><title>XYZ</title><ISBN>-145</ISBN> <price>180</price></book>
<article><title>FGH</title><journal>J1</journal><issue>2</issue><page>55-58</page></article>
<year>2004</year>
<article><title>XXX</title><journal>J2</journal><page>20-24</page></article>
<article><title>T8</title><journal>J1</journal><issue>2</issue><page>8-15</page></article>
</publ>
<name>K. Page</name>
<publ>
<year>2006</year>
<book><title>YYY</title><ISBN>-452 </ISBN> <price>200</price></book>
<year>2004</year>
<book><title>ZZZ</title><ISBN>-223</ISBN> <price>220</price></book>
<year>2003</year>
<article><title>GG</title><journal>J2</journal> <page>75-80</page></article>
<book><title>TTTT</title><ISBN>-243</ISBN><price>180</price></book>
</publ>
</root>
(a) A simplified document example
$T$ = (root: $T_1 T_2$)
$T_1$ = ((name: “M. Fox”) (publ: ($T_3 T_4 T_5$))), $T_2$ = ((name: “K. Page”) ··· )
$T_3$ = ((year: “2006”) (book: ···))
$T_4$ = ((year: “2005”) ···), $T_5$ = ((year: “2004”) ···)
······
(b) The document trees for the example
Figure 3: A document example and its tree expression.
4 XML DATA TRANSFORMATION OPERATIONS
In the proposed methodology, the transformation of XML data is implemented by executing a series of data transformation operations against the tuple expression and the document trees. The data transformation operations are defined by a set of operators. Because of the syntax differences between DTDs and documents, each operator is defined in two parts: one for transforming the DTD and the other for transforming its conforming documents. The formal definition of each operator is given in (Liu et al., 2006). This section provides an overall description of the data transformation operations defined by those operators.
The DTD transformation that each operator performs is listed in Table 1. Because the document transformation of each operator converts a given document into one whose structure conforms to the transformed DTD, the table also indicates which transformation each operator carries out on the conforming document. For instance, the unnest operator converts the DTD $\beta(e) = [g_1, g_2^{+}]$ into a new DTD $\beta_1(e) = [g_1, g_2]^{+}$. The operator also transforms the document by converting the hedge