
[Figure 3 labels: Legacy data, Transformation, XML, Transformation, Database]
Figure 3: Transformation Sequence for Legacy System Integration.
6.1 Legacy System Integration
In the application scenario, we take data from the
legacy system and import it into a central database.
The transformation system applies a sequence of two
transformations. The first transformation converts the
legacy data into XML format while verifying the data
during the SGD subtransformation. The second
transformation parses and processes the data and
imports it into the database.
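The two-step sequence can be illustrated with a minimal sketch. It is not the system described here: the `name;city` legacy format, the element names, the `address` table, and both function names are assumptions made purely for illustration, and Python's standard library stands in for the generated components.

```python
import sqlite3
import xml.etree.ElementTree as ET


def legacy_to_xml(legacy_text):
    """First transformation: hypothetical 'name;city' records to XML."""
    root = ET.Element("addresses")
    for line in legacy_text.strip().splitlines():
        # Unpacking fails with ValueError unless there are exactly two fields,
        # which acts as a very small syntax check.
        name, city = line.split(";")
        entry = ET.SubElement(root, "address")
        ET.SubElement(entry, "name").text = name.strip()
        ET.SubElement(entry, "city").text = city.strip()
    return ET.tostring(root, encoding="unicode")


def xml_to_database(xml_text, connection):
    """Second transformation: parse the XML and import it into a table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS address (name TEXT, city TEXT)"
    )
    for entry in ET.fromstring(xml_text).findall("address"):
        connection.execute(
            "INSERT INTO address (name, city) VALUES (?, ?)",
            (entry.findtext("name"), entry.findtext("city")),
        )
    connection.commit()


connection = sqlite3.connect(":memory:")
xml_document = legacy_to_xml("Alice;Berlin\nBob;Madrid")
xml_to_database(xml_document, connection)
rows = connection.execute(
    "SELECT name, city FROM address ORDER BY name"
).fetchall()
```

The XML string is the only thing the two functions share, which mirrors the role of the intermediate format in the sequence.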
6.1.1 Token-XPath Matrix
The first transformation converts the legacy system’s
proprietary data format into an intermediate format.
In the SGD subtransformation, the scanner and parser
are generated and perform a syntax check on the source
data. The semantic verification (in our application
scenario, the suppression of duplicate data entries in
the source data) is manually programmed and integrated
into the generated parser.
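As a rough illustration of a hand-written semantic check sitting next to a syntax check, the sketch below suppresses duplicate entries while parsing. The `name;city` record layout and the function name `parse_records` are hypothetical; in the system described here the scanner and parser are generated rather than written by hand.

```python
def parse_records(lines):
    """Parse hypothetical 'name;city' lines and drop duplicate entries."""
    seen = set()
    records = []
    for number, line in enumerate(lines, start=1):
        fields = [field.strip() for field in line.split(";")]
        # Syntax check: exactly two non-empty fields per record.
        if len(fields) != 2 or not all(fields):
            raise ValueError(f"syntax error in line {number}: {line!r}")
        key = tuple(fields)
        # Semantic check: skip entries that were already seen in the source.
        if key in seen:
            continue
        seen.add(key)
        records.append(key)
    return records


records = parse_records(["Alice;Berlin", "Bob;Madrid", "Alice;Berlin"])
```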
The CD subtransformation determines where data that
originates from a token produced by the scanner
component, and that has been semantically checked and
converted during the semantic analysis, is inserted
into the resulting XML document, which serves as the
intermediate data format in the transformation
sequence. This is done by applying a
Token-XPath-Assignment matrix (TXPA matrix)
$M_{TX} = T \times X$, which consists of the token
symbols $T$ of the source data grammar and the XML
elements of the target data grammar, expressed as
XPath elements $X$.
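One way to picture the TXPA matrix is as a table over token symbols and XPath expressions in which a marked cell assigns a token to a location in the target document. The sketch below assumes this reading; the token names, the paths, and the helper functions are hypothetical, and only simple child paths are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical token symbols T and XPath elements X.
TOKENS = ["NAME", "CITY"]
XPATHS = ["address/name", "address/location/city"]

# TXPA matrix over T x X: a True cell assigns a token to an XPath.
TXPA = {
    ("NAME", "address/name"): True,
    ("CITY", "address/location/city"): True,
}


def target_paths(token):
    """Return the XPaths assigned to a token symbol."""
    return [path for path in XPATHS if TXPA.get((token, path), False)]


def insert_at(root, path, value):
    """Create missing elements along a simple child path and set the text."""
    node = root
    for step in path.split("/"):
        child = node.find(step)
        if child is None:
            child = ET.SubElement(node, step)
        node = child
    node.text = value


def build_document(token_stream):
    """Place each (token, value) pair where the matrix says it belongs."""
    root = ET.Element("addresses")
    for token, value in token_stream:
        for path in target_paths(token):
            insert_at(root, path, value)
    return root


document = build_document([("NAME", "Alice"), ("CITY", "Berlin")])
```

Under this reading, a change to either grammar only touches the entries of `TXPA`, not the code that walks the token stream.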
The target grammar is presented as an XML Schema. The
target grammar driven subtransformation is generated
using JAXB (SUN Microsystems, 2003), which generates a
suite of hierarchical classes that produce an XML
document complying with the XML Schema. This suite of
classes is subsequently used by the CD and the TGD
transformations. The classes represent an interface
that both transformations use cooperatively by means
of introspection.
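JAXB is a Java technology, so the following Python fragment is only an analogy for the idea of introspection: setters are located by name at run time instead of being hard-coded. The `Address` class is a hand-written stand-in, not JAXB output, and `set_property` and `writable_properties` are hypothetical helpers.

```python
class Address:
    """Hand-written stand-in for a schema-derived class (not JAXB output)."""

    def __init__(self):
        self._name = None
        self._city = None

    def set_name(self, value):
        self._name = value

    def get_name(self):
        return self._name

    def set_city(self, value):
        self._city = value

    def get_city(self):
        return self._city


def set_property(target, property_name, value):
    """Locate the setter for a property by name at run time and call it."""
    setter = getattr(target, f"set_{property_name}", None)
    if not callable(setter):
        raise AttributeError(
            f"{type(target).__name__} has no property {property_name!r}"
        )
    setter(value)


def writable_properties(target):
    """List the property names that the object exposes through setters."""
    return sorted(
        name[len("set_"):]
        for name in dir(target)
        if name.startswith("set_") and callable(getattr(target, name))
    )


address = Address()
set_property(address, "name", "Alice")
set_property(address, "city", "Berlin")
```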
The intermediate (CD) subtransformation decouples the
source and the target grammar driven
subtransformations (Figure 4). If the source or the
target grammar is modified, or the semantic analysis
changes, only the TXPA matrix needs to be adapted.
This makes the transformation system flexible and
robust to changes.
6.1.2 XPath-Database Configuration
The second transformation imports the data from the
XML document into a database. Most databases allow
importing XML data or comma-separated value lists.
However, such data can only be inserted into a single
table, and it most often requires further processing,
such as splitting the data and distributing it among
several database tables.
The SGD transformation is accomplished by employing an
XML parser. The CD and the TGD transformations use OJB
(The Apache DB Project, 2003). OJB generates a set of
classes on the basis of a database design, allowing
transparent persistent mapping of objects to
relational databases. It allows storing objects, or
parts of an object, in relational databases, and
reading data from a relational database into the
generated object structure.
The grammar oriented transformation needs to rework
the data from an XML representation into an OJB object
representation. The OJB object structure is then
imported into the database (Figure 5).
The objective of the CD transformation is to remain
independent from the grammar of the source XML
document and the target configuration of the database.
We need to take into account the following
requirements:
1. Specification of a mapping between XML elements
and OJB objects.
2. Instantiation of OJB objects creating a new dataset.
3. Relations between the OJB objects.
4. Processing of duplicate datasets. Duplicates are
already filtered out in the first transformation.
However, at that stage we cannot detect duplicates
that arise during the reordering of the data in the
second transformation, nor duplicates that are already
in the database.
5. Declaration of an import sequence to prevent
primary key violations.
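Requirements 4 and 5 can be sketched as follows. The sketch does not use OJB or XML2OJB: the `city` and `person` tables, the `REFERENCES` dependency map, and the helper functions are hypothetical, and the import sequence is derived with a topological sort as one plausible way of ensuring that referenced rows exist before the rows that point to them.

```python
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical dependencies: each table maps to the tables it references.
REFERENCES = {"city": set(), "person": {"city"}}

# Import sequence: referenced tables come before the tables that use them.
IMPORT_ORDER = list(TopologicalSorter(REFERENCES).static_order())


def get_or_create_city(connection, name):
    """Reuse a city that is already in the database, else insert it."""
    row = connection.execute(
        "SELECT id FROM city WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = connection.execute("INSERT INTO city (name) VALUES (?)", (name,))
    return cursor.lastrowid


def import_person(connection, name, city_name):
    """Split one record across two tables and skip duplicate datasets."""
    city_id = get_or_create_city(connection, city_name)  # city first
    exists = connection.execute(
        "SELECT 1 FROM person WHERE name = ? AND city_id = ?",
        (name, city_id),
    ).fetchone()
    if exists is None:
        connection.execute(
            "INSERT INTO person (name, city_id) VALUES (?, ?)",
            (name, city_id),
        )


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
)
connection.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, "
    "city_id INTEGER REFERENCES city(id))"
)
for record in [("Alice", "Berlin"), ("Bob", "Berlin"), ("Alice", "Berlin")]:
    import_person(connection, *record)
connection.commit()

city_count = connection.execute("SELECT COUNT(*) FROM city").fetchone()[0]
person_count = connection.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

Here the duplicate check runs against the database itself, so it also catches datasets that were stored by an earlier import.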
We have developed XML2OJB, a mapping from XML
documents to an OJB object structure (Appendix A). It
allows flexible, adaptable, and independent import of
arbitrarily structured XML data into arbitrary
database table configurations.
Appendix A shows part of an example where an
XML address list is inserted into a database. The
XML2OJB configuration is divided into five parts.
ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION