TRANSFORMATION-ORIENTED MIDDLEWARE FOR LEGACY SYSTEM INTEGRATION

Guido Menkhaus

Software Research Lab, University of Salzburg

Austria

Urs Frei

University of Applied Science, St. Gallen

Switzerland

Keywords:

Legacy System, Integration, Transformation System, Grammar, Middleware.

Abstract:

Most established companies have acquired legacy systems through mergers and acquisitions. The systems were developed independently of each other, and very often they do not align with the evolving IT infrastructure. Still, they drive day-to-day business processes. Replacing the legacy applications with new solutions might not be feasible or practical, or might take a considerable amount of time. However, immediate integration might be a requirement for a strategic project, such as supply chain management or e-business. This article presents a transformation system for legacy system integration that allows flexible and effective transformation of data between heterogeneous systems. Sequences of transformations are described using a grammar-based approach.

1 INTRODUCTION

Supply chain management helps companies control the flow of information and goods within their network of suppliers and customers by providing a full view of what happens in the network (Hieber, 2002; Stör et al., 2003). But before extending operations management beyond the company’s walls and integrating suppliers and customers into a single information network, the company’s own operations must run smoothly enough to support cooperation and collaboration. This involves the integration and interoperability of different corporate databases, applications, and, more and more often, legacy systems acquired through mergers and acquisitions. These legacy systems produce structured or semi-structured data that add to the vast amounts of data a company generates every day. This data needs to be communicated between heterogeneous systems within the same company and eventually beyond the company’s walls. Transformations of the communicated data are required to enable companies to tightly integrate their systems into a cohesive infrastructure without changing their applications and systems (DataMirror, 2001).

This article presents a legacy system data integration middleware that allows flexible and effective transformation of data between heterogeneous systems. Our data integration middleware provides a transformation system in which transformation sequences are described based on the grammars of the source and target data formats. It provides direct integration of applications and systems at the data level.

The remainder of the article is structured as follows: Section 2 discusses the motivation for this work. Section 3 provides a brief overview of transformation systems. Section 4 describes strategies for legacy system integration and migration. Section 5 illustrates a use-case scenario for the legacy system data integration middleware. Section 6 presents and discusses the architecture of the system. Section 7 concludes the article with a brief outlook on our future research directions and work.

2 MOTIVATION

For companies to stay competitive, they must be able to interconnect their databases, applications, and legacy systems seamlessly into a coherent IT infrastructure. However, heterogeneous systems, including legacy systems acquired through mergers and acquisitions, may not exchange data easily. These systems produce data in different formats using different description languages, such as comma-separated-value lists or text-based proprietary formats. To communicate data from a system using format A to a different system using format B, we need to transform the data, control that data, and ensure that the transformation from format A to format B is carried out correctly.

Menkhaus G. and Frei U. (2004).
TRANSFORMATION-ORIENTED MIDDLEWARE FOR LEGACY SYSTEM INTEGRATION.
In Proceedings of the Sixth International Conference on Enterprise Information Systems, pages 202-209
DOI: 10.5220/0002618702020209
SciTePress

Figure 1: Application scenario: Integration of legacy systems. (Diagram elements: legacy system, database, IT infrastructure, System A, System B.)

Transforming data is usually done by writing custom programs (Fiore, 1998). However, if the format of either the source data or the target data changes, the custom programs need to be rewritten. Adapting to frequent changes results in high maintenance costs. Some systems do support data integration; for example, relational databases allow the import of comma-separated-value lists. However, this integration has limitations: the data is imported into a single database table and must then be processed further internally to distribute it into different tables.

To integrate legacy system data, we need middleware that provides the following features:

1. Adaptation: The way data is processed and stored is diverse and subject to change. If the format of the source data or the target data in a transformation sequence changes, quickly adapting the transformation sequence is essential to sustain the interconnection between systems.

2. Control: When data is transformed while being communicated between two systems, it might need more than a change of format; the target system might also require the data to be changed, enriched, filtered, or otherwise modified.

3. Format Guarantee: The transformation sequence guarantees that the data results in a specified format. The specified target structure of the data is produced because the transformation is generated from the structure of the target format, which is described by a grammar.

We present a grammar-based transformation system in which the transformation sequence is generated from a set of grammars describing the target format of each transformation step. Semantic controls must be programmed manually; the system provides the means to integrate them into the transformation sequence. Adaptation is accomplished by respecifying the grammars of the data formats.

3 SHORT OVERVIEW OF TRANSFORMATION SYSTEMS

Transformation systems transform elements of a source language into elements of a target language. The source and the target language can be very different from one another (Winter, 1999). Partsch and Steinbruggen classify transformation systems into manual, semi-automatic, and automatic transformations (Partsch and Steinbruggen, 1983).

• Manual: In manual transformation systems, the user chooses, from a predefined set of transformations, those that the user wants to apply to the source language. Manual transformation systems provide an environment that lets the user apply transformations more effectively than the current programming paradigm, which requires a programmer to code each transformation by hand.

• Semi-automatic: The objective of semi-automatic transformation systems is to automate the transformation process and to minimize user intervention, although the major decisions are still made by the user.

• Automatic: The intent of automatic transformation systems is to fully automate the transformation process.

The class of problems that can be solved using manual transformation systems is the largest, since most transformation solutions require insight into the problem domain and decision making beyond what automation techniques can provide. Semi-automatic systems need a restricted problem domain in which difficult decisions about the transformation configuration do not occur and transformations can be generated automatically. The automatic class is the most limited: the system selects the transformation sequence on the basis of a knowledge base, so it can only be as good as the knowledge base its programmer designed.

Viewed from a different angle, Partsch and Steinbruggen (Partsch and Steinbruggen, 1983) divide transformations into two types of processes: procedural and schematic.


Figure 2: Transformation System Architecture. (Diagram elements: source data; source data grammar and lexical analysis specification; scanner; parser; source data semantic analysis; transformation generator; target transformation generator; target data grammar; transformation grammar; set of transformations; transformation system engine; target data.)

• Procedural: Procedural transformations specify semantic rules that can be applied globally to the entire source data. They include consistency checks and analysis tasks.

• Schematic: Schematic transformations are syntax-oriented and make local changes to the source data.

It should be noted that global, procedural transformations can be accomplished by schematic processes, but the required transformations might become arbitrarily complex. Complex rules are better expressed as procedural than as schematic transformations (Winter, 1994).

In this paper, we present a semi-automatic transformation system. The automatic part of the system is schematic and syntax-oriented. The procedural part of the transformation consists of semantic analysis and actions, which are applied to the entire source data.
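A minimal Python sketch can make the split between schematic and procedural transformations concrete. The record layout (name, email pairs) and the functions `schematic_normalize`, `procedural_duplicates`, and `transform` are hypothetical illustrations, not part of the described system: the schematic step rewrites one record at a time without looking at the rest of the data, while the procedural step needs the entire source data, here to find duplicate email addresses.

```python
from collections import Counter

# Hypothetical source data: each record is a (name, email) pair.
Record = tuple[str, str]


def schematic_normalize(record: Record) -> Record:
    """Schematic (local, syntax-oriented) rewrite: only this record is inspected."""
    name, email = record
    return name.strip().title(), email.strip().lower()


def procedural_duplicates(records: list[Record]) -> set[str]:
    """Procedural (global) check: needs the entire source data at once."""
    counts = Counter(email for _, email in records)
    return {email for email, n in counts.items() if n > 1}


def transform(records: list[Record]) -> tuple[list[Record], set[str]]:
    # Local rewrites first, then a global analysis over the rewritten data.
    rewritten = [schematic_normalize(r) for r in records]
    return rewritten, procedural_duplicates(rewritten)


if __name__ == "__main__":
    data = [(" ada lovelace", "ADA@example.org"),
            ("Alan Turing ", "alan@example.org"),
            ("ada lovelace", "ada@example.org ")]
    rows, dups = transform(data)
    print(rows)
    print(dups)
```

Note that the duplicate only becomes visible after the schematic normalization, which is one reason the global analysis runs over the whole, already rewritten data.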

4 LEGACY SYSTEM INTEGRATION AND MIGRATION

Legacy systems are generally defined as "any information system that significantly resists modification and evolution" (Brodie and Stonebraker, 1995). Legacy systems still drive day-to-day business processes (IBM, 2003). Migrating the application, i.e. replacing the legacy systems with new solutions, might not be feasible or practical, or might take a considerable amount of time. The legacy systems may operate in business-critical processes, and immediate integration might be a requirement for a strategic project, such as supply chain management or e-business. Legacy system integration deals with the accessibility and availability of data, so that legacy systems align with the new IT infrastructure.

Bateman and Murphy propose the forward and reverse migration methods for legacy system integration and eventual migration (Richardson et al., 1997). We follow their line of argument:

• Forward Migration: Forward migration integrates the legacy system into the new IT infrastructure before tackling its migration. It integrates the legacy system by transforming legacy data and continually importing it into a relational database. It then incrementally migrates the legacy system’s interfaces and business processes. While the application is being redeveloped, the legacy system interoperates with the new IT infrastructure using transformation-oriented middleware that operates as a gateway (Wu et al., 1997). The middleware translates and transforms the legacy system’s data and imports it into the database.

• Reverse Migration: Reverse migration gradually redevelops the legacy system and integrates the new applications as soon as they are capable of partly or completely replacing the legacy system. During the redevelopment phase, the legacy system remains operable on the original platform.

Forward migration might result in a longer transition phase, because it requires an additional integration step on top of the migration itself. Reverse migration, however, blocks further progress in other areas while the legacy system is being redeveloped. If progress is a mission-critical issue, reverse migration is not an option.

ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION

The transformation-oriented middleware proposed in this article is a part of a forward migration approach for legacy system integration.

5 APPLICATION SCENARIO

The transformation system was designed as middleware for the integration of legacy systems. This section briefly outlines a practical application that demonstrates the value of our transformation system.

A company needs to integrate a legacy system into its new IT infrastructure (Figure 1). The legacy system is integrated via a central database, which is used by various systems. The legacy system’s output data is in a proprietary format, and the data needs to be imported into a variety of database tables. The database configuration was designed with respect to the new IT infrastructure, and the legacy system must adapt to and integrate with the new structure.

In our application scenario, the legacy system produces data as comma-separated-value lists. This data is checked and verified, transformed into an internal, intermediate XML format, and finally imported into the database.

6 ARCHITECTURE OF TRANSFORMATION SYSTEM

The architecture of the transformation system is illustrated in Figure 2. The transformation system runs a sequence of transformations, in which source data complying with a source data grammar is transformed into target data described by the target grammar. The schematic part of each transformation sequence is generated using parser generators and transformation generators. The procedural part is programmed manually and integrated into the schematic part. Each transformation in a sequence consists of three intermediate subtransformations:

1. Source Grammar Driven

2. Configuration Driven

3. Target Grammar Driven
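The composition just described can be sketched in a few lines of Python. This is only an illustration under assumed names: `Transformation`, `run_sequence`, and the toy stages are hypothetical, and in the actual system the SGD and TGD stages are generated from grammars rather than written by hand. Each transformation chains its three subtransformations, and a sequence feeds the output of one transformation into the next.

```python
from dataclasses import dataclass
from typing import Any, Callable

Stage = Callable[[Any], Any]


@dataclass
class Transformation:
    """One transformation = SGD -> CD -> TGD (hypothetical structure)."""
    sgd: Stage  # source grammar driven: parse and verify source data
    cd: Stage   # configuration driven: map the internal form onto the target
    tgd: Stage  # target grammar driven: emit data in the target format

    def apply(self, data: Any) -> Any:
        return self.tgd(self.cd(self.sgd(data)))


def run_sequence(sequence: list[Transformation], data: Any) -> Any:
    """Feed the output of each transformation into the next one."""
    for transformation in sequence:
        data = transformation.apply(data)
    return data


if __name__ == "__main__":
    # Toy stages: CSV line -> field list -> dict -> "key=value" text.
    first = Transformation(
        sgd=lambda line: line.split(","),
        cd=lambda fields: dict(zip(["name", "email"], fields)),
        tgd=lambda rec: rec,
    )
    second = Transformation(
        sgd=lambda rec: rec,
        cd=lambda rec: sorted(rec.items()),
        tgd=lambda items: ";".join(f"{k}={v}" for k, v in items),
    )
    print(run_sequence([first, second], "Ada,ada@example.org"))
```

Keeping the three stages as separate, replaceable parts is what lets one stage change (for example, a new target format) without touching the others.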

Source Grammar Driven Subtransformation

The source grammar driven (SGD) transformation consists of the following four processes:

1. Lexical Analysis: The lexical analysis is performed by a scanner component. The scanner is generated from a lexical analysis specification of the source data and produces a sequence of tokens. A token is a syntactically structured symbol whose structure is described in the lexical analysis specification.

2. Syntactic Analysis: The sequence of tokens produced by the scanner is forwarded to the parser, which verifies the structure of the source data against the source data grammar. We use an attributed grammar, which can be seen as a dynamic description of a transformation process, i.e. a syntax-driven algorithm.

3. Semantic Analysis: The semantic analysis checks local and global context conditions during the syntactic analysis phase.

4. Transformation: The transformation converts the data that has passed the syntactic and semantic analysis into an internal, intermediate format.

We use CoCo/R (Mössenbeck, 1990) as the transformation tool. The lexical analysis specification is described by regular expressions. The attributed grammar of the source data is defined in EBNF.
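CoCo/R generates the scanner from the regular-expression specification; the following hand-written Python sketch only mimics what such a generated scanner does for comma-separated legacy records. The token kinds and patterns (`NUMBER`, `STRING`, `COMMA`, `NEWLINE`) and the `scan` function are assumptions for illustration, not CoCo/R syntax or output.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    pos: int


# Hypothetical lexical specification: token kind -> regular expression.
# Order matters: earlier alternatives win when several could match.
LEXICAL_SPEC = [
    ("NUMBER", r"\d+"),
    ("STRING", r"[^,\r\n\d][^,\r\n]*"),
    ("COMMA", r","),
    ("NEWLINE", r"\r?\n"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{rx})" for kind, rx in LEXICAL_SPEC))


def scan(text: str) -> Iterator[Token]:
    """Turn source data into a sequence of tokens, rejecting unknown input."""
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"lexical error at position {pos}: {text[pos]!r}")
        # lastgroup is the name of the alternative that matched.
        yield Token(match.lastgroup, match.group(), pos)
        pos = match.end()


if __name__ == "__main__":
    for token in scan("Frei,9000,St. Gallen\n"):
        print(token)
```

A generated scanner would typically also skip whitespace and track line numbers for error messages; both are left out to keep the sketch short.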

Attributed grammars were introduced by Knuth (Knuth, 1968) to formalize the semantics of context-free languages. In their original form, they describe dependencies between attributes of symbols originating from the lexical analyzer. However, attributed grammars can also be seen as a dynamic description of a process, i.e. as a syntax-directed algorithm. The structure of the source data determines the order of the global semantic analysis and the local transformations.
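A syntax-directed reading of an attributed grammar can be sketched as a recursive-descent parser in which each nonterminal returns a synthesized attribute and receives inherited context as a parameter. The EBNF in the comment and the semantic action (dropping records whose name was already seen, loosely echoing the duplicate suppression of Section 6.1.1) are hypothetical, and the parser is hand-written rather than generated by CoCo/R. Tokens are passed in as (kind, text) pairs so the example does not depend on a particular scanner.

```python
from typing import Optional

# Hypothetical attributed grammar in EBNF. "seen" is an inherited attribute
# (context passed down); the return values are synthesized attributes.
#   File         = { Record(seen) } .                          -> list of records
#   Record(seen) = STRING "," NUMBER "," STRING NEWLINE .      -> dict or None
Tok = tuple[str, str]  # (kind, text)


class Parser:
    def __init__(self, tokens: list[Tok]) -> None:
        self.tokens = tokens
        self.i = 0

    def expect(self, kind: str) -> str:
        """Consume one token of the given kind and return its text."""
        if self.i >= len(self.tokens) or self.tokens[self.i][0] != kind:
            found = self.tokens[self.i][0] if self.i < len(self.tokens) else "EOF"
            raise SyntaxError(f"expected {kind}, found {found}")
        text = self.tokens[self.i][1]
        self.i += 1
        return text

    def file(self) -> list[dict]:
        """File: synthesizes the list of accepted records."""
        seen: set[str] = set()  # global context, inherited by every Record
        records = []
        while self.i < len(self.tokens):
            rec = self.record(seen)
            if rec is not None:
                records.append(rec)
        return records

    def record(self, seen: set[str]) -> Optional[dict]:
        """Record: synthesizes a dict, or None if the semantic check rejects it."""
        name = self.expect("STRING")
        self.expect("COMMA")
        zip_code = int(self.expect("NUMBER"))
        self.expect("COMMA")
        place = self.expect("STRING")
        self.expect("NEWLINE")
        # Semantic action (context condition): suppress duplicate entries.
        if name in seen:
            return None
        seen.add(name)
        return {"name": name, "zip": zip_code, "place": place}
```

The order in which `record` is called, and therefore the order of the semantic checks, follows the structure of the input, which is the sense in which the source data drives the analysis.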

Configuration Driven Subtransformation

A crucial part of the transformation system is the configuration driven (CD) subtransformation. The transformation system contains a set of CD types with associated configurations for different target data formats. The CD transformation is an intermediate transformation that functions as a bridge: it decouples the source data grammar from the target data grammar so that the two can vary independently. This avoids a binding between the associated transformations and allows flexible adaptation when the source or the target data grammar is modified or extended.

Target Grammar Driven Subtransformation

The target grammar driven (TGD) transformation is generated from the target data grammar. It takes the data from the CD transformation and generates data in the target grammar format.

Since the transformation is generated from the target grammar, the transformation system guarantees that the data results in the specified format.
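The format guarantee follows from construction: output is only ever produced through code derived from the target grammar. A rough Python analogue of such generated code is a set of dataclasses whose fields fix element names and order. The `Address` and `AddressList` classes and their fields are hypothetical stand-ins for classes generated from a target XML Schema (the system itself uses JAXB in Java), and they are written by hand here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field, fields


@dataclass
class Address:
    """Stand-in for a class generated from a target schema (hypothetical)."""
    last_name: str
    postal_code: str
    place: str

    def to_element(self) -> ET.Element:
        # Children are emitted in field order, i.e. the order fixed by the grammar.
        elem = ET.Element("address")
        for f in fields(self):
            child = ET.SubElement(elem, f.name)
            child.text = str(getattr(self, f.name))
        return elem


@dataclass
class AddressList:
    addresses: list[Address] = field(default_factory=list)

    def to_xml(self) -> str:
        root = ET.Element("addressList")
        for address in self.addresses:
            root.append(address.to_element())
        return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(AddressList([Address("Frei", "9000", "St. Gallen")]).to_xml())
```

Because the preceding stage can only fill fields of these classes, it cannot introduce an element the target grammar does not declare: an unknown keyword raises `TypeError` at construction. The guarantee in this sketch is structural only, since dataclasses do not check value types.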


Figure 3: Transformation Sequence for Legacy System Integration. (Diagram elements: legacy data, XML, and database, linked by transformations.)

6.1 Legacy System Integration

In the application scenario, we take the data from the legacy system and import it into a central database.

The transformation system applies a sequence consisting of two transformations. The first transformation converts the legacy data into XML format while verifying the data during the SGD subtransformation. The second transformation parses and processes the XML data and imports it into the database.

6.1.1 Token-XPath Matrix

The first transformation converts the legacy system’s proprietary data format into an intermediate format.

In the SGD subtransformation, the scanner and parser are generated and perform a syntax check on the source data. The semantic verification (in our application scenario, the suppression of duplicate entries in the source data) is programmed manually and integrated into the generated parser.

The CD subtransformation determines where data is inserted into the resulting XML document, which serves as the intermediate data format in the transformation sequence. The data originates from a token produced by the scanner component and has been semantically checked and converted during the semantic analysis. The insertion is performed by applying a Token-XPath-Assignment matrix (TXPA matrix) $M_{TX} = T \times X$, which consists of the token symbols $T$ of the source data grammar and the XML elements of the target data grammar, expressed as XPath elements $X$.

The target grammar is given as an XML Schema. The target grammar driven subtransformation is generated using JAXB (SUN Microsystems, 2003), which generates a suite of hierarchical classes that produce an XML document complying with the XML Schema. This suite of classes is subsequently used by both the CD and the TGD transformation. The classes represent an interface that the two transformations use cooperatively through introspection.

The intermediate (CD) subtransformation decouples the source and the target grammar driven subtransformations (Figure 4). If the source or the target grammar is modified or the semantic analysis changes, only the TXPA matrix needs to be adapted. This makes the transformation system flexible and robust with respect to changes.
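A minimal sketch of applying the TXPA matrix, assuming it can be stored sparsely as a mapping from each token symbol in $T$ to exactly one XPath in $X$ made of plain child steps. The token names, the paths, and the functions `ensure_path` and `apply_txpa` are hypothetical, and the sketch builds elements directly with Python's ElementTree, whereas the described system works through the JAXB-generated classes.

```python
import xml.etree.ElementTree as ET

# Hypothetical TXPA matrix, stored sparsely: each token symbol is assigned
# one target location, written as an XPath of child steps relative to a record.
TXPA = {
    "LAST_NAME": "lastName",
    "EMAIL": "email",
    "POSTAL_CODE": "address/postalCode",
    "PLACE": "address/place",
}


def ensure_path(parent: ET.Element, path: str) -> ET.Element:
    """Walk the child steps of a simple XPath, creating missing elements."""
    node = parent
    for step in path.split("/"):
        child = node.find(step)
        if child is None:
            child = ET.SubElement(node, step)
        node = child
    return node


def apply_txpa(records: list[list[tuple[str, str]]],
               root_tag: str = "persons", record_tag: str = "person") -> ET.Element:
    """Insert each (token symbol, value) pair at the XPath the matrix assigns to it."""
    root = ET.Element(root_tag)
    for record in records:
        entry = ET.SubElement(root, record_tag)
        for symbol, value in record:
            if symbol not in TXPA:
                raise KeyError(f"no TXPA assignment for token {symbol}")
            ensure_path(entry, TXPA[symbol]).text = value
    return root


if __name__ == "__main__":
    doc = apply_txpa([[("LAST_NAME", "Frei"), ("PLACE", "St. Gallen")]])
    print(ET.tostring(doc, encoding="unicode"))
```

In this sketch, a change to the source tokens or to the target element layout is absorbed by editing the `TXPA` table alone, which is the decoupling described above.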

6.1.2 XPath-Database Configuration

The second transformation imports the data from the XML document into a database. Most databases allow importing XML data or comma-separated-value lists. However, the data can only be inserted into a single table, and most often it requires further processing, such as splitting the data and distributing it among several database tables.

The SGD transformation is accomplished by an XML parser. The CD and the TGD transformations use OJB (The Apache DB Project, 2003). OJB generates a set of classes on the basis of a database design, allowing transparent persistent mapping of objects to relational databases. It allows storing objects, or parts of objects, in relational databases, and reading data from a relational database into the generated object structure.

The grammar-oriented transformation needs to convert the data from an XML representation into an OJB object representation. The OJB object structure is then imported into the database (Figure 5).

The objective of the CD transformation is to remain independent of the grammar of the source XML document and of the target configuration of the database. We need to take into account the following requirements:

1. Specification of a mapping between XML elements and OJB objects.

2. Instantiation of OJB objects when creating a new dataset.

3. Relations between the OJB objects.

4. Processing of duplicate datasets. Duplicates are already filtered out in the first transformation. However, the first transformation cannot detect duplicates that arise when the data is reordered in the second transformation, nor duplicates that are already in the database.

5. Declaration of an import sequence to prevent primary key violations.

We have developed XML2OJB, a mapping from XML documents to OJB object structures (Appendix A). It allows flexible, adaptable, and independent import of arbitrarily structured XML data into arbitrary database table configurations.

Appendix A shows part of an example in which an XML address list is inserted into a database. The XML2OJB configuration is divided into five parts.


Figure 4: Transformation from legacy data to XML. (Diagram elements: source data, SGD transformation, CD transformation, TGD transformation, XML; inputs: source data grammar, TXPA configuration, XML Schema.)

Figure 5: Import from XML into a database. (Diagram elements: XML, SGD transformation, CD transformation, TGD transformation, database; inputs: XML Schema, XML2OJB configuration, OJB.)

• ClassDefinition: The ClassDefinition section defines the objects that are imported into the database. These objects are instantiated, and their attributes are set using the methods specified in the SetMethod element. Aliases are declared here and are later used in the SourceDocument section.

• Assembly: The Assembly section defines the behavior when importing a new data record. The Repeat element specifies the start of a new data record in the XML document. The Insert element specifies where the data is set in the OJB objects, and the CreateObject element defines the objects that must be instantiated.

• DuplicateRecord: The DuplicateRecord section specifies the element that functions as the autokey. Specifying an autokey is necessary to avoid duplicate entries in database tables.

• ImportSequence: The ImportSequence section determines the order in which the objects import their data into the database.

• SourceDocument: The SourceDocument element declares where to find the necessary information in the XML source document.
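The five parts can be illustrated with a small Python sketch that reads an XML address list and imports it into two related tables. It is not XML2OJB: the configuration is a Python dictionary instead of an XML file, plain `sqlite3` rows stand in for OJB objects, and the table layout, aliases, and the functions `create_schema` and `import_xml` are assumptions. It does show how the parts cooperate: the repeat element delimits records, aliases tie source locations to columns, the autokey suppresses duplicates (including ones already in the database), and the import sequence writes referenced rows before the rows that refer to them.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an XML2OJB configuration; section names follow
# the text, but this dictionary layout is an assumption of the sketch.
CONFIG = {
    "ClassDefinition": {  # class -> (table, {alias: column})
        "Place": ("place", {"place": "name", "postalCode": "postal_code"}),
        "Address": ("address", {"lastName": "last_name", "email": "email",
                                "addressPlace": "place"}),
    },
    "Repeat": "person",                                  # Assembly: record start
    "AutoKey": {"Place": "place", "Address": "email"},   # DuplicateRecord
    "ImportSequence": ["Place", "Address"],              # referenced rows first
    "SourceDocument": {  # alias -> path relative to the repeated element
        "lastName": "lastName", "email": "email", "place": "address/place",
        "postalCode": "address/postalCode", "addressPlace": "address/place",
    },
}


def create_schema(db: sqlite3.Connection) -> None:
    """Hypothetical target tables; place rows must exist before addresses."""
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(
        "CREATE TABLE place (name TEXT PRIMARY KEY, postal_code TEXT);"
        "CREATE TABLE address (email TEXT PRIMARY KEY, last_name TEXT,"
        " place TEXT REFERENCES place(name));"
    )


def import_xml(xml_text: str, db: sqlite3.Connection) -> None:
    root = ET.fromstring(xml_text)
    for record in root.iter(CONFIG["Repeat"]):
        # Read every aliased value of this record from the source document.
        values = {alias: record.findtext(path)
                  for alias, path in CONFIG["SourceDocument"].items()}
        for cls in CONFIG["ImportSequence"]:
            table, aliases = CONFIG["ClassDefinition"][cls]
            key_alias = CONFIG["AutoKey"][cls]
            if values[key_alias] is None:
                continue  # nothing to import for this class in this record
            # Table and column names come from the configuration, not the data.
            key_column = aliases[key_alias]
            if db.execute(f"SELECT 1 FROM {table} WHERE {key_column} = ?",
                          (values[key_alias],)).fetchone():
                continue  # duplicate: already in the database
            columns = ", ".join(aliases.values())
            marks = ", ".join("?" for _ in aliases)
            db.execute(f"INSERT INTO {table} ({columns}) VALUES ({marks})",
                       [values[a] for a in aliases])
    db.commit()
```

Reversing `ImportSequence` in this sketch makes the first address insert fail with a foreign-key `IntegrityError`, which is the kind of violation an explicit import order is meant to prevent.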

The TGD transformation consists of importing the set of OJB objects into the database. The process is configured using a specific OJB configuration file.

7 CONCLUSION

We have presented a transformation system that manages sequences of transformations. Each transformation of a sequence consists of three subtransformations and is grammar-driven. The source grammar driven subtransformation converts data into an intermediate format. The inner, configuration driven subtransformation is a bridge between the data represented in the source and the target format. The target grammar driven subtransformation converts the data from the intermediate format into the resulting target format. Introducing an intermediate transformation allows the source and the target grammar to vary independently.

Currently, there are two transformation configurations. The TXPA matrix maps a sequence of tokens onto XML elements. The XML2OJB configuration maps XML elements to OJB objects, which can be imported into a relational database. The XML2OJB transformation proved successful because of its flexibility. The architecture of the transformation system represents a viable solution for systems that require frequent reconfiguration and maintenance.

Future work will focus on extending the set of predefined transformations. We will continue working on fault tolerance and error recovery within a single transformation.

REFERENCES

Brodie, M. and Stonebraker, M. (1995). Migrating Legacy Systems: Gateways, Interfaces and the Incremental Approach. Morgan Kaufmann.

DataMirror (2001). Managing your data the XML way: Data transformation, exchange and integration.

Fiore, P. (1998). Data Warehousing. Evolving Enterprise, 1(1).

Hieber, R. (2002). Supply Chain Management. A Collaborative Performance Measurement Approach. vdf Hochschulverlag, Zürich, Switzerland.

IBM (2003). IBM Legacy Transformation Services. Technical report, IBM.

Knuth, D. (1968). Semantics of Context-Free Languages. Mathematical Systems Theory, 2, pages 127-145.

Mössenbeck, H. (1990). A Generator for Fast Compiler Front-Ends. Technical Report 127, Institut für Computersysteme, ETH Zürich.

Partsch, H. and Steinbruggen, R. (1983). Program Transformation Systems. ACM Computing Surveys, 15(3).

Richardson, R., Lawless, D., Bisbal, J., Wu, B., Grimson, J., and Wade, V. (1997). A Survey of Research into Legacy System Migration. Technical Report TCD-CS-1997-01, Computer Science Department, Trinity College Dublin.

Stör, M., Birkeland, N., Nienhaus, J., and Menkhaus, G. (2003). IT Infrastructure for Supply Chain Management in Company Networks with Small and Medium-sized Enterprises. In Proceedings of the 5th International Conference on Enterprise Information Systems, volume 4, pages 280-287, Angers, France.

SUN Microsystems (2003). Java Architecture for XML Binding (JAXB).

The Apache DB Project (2003). Object/Relational Bridge (OJB).

Winter, V. L. (1994). Proving the Correctness of Program Transformations. PhD thesis, University of New Mexico.

Winter, V. L. (1999). Program Transformations in HATS. In Proceedings of the Software Transformation Systems Workshop, California, USA.

Wu, B., Lawless, D., Bisbal, J., Grimson, J., Wade, V., O’Sullivan, D., and Richardson, R. (1997). Legacy System Migration: A Legacy Data Migration Engine. In Experts, C. C., editor, 17th International Database Conference, pages 129-138, Brno, Czech Republic.


A XML2OJB Mapping

<SetMethod Alias="lastName">setLastName</SetMethod>
<SetMethod Alias="addressPlace">setAdressPlace</SetMethod>
</Class>
<SetMethod Alias="place">setPlace</SetMethod>
<SetMethod Alias="postalCode">setPostalCode</SetMethod>
</Class>
</ClassDefinition>
<AutoKey>email</AutoKey>
</DuplicateRecord>
<CreateObject>addressDB.Adress</CreateObject>
<CreateObject>addressDB.Place</CreateObject>
</Repeat>
</Assemblies>
<Class>addressDB.Adress</Class>
<Class>addressDB.Place</Class>
</ImportSequence>
</SourceDocument>
</XML2OJB>

Figure 6: Configuration for mapping an XML document onto an OJB object structure.
