on precisely engineered mappings. A data element
is extracted from a source schema, transformed,
and finally loaded into a
destination schema. Mappings from the source to the
destination schema must be completely defined
before data integration can take place.
Dataspace management systems approach
data integration the other way round.
Inspired by desktop search engines, a basic keyword
query utility is available and can be used over all
data sources to be integrated. Initially, the quality of
integration is limited to providing a common
keyword search interface. Unlike developing
integration mappings, this requires no prior
investment in the integration process. However, the
result set will contain duplicates, different data
encodings, etc. As
the data integration requirements get more
demanding, the dataspace management system must
be enriched with more relationships and mappings.
Using this additional information the integration
process should be able to produce a better level of
integration. This process is called pay-as-you-go
(Salles, Dittrich, Karakashian, Girard and Blunschi,
2007; Franklin, Halevy and Maier, 2005).
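The pay-as-you-go idea can be illustrated with a minimal sketch. It assumes two hypothetical sources ("crm" and "erp") held as in-memory records. The function names `keyword_search` and `apply_mappings` are illustrative only and do not come from any of the cited systems. Without mappings, a keyword query returns one hit per source; adding a single field mapping lets the duplicates collapse.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
Hit = Tuple[str, Record]  # (source name, matching record)


def keyword_search(sources: Dict[str, List[Record]], keyword: str) -> List[Hit]:
    """Return every record, from any source, with a value containing the keyword."""
    needle = keyword.lower()
    hits = []
    for name, records in sources.items():
        for record in records:
            if any(needle in value.lower() for value in record.values()):
                hits.append((name, record))
    return hits


def apply_mappings(hits: List[Hit], mappings: Dict[Tuple[str, str], str]) -> List[Record]:
    """Rename source-specific fields to shared names and drop exact duplicates."""
    seen = set()
    merged = []
    for name, record in hits:
        renamed = {mappings.get((name, f), f): v for f, v in record.items()}
        key = tuple(sorted(renamed.items()))
        if key not in seen:
            seen.add(key)
            merged.append(renamed)
    return merged


# Hypothetical sample data: the same customer stored under different field names.
sources = {
    "crm": [{"cust_name": "Acme Steel", "city": "Linz"}],
    "erp": [{"customer": "Acme Steel", "city": "Linz"}],
}

hits = keyword_search(sources, "acme")           # no mappings yet: duplicates remain
mappings = {("crm", "cust_name"): "customer"}    # one mapping added later
merged = apply_mappings(hits, mappings)          # duplicates now collapse
```

The search works before any mapping exists, and each mapping added afterwards only improves the result, which is the property the pay-as-you-go approach relies on.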
A system that allows for gradual enhancement of
relationships is proposed in Talukdar et al. (2008).
Starting with a basic keyword search, the system
matches the keywords to relational data sources and
evaluates available associations to finally generate a
list of possible queries ranked by a score. Next, the
user has to provide feedback to the system regarding
which query fits the intended information request
best. Based on this feedback, the system is able to
learn by assigning new weightings to associations,
which in turn changes the future rank of proposed
queries.
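The feedback loop described above can be sketched as follows. This is not the learning algorithm of Talukdar et al. (2008); it is a deliberately simple stand-in that assumes each candidate query is scored by the summed weight of the associations it uses and that positive feedback adds a fixed step to those weights. All names and numbers are hypothetical.

```python
from typing import Dict, List


def rank(queries: Dict[str, List[str]], weights: Dict[str, float]) -> List[str]:
    """Order candidate queries by the summed weight of their associations."""
    def score(name: str) -> float:
        return sum(weights[a] for a in queries[name])
    return sorted(queries, key=score, reverse=True)


def feedback(queries: Dict[str, List[str]], weights: Dict[str, float],
             chosen: str, step: float = 0.5) -> Dict[str, float]:
    """Return new weights with the chosen query's associations boosted."""
    updated = dict(weights)
    for association in queries[chosen]:
        updated[association] += step
    return updated


# Hypothetical candidate queries and the associations each one relies on.
queries = {
    "q_orders": ["customer-order"],
    "q_invoices": ["customer-invoice"],
}
weights = {"customer-order": 1.0, "customer-invoice": 0.8}

before = rank(queries, weights)
weights = feedback(queries, weights, chosen="q_invoices")
after = rank(queries, weights)
```

After the user marks `q_invoices` as the best fit, it moves ahead of `q_orders` in subsequent rankings.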
In this work, the idea of gradually enhancing the
level of integration is applied when mapping the
controlled vocabulary to the logical data models of
the systems in an iterative way. Initially, only a basic
keyword search will be available over the controlled
vocabulary and the logical data models of the
systems. As more mappings are introduced, the
semantic context of the systems becomes more
complete, which in turn progressively closes the
semantic gap across the systems.
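A minimal sketch of this iterative refinement is given below. The systems, column names and the vocabulary term "coil" are hypothetical, and the lookup strategy (explicit mappings first, keyword search as fallback) is an assumption made for illustration rather than a description of the prototype.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (system, "table.column")

# Hypothetical logical data models of two systems.
models: Dict[str, List[str]] = {
    "planning": ["orders.coil_id", "orders.due_date"],
    "quality": ["samples.specimen_ref", "samples.hardness"],
}


def keyword_search(models: Dict[str, List[str]], keyword: str) -> List[Column]:
    """Find columns whose name contains the keyword."""
    needle = keyword.lower()
    return [(system, column)
            for system, columns in models.items()
            for column in columns
            if needle in column.lower()]


def systems_for_term(term: str, mappings: Dict[str, List[Column]],
                     models: Dict[str, List[str]]) -> List[str]:
    """Use explicit mappings for the term if present, else fall back to keywords."""
    found = mappings.get(term) or keyword_search(models, term)
    return sorted({system for system, _ in found})


mappings: Dict[str, List[Column]] = {}
initial = systems_for_term("coil", mappings, models)   # keyword search only

# A domain expert later maps the vocabulary term to both systems.
mappings["coil"] = [("planning", "orders.coil_id"),
                    ("quality", "samples.specimen_ref")]
refined = systems_for_term("coil", mappings, models)
```

The keyword search alone misses the quality system because its column name does not contain the term; the explicit mapping recovers it.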
A similar approach has been proposed in
Karjikar, Roy and Padmanabhuni (2009), but for
a different application area. It uses Topic Maps to
represent the knowledge contained in Universal
Business Language (UBL) documents, an OASIS
standard for generic business documents.
3 CONCEPTUAL MODELING
USING TOPIC MAPS
Conceptual models provide a controlled vocabulary
and describe entities and relationships involved in
these applications. Several languages exist for
conceptual modeling. The most prominent
representative is UML, the Unified Modeling
Language (Object Management Group, 2007).
To close the semantic gap across systems, a
semantic-aware conceptual modeling language is
needed. Ontology-oriented languages like Topic
Maps and OWL, the Web Ontology Language,
provide built-in support for semantic descriptions
and thus seem to be most suitable. Topic Maps have
been selected as the conceptual modeling language
for this study because of their simplicity and
practicality for end users. Topic Maps (ISO/IEC,
2002) are an ISO standard for knowledge
representation. They are inspired by semantic
networks and index structures. “Ontopia”, a
general-purpose open source Topic Maps
development environment, has been chosen for
prototype development (Ontopia, 2010).
In the following, only the most important
concepts of Topic Maps are introduced. A
detailed introduction is provided in Pepper (2010).
The main model elements of Topic Maps are topics,
associations and occurrences. Every topic has a type.
Types can be inherited (i.e., every type can define
parent types and sub types). Associations express
relationships between one or more topics. The
following example establishes an association “is-
manager-of” between the topics “Christian” and
“Michael”. Both topics are of type “Employee”.
Christian:Employee is-manager-of Michael:Employee.
Occurrences are references to additional
information that is relevant for the topic. This could
be a reference to a web-site or simply a data element
that contains additional information, such as the
employee’s birth date.
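The three model elements can be sketched with plain Python data classes. This is a simplified illustration and not the Ontopia API or the full ISO data model: association roles are omitted, the supertype "Person" and the birth date value are hypothetical, and the class and function names are chosen for this example only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical type hierarchy: sub type -> parent type.
PARENT_TYPE: Dict[str, str] = {"Employee": "Person"}


@dataclass
class Topic:
    name: str
    topic_type: str
    # Occurrences: additional information attached to the topic.
    occurrences: Dict[str, str] = field(default_factory=dict)


@dataclass
class Association:
    assoc_type: str
    members: Tuple[Topic, ...]


def is_a(topic: Topic, type_name: str) -> bool:
    """True if the topic's type equals type_name or inherits from it."""
    current = topic.topic_type
    while current is not None:
        if current == type_name:
            return True
        current = PARENT_TYPE.get(current)
    return False


def related(topic: Topic, associations: List[Association]) -> List[Tuple[str, str]]:
    """List (association type, other topic name) pairs for a topic."""
    return [(a.assoc_type, m.name)
            for a in associations if topic in a.members
            for m in a.members if m is not topic]


christian = Topic("Christian", "Employee")
michael = Topic("Michael", "Employee",
                occurrences={"birth-date": "1980-01-01"})  # made-up value
manages = Association("is-manager-of", (christian, michael))
```

Calling `related(christian, [manages])` yields the association from the example above, and `is_a(michael, "Person")` holds through the assumed type hierarchy.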
A prototype of the proposed approach has been
implemented for voestalpine Stahl GmbH.
Voestalpine operates many autonomous
information systems, each designed to perform a
specific task in the production process. The
data processed in these systems must be integrated
into a global view to optimally control the
production process. As already mentioned, beyond a
certain size, this distributed architecture becomes
difficult to manage. Even experts find it difficult to
answer simple questions like: “Tell me all
systems where some data element gets processed.”