elements of the system.
Matching operators are highly sensitive to their
input data: different operators are tailored to
different kinds of data, and no generic matching
function can be designed. For this reason, the only way
to implement a generic Data Integration algorithm is
to support different matching operators. Moreover,
combining them in a single algorithm is a nontrivial
process that has to take into account a variety of
options.
This paper deals with the problem of managing
a palette of matching operators supporting different
semantics. The approach chosen is to combine all the
available associations produced by the different
operators in a cluster. This cluster collects all the
elements that can be associated and expresses the
semantics of the associations. In this way the cluster
holds all the information needed to create a mapping.
In addition, Mapping Generation is activated only on
those sets of elements that can be queried without
violating any integrity constraint. Our system, named
Ontology Driven Data Integration (ODDI), uses Formal
Concept Analysis (FCA) (Ganter et al., 2005) as the
search space in which to discover concept-level
relations for Mapping Generation, and uses an ontology
as the data access layer. Using an ontology as a common
conceptualization brings several benefits; the most
relevant is that, thanks to its sound logical basis, it
is possible to perform reasoning tasks on the knowledge
base, such as Consistency Checking and Classification
(Cui et al., 2007).
The paper is structured as follows. Section 2
formally introduces a generic Data Integration System
and then focuses on our definition of mapping. Section 3
describes the matching process, first providing our
formalization and then a categorization of the
traditional matching operators. Section 4 describes the
mapping generation module, focusing on the use of FCA
as a formalism for representing the information. The
paper includes an example of the generation of the FCA
lattice starting from a local schema S and a global
schema G. Section 5 concludes the paper.
2 DATA INTEGRATION
The system we propose in this paper is based on the
Global-as-View approach (Calvanese et al., 1998), because
G is given through an ontology and the mappings are
constructed by associating with the concepts of G the
sets of attributes in L that carry the same informative
value as the attributes of these concepts. Formally, we
define a data integration system as a triple
$I = \langle G, L, M \rangle$, where G is the global
representation, L is the set of local representations,
composed of n single representations
$s_1, s_2, \ldots, s_n$, and M is the mapping between L
and G. The mapping M is the result of a complex process
that takes as input $M_t$, a set of matching relations
among the simple elements of G and L, and generates the
mapping $M = \langle M_p, M_o \rangle$. Here $M_p$ is a
mapping between objects of the local representation L
and the global representation G (for instance, concepts
in an ontology or tables in a database), and $M_o$ is a
mapping between elements of L: relations between objects
of the same source schema $s_a$ in L, such as the typical
primary-key→foreign-key relation, but also relations
between elements of different source schemas $s_i$, $s_j$
of L that are semantically related.
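As a reading aid, the triple I = <G, L, M> and the pair
M = <M_p, M_o> defined above can be sketched as plain data
structures. This is only a minimal illustration, not the
ODDI implementation: the class names, field names, and the
toy schema elements are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Match:
    """One matching relation in M_t between a global and a local element."""
    global_element: str
    local_element: str


@dataclass
class Mapping:
    """M = <M_p, M_o> (field names are hypothetical)."""
    # M_p: (local object, global object) pairs.
    m_p: set = field(default_factory=set)
    # M_o: (local element, local element) pairs, within or across sources.
    m_o: set = field(default_factory=set)


@dataclass
class DataIntegrationSystem:
    """I = <G, L, M>."""
    g: set    # elements of the global representation G
    l: dict   # source name s_i -> set of that source's elements
    m: Mapping = field(default_factory=Mapping)


# Hypothetical toy instance; every name below is invented for illustration.
system = DataIntegrationSystem(
    g={"Person", "Person.name"},
    l={"s1": {"customer", "customer.id", "order.customer_id"},
       "s2": {"client"}},
)
m_t = {Match("Person", "customer"), Match("Person", "client")}
# Here M_p is filled naively from M_t; the paper derives it through FCA.
system.m.m_p = {(m.local_element, m.global_element) for m in m_t}
system.m.m_o = {("customer.id", "order.customer_id"),  # key relation inside s1
                ("customer", "client")}                # relation across s1, s2
```

The sketch keeps M_t separate from M because, in the text,
M_t is the input of the mapping generation process rather
than a component of I.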
In general, two data sets can be integrated only if
they describe a common set of real-world facts. Of
course, this common set does not have to cover the
totality of the described facts. In (Parent and
Spaccapietra, 1998), relations between facts described by
different data sets are defined by set relationships.
This approach is only partially appropriate, because
the instances of two data sets can describe the same
facts at different levels of detail, or they can
describe distinct facts that are to be related in G.
According to our work a mapping between data sets
can be oriented to two distinct goals:
• Composition. In this case some redundant infor-
mation is assumed to be stored in the data sets.
The mapping acts on this redundant information
in order to aggregate new compositions of data
items. In this perspective G contains views that
recompose the data items contained in L in a new
structure.
• Summarization. In this case the information
stored in the data sets can be reduced to a common
type. The mapping expresses the commonality shared by
different data items. In this perspective G contains
views that summarize the data items contained in L in a
more compact representation.
In principle a mapping can cover both these goals. If
a human agent generates the mapping, she will naturally
distinguish between the two cases. But if the
mapping is generated by an algorithm, achieving the
right goal mainly depends on the operator adopted for
matching the data items.
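The difference between the two goals can be made concrete
on toy data. The sketch below is an illustration under
stated assumptions, not the paper's algorithm: the record
layouts, the shared key, and the functions `compose` and
`summarize` are hypothetical. Composition is approximated
by a join on a redundant key, summarization by a projection
of several sources onto their common fields.

```python
# Hypothetical records from local sources; all names are invented.
employees = [{"ssn": "1", "name": "Ada"}]
payroll = [{"ssn": "1", "salary": 100}]
students = [{"name": "Bob", "matriculation": "S1"}]
staff = [{"name": "Ada", "badge": "E7"}]


def compose(left, right, key):
    """Composition: use the redundant `key` to build a new, wider structure."""
    index = {record[key]: record for record in right}
    return [{**record, **index[record[key]]}
            for record in left if record[key] in index]


def summarize(sources, common_fields):
    """Summarization: reduce every record to the fields of a common type."""
    return [{name: record[name] for name in common_fields}
            for source in sources for record in source]


composed = compose(employees, payroll, "ssn")
summarized = summarize([students, staff], ["name"])
```

In this toy setting `composed` holds one record combining
name and salary, while `summarized` holds one name-only
record per student and staff member.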
The system that we propose consists of two modules.
The first generates $M_t$, given G and L. The second
generates $M_p$ and $M_o$ by representing $M_t$ in FCA,
used as the search space in which to find semantic
relations between elements of G and L.
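To give an idea of how a set of matching relations can be
read as an FCA structure, the sketch below computes the
formal concepts of a small formal context by brute force.
It rests on assumptions not stated in this section: local
elements are taken as FCA objects, global elements as FCA
attributes, and the incidence relation as M_t. The element
names are invented, and the exhaustive enumeration is only
practical for very small contexts.

```python
from itertools import combinations


def intent(objects, context, all_attributes):
    """Attributes shared by every object in `objects`."""
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def extent(attributes, context):
    """Objects that carry every attribute in `attributes`."""
    return frozenset(obj for obj, attrs in context.items()
                     if attributes <= attrs)


def formal_concepts(context):
    """All (extent, intent) pairs, found by closing every object subset."""
    all_attributes = set().union(*context.values()) if context else set()
    objects = list(context)
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            b = intent(subset, context, all_attributes)
            concepts.add((extent(b, context), b))
    return concepts


# Hypothetical M_t: local element -> set of matched global elements.
m_t = {
    "s1.name": {"Person.name"},
    "s2.full_name": {"Person.name"},
    "s1.city": {"Address.city"},
}
concepts = formal_concepts(m_t)
```

Each resulting concept groups the local elements that match
the same set of global elements, which is the kind of
cluster a mapping generation step could then inspect.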
ICEIS 2008 - International Conference on Enterprise Information Systems