breviated form which makes it hard to automatically
identify that it denotes a command line.
Taken alone, a specialized learner can prove inefficient
or inadequate. For example, a learner specialized in the
mapping of field names does not perform particularly well
when it comes to matching names which are not well-known
synonyms (e.g. "comment" and "outline"), names which
abbreviate a concept (e.g. the name "home" used instead of
"telephone at home"), or names whose broad meaning would
allow them to be matched with almost everything (e.g.
"thing" or "entity"). Similarly, a content learner that
bases its semantic deduction on the frequency of lexical
units appearing in fields would prove quite inadequate for
analyzing numerical fields. Moreover, a learner relying on
a naive Bayesian approach (Berlin and Motro, 2002; Pedro
Domingos, 1997; Kohavi, 1996) would be of little use for
analyzing fields that accept values of numerical or
enumerated types (e.g. "gender").
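The weakness of a purely name-based learner can be illustrated with a
minimal sketch. It assumes a character-level similarity (Python's
difflib) as a stand-in for a name learner; this is not the learner used
in the cited work, and the function name name_similarity is
hypothetical. Both name pairs from the examples above receive a low
score, although a human reader would relate them.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two field names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Field-name pairs taken from the examples in the text.
pairs = [
    ("comment", "outline"),         # related, but not well-known synonyms
    ("home", "telephone at home"),  # abbreviation of a concept
]

for left, right in pairs:
    score = name_similarity(left, right)
    print(f"{left!r} vs {right!r}: {score:.2f}")
```

A learner that thresholds such a score would reject both pairs, which
is why additional evidence from outside the field names is needed.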
Hence, this article proposes to broaden and diversify both
data sets and training sets by extending them with documents
coming from what we call the informational context of data
sources. Indeed, isolating a data source from its context
(as is the case when solely considering its XML schema)
reduces from the outset the set of usable cognitive
information which underlies the conceptualization of the
represented data. In practice, the context within which the
data source lies constitutes a precious fount of information
calling for a more systematic exploration so as to better
define the semantics of concepts.
In the sequel, the notion of informational context is
defined and the architecture of a context analyzer is
presented.
2 INFORMATIONAL CONTEXT
The informational context of a data source is composed of
all the information, saved in electronic format, which
belongs to the data source's environment and shares the same
domain. It has two parts:
1. The descriptive context of a data source gathers all the
specification files describing the data or their application
environment. These files document the data at various
abstraction levels. For example, if the data source is a
database, the descriptive context could be composed of the
following documents:
• A requirements document describing the data and services
which the user calls for in applications using the database.
A test plan, for instance, might be practical for
establishing the link between input and output data, which
could hide relevant complex concepts.
• Analysis and design specifications, including the various
formal and semi-formal models elaborated for applications
relying on the database. Data dictionaries are worth citing
under this category. They describe in a formal fashion,
among other things, data flows, data structures and data
stores. As an example, consider a structure description of
the concept "Order", using regular expressions:
Order = O_Header + O_Item* + O_Footer
O_Header = O_Number + Date + CustAdress
O_Item = ItemNum + Descr + Qty + Price
O_Footer = TaxAmount + TotalAmount
This provides relevant information about the compositions
and dependencies of the "Order" and "Items" concepts.
Furthermore, a detailed description of each data element can
be obtained from a data description repository.
• User manuals. In the same way as for data dictionaries,
one can think of the numerous formulas linking concepts that
are present in such a resource.
2. The operational context of a data source is composed of
all the data management and processing files. Among others,
these files can be:
• Programs written in any known programming paradigm and
language. The way concepts are manipulated could hide
valuable information about how they are linked to each
other.
• Files containing SQL-type queries.
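As an illustration of how a document from the descriptive context might
be exploited, the following minimal sketch parses the "Order" data
dictionary entries given above and flattens a concept into its
elementary data elements. The function names (parse_dictionary, leaves)
and the plain-text input format are assumptions made for this example,
not part of the approach described in this article.

```python
# Data dictionary entries from the "Order" example, written as plain text.
DATA_DICTIONARY = """
Order = O_Header + O_Item* + O_Footer
O_Header = O_Number + Date + CustAdress
O_Item = ItemNum + Descr + Qty + Price
O_Footer = TaxAmount + TotalAmount
"""


def parse_dictionary(text: str) -> dict[str, list[tuple[str, bool]]]:
    """Map each concept to its components; the flag marks repetition (*)."""
    structure = {}
    for line in text.strip().splitlines():
        name, _, body = line.partition("=")
        parts = []
        for token in body.split("+"):
            token = token.strip()
            repeated = token.endswith("*")
            parts.append((token.rstrip("*").strip(), repeated))
        structure[name.strip()] = parts
    return structure


def leaves(concept: str, structure: dict) -> list[str]:
    """Flatten a concept into the elementary data elements it is built from."""
    if concept not in structure:
        return [concept]
    result = []
    for part, _repeated in structure[concept]:
        result.extend(leaves(part, structure))
    return result


structure = parse_dictionary(DATA_DICTIONARY)
print(structure["Order"])
print(leaves("Order", structure))
```

The resulting composition table makes explicit, for instance, that
"O_Item" is a repeated component of "Order", a dependency that a schema
considered in isolation may not state.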
For each data source, the important point is to list all the
documents which may compose the descriptive and operational
contexts of this data source. The analysis of these
documents (in addition to the analysis of the data
themselves and their schema definition) will help enhance
the knowledge required by the learners to deduce the best
semantic mapping between the given data sources.
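Such an inventory could be recorded in a simple structure. The sketch
below is only one possible representation; the class names, the
document kinds and all file paths are hypothetical and serve purely as
an illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContextDocument:
    path: str  # location of the file (hypothetical paths below)
    kind: str  # e.g. "requirements", "data dictionary", "program", "sql"


@dataclass
class InformationalContext:
    data_source: str
    descriptive: list[ContextDocument] = field(default_factory=list)
    operational: list[ContextDocument] = field(default_factory=list)

    def all_documents(self) -> list[ContextDocument]:
        """Every document a context analyzer would have to examine."""
        return self.descriptive + self.operational


# Hypothetical inventory for an order-management database.
context = InformationalContext(
    data_source="orders_db",
    descriptive=[
        ContextDocument("docs/requirements.txt", "requirements"),
        ContextDocument("docs/data_dictionary.txt", "data dictionary"),
        ContextDocument("docs/user_manual.txt", "user manual"),
    ],
    operational=[
        ContextDocument("src/billing.py", "program"),
        ContextDocument("sql/reports.sql", "sql"),
    ],
)

for document in context.all_documents():
    print(document.kind, document.path)
```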
3 CONTEXT ANALYSIS
The main objective of context analysis consists in expanding
data sources with semantic information and hints drawn from
the context. Among other things, this information is
intended to be used by learners during their training stage
to increase the precision of the mapping they are asked to
compute.
In particular, context analysis offers an interesting
opportunity for resolving complex mappings, which pair
combinations of concepts (e.g. (street, zip code, city) →
employee_address).
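One way such a hint might be drawn from the operational context is
sketched below, assuming that a SQL file contains a query which
concatenates several columns under an alias. The query, the column name
zip_code and the function complex_mappings are hypothetical, and the
regular expression handles only this simple pattern; it is an
illustration rather than the analysis technique of this article.

```python
import re

# Hypothetical query found in the operational context of a data source.
QUERY = """
SELECT name,
       street || ', ' || zip_code || ', ' || city AS employee_address
FROM employee
"""

# Matches "<concatenation expression> AS <alias>" in a select list.
CONCAT_ALIAS = re.compile(
    r"([^,\n]*\|\|.*?)\s+AS\s+(\w+)", re.IGNORECASE | re.DOTALL
)


def complex_mappings(sql: str) -> dict[str, tuple[str, ...]]:
    """Return {alias: (source columns...)} for aliased concatenations."""
    mappings = {}
    for expression, alias in CONCAT_ALIAS.findall(sql):
        # Drop string literals such as ', ' so only column names remain.
        without_literals = re.sub(r"'[^']*'", "", expression)
        columns = tuple(
            part.strip()
            for part in without_literals.split("||")
            if part.strip()
        )
        mappings[alias] = columns
    return mappings


print(complex_mappings(QUERY))
```

The extracted pair could then be handed to a learner as a candidate
complex mapping to be confirmed or rejected.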
To the best of our knowledge, there is still no satisfactory
solution addressing this problem, although it is frequently
encountered. For example,
ICEIS 2005 - DATABASES AND INFORMATION SYSTEMS INTEGRATION