school claims that users have difficulties expressing
informational needs; (c) the meta-data centric school
advocates the use of meta-data for large sets of results.
In this paper, we describe a method for supporting users in querying by providing meta-data about attributes of integrated GVV classes. Our approach aims to show the user semantic, synthesized and meaningful information that emerges directly from the data. We claim that such meta-data are necessary for querying classes that result from an integration process: the end user typically does not know the contents of the GVV classes and simply defines queries on the basis of class and attribute names. Such labels may be generic, because the synthesis operation groups "semantically similar" data coming from different sources into a few classes. Consequently, the name/description of a global class is often unspecific, especially for web sources, where the user is highly involved in choosing the labels of element descriptions. For example, the integration of two local classes "T-Shirt" and "Trouser" could yield a single Global Class called "Dress". Such a name does not tell the user which specific kinds of dresses are stored.
We proposed a partial solution to these issues in (Beneventano et al., 2003), where a semantic annotation of all the Global Classes with respect to the WordNet lexical database¹ provides each term with a well-understood meaning.
Our goal is now to enrich the description of selected attributes by specifying, as meta-data, a list of "relevant values" for those attributes. Furthermore, relevant values may be organized hierarchically in a taxonomy. In this way, the user can exploit the new meta-data in the interactive process of creating and refining a query. The system also exploits the same meta-data in the query rewriting process in order to filter the results shown to the user.
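To make the idea of relevant values and their taxonomy concrete, the following is a minimal Python sketch under stated assumptions, not the technique defined in Section 3. It assumes that each relevant value is a labelled node summarizing a set of raw attribute values, that a node stands for its own values plus those of its descendants, and that query rewriting expands a selected relevant value into an IN-predicate. The class RelevantValue, the helpers rewrite_condition and filter_rows, the attribute name kind and all sample values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RelevantValue:
    """Hypothetical taxonomy node: a label summarizing a set of raw attribute values."""
    label: str
    members: set[str] = field(default_factory=set)
    children: list[RelevantValue] = field(default_factory=list)

    def all_members(self) -> set[str]:
        # A node stands for its own raw values plus those of all its descendants.
        values = set(self.members)
        for child in self.children:
            values |= child.all_members()
        return values


def rewrite_condition(attribute: str, rv: RelevantValue) -> str:
    # Expand the user's choice of a relevant value into a predicate on raw values.
    # Values are quoted naively (no escaping): illustration only.
    quoted = ", ".join(f"'{v}'" for v in sorted(rv.all_members()))
    return f"{attribute} IN ({quoted})"


def filter_rows(rows: list[dict], attribute: str, rv: RelevantValue) -> list[dict]:
    # Keep only the rows whose attribute value is covered by the relevant value.
    allowed = rv.all_members()
    return [row for row in rows if row.get(attribute) in allowed]


# Hypothetical taxonomy for an attribute "kind" of the global class "Dress".
tops = RelevantValue("top", {"t-shirt", "tee"})
bottoms = RelevantValue("bottom", {"trouser", "jeans"})
dress = RelevantValue("dress", children=[tops, bottoms])

print(rewrite_condition("kind", tops))  # kind IN ('t-shirt', 'tee')
print(filter_rows([{"kind": "tee"}, {"kind": "jeans"}, {"kind": "hat"}], "kind", bottoms))  # [{'kind': 'jeans'}]
```

Selecting the root node "dress" would expand to all four raw values, while selecting a child narrows the query; this is one simple way a taxonomy could support the interactive refinement described above.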
Exploiting this new kind of meta-data is an interesting challenge: the literature on integration systems mainly focuses on creating and representing structures for heterogeneous data sources (Buneman et al., 1997; Nestorov et al., 1997; Halevy, 2004). Only recently have some techniques for combining data structure and data management been developed (Chaudhuri et al., 2005). The work closest to ours is the "Malleable Schema" (Dong and Halevy, 2005), which offers a middle ground between a collection of schemas/DTDs in a domain and a single strict schema for that domain. In contrast with malleable schemas, our approach models a domain with a fixed semistructured model (ODM_I3) to which meta-data derived from extensional analysis are added.
The next section describes the MOMIS approach to data integration; Section 3 defines our technique for calculating relevant values for selected attributes; Section 3.2 shows the impact of relevant values on the querying process; and Section 4 gives an example of relevant values calculated for a real domain. Finally, Section 5 sketches some conclusions.

¹ http://wordnet.princeton.edu/
2 THE MOMIS APPROACH
The framework consists of a language and several
semi-automatic tools:
• The ODL_I3 language is an object-oriented language with an underlying Description Logic; it is derived from the ODMG standard. ODL_I3 extends ODL with the following relationships expressing intra- and inter-schema knowledge for the source schemas: SYN (synonym of), BT (broader terms), NT (narrower terms) and RT (related terms). By means of ODL_I3, a single language is used to describe both the sources (the input of the synthesis process) and the GVV (the result of the process). The translation of ODL_I3 descriptions into one of the Semantic Web standards, such as RDF, DAML+OIL or OWL, is straightforward: from a general perspective, an ODL_I3 concept corresponds to a Class of the Semantic Web standard, and ODL_I3 relationships are translated into properties.
• Information integration is performed semi-automatically by exploiting the knowledge in a Common Thesaurus (semi-automatically defined from the structural and lexical analysis of the information sources) and the ODL_I3 descriptions of the source schemas, with a combination of clustering techniques and Description Logics. This integration process (performed by means of the MOMIS Ontology Builder) gives rise to a GVV of the underlying sources. The GVV consists of a set of Global Classes, each made up of Global Attributes. Mapping rules connect the GVV with the original information sources, and integrity constraints are specified to handle heterogeneity.
• The MOMIS Query Manager is the coordinated set of functions that take an incoming query, decompose it according to the mapping of the GVV onto the local data sources relevant for the query, send the sub-queries to these data sources, collect their answers, perform any residual filtering as necessary, and finally deliver the answer to the requesting user. The unfolding and rewriting process is based on the full disjunction operation (Galindo-Legaria, 1994) and is described in detail in (Beneventano and Lenzerini, 2005).
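The Query Manager steps listed above (decompose, send sub-queries, collect answers, fuse, filter) can be illustrated with a small Python sketch. It is not the MOMIS implementation: the mapping table, source names, attribute names and data are hypothetical, and it assumes every contributing source maps a shared key attribute. It also replaces the general full disjunction with its two-relation case, a full outer join on that key; chaining such joins over several sources is a simplification.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical mapping rules: for each global attribute, the local attribute
# that provides it in each source (None = not available in that source).
MAPPING = {
    "name": {"src_a": "title", "src_b": "product_name"},
    "price": {"src_a": "cost", "src_b": None},
    "color": {"src_a": None, "src_b": "colour"},
}

# Hypothetical local data.
SOURCES = {
    "src_a": [{"title": "T-Shirt", "cost": 15}, {"title": "Trouser", "cost": 40}],
    "src_b": [{"product_name": "T-Shirt", "colour": "white"},
              {"product_name": "Scarf", "colour": "red"}],
}


def decompose(global_attrs: list[str], mapping: dict) -> dict[str, list[tuple[str, str]]]:
    """One sub-query per source: the (global, local) attribute pairs it can supply."""
    subqueries: dict[str, list[tuple[str, str]]] = {}
    for g in global_attrs:
        for source, local in mapping[g].items():
            if local is not None:
                subqueries.setdefault(source, []).append((g, local))
    return subqueries


def run_subquery(rows: list[dict], pairs: list[tuple[str, str]]) -> list[dict]:
    """Project the local attributes and rename them to global names."""
    return [{g: row[local] for g, local in pairs} for row in rows]


def full_outer_join(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Full outer join on `key` alone; missing attributes become None.
    Left values win and the right row fills the gaps. When the key is the only
    shared attribute, this is the full disjunction of the two relations."""
    attrs = sorted({a for row in left + right for a in row})
    merged, matched = [], set()
    for lrow in left:
        hits = [i for i, rrow in enumerate(right) if rrow[key] == lrow[key]]
        for i in hits:
            matched.add(i)
            merged.append({a: lrow.get(a) if lrow.get(a) is not None else right[i].get(a)
                           for a in attrs})
        if not hits:
            merged.append({a: lrow.get(a) for a in attrs})
    merged += [{a: r.get(a) for a in attrs} for i, r in enumerate(right) if i not in matched]
    return merged


def answer_query(global_attrs, sources, mapping, key,
                 predicate: Callable[[dict], bool] = lambda row: True) -> list[dict]:
    # Request the key from every source so that partial answers can be fused.
    attrs = [key] + [a for a in global_attrs if a != key]
    partials = [run_subquery(sources[s], pairs) for s, pairs in decompose(attrs, mapping).items()]
    if not partials:
        return []
    result = partials[0]
    for partial in partials[1:]:
        result = full_outer_join(result, partial, key)
    # Residual filtering on the fused global tuples.
    return [row for row in result if predicate(row)]


cheap = answer_query(["name", "price", "color"], SOURCES, MAPPING, key="name",
                     predicate=lambda r: r["price"] is not None and r["price"] < 20)
print(cheap)  # [{'color': 'white', 'name': 'T-Shirt', 'price': 15}]
```

Without the predicate, the same query returns three global tuples: "T-Shirt" fused from both sources, "Trouser" with no color and "Scarf" with no price, which is the null-padding behaviour that a full disjunction provides.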
INSTANCES NAVIGATION FOR QUERYING INTEGRATED DATA FROM WEB-SITES
47