INSTANCES NAVIGATION FOR QUERYING INTEGRATED DATA

FROM WEB-SITES

Domenico Beneventano

, Sonia Bergamaschi

, Stefania Bruschi

Francesco Guerra

, Mirko Orsini

, Maurizio Vincini

DII - Universit

a di Modena e Reggio Emilia

via Vignolese 905, 41100 Modena

DEA - Universit

a di Modena e Reggio Emilia

v.le Berengario 51, 41100 Modena

Keywords:

Semantic integration, wrapper HTML, query manager.

Abstract:

Research on data integration has provided a set of rich and well understood schema mediation languages and

systems that provide a meta-data representation of the modeled real world, while, in general, they do not deal

with data instances.

Such meta-data are necessary for querying classes result of an integration process: the end user typically does

not know the contents of such classes, he simply deﬁnes his queries on the basis of the names of classes and

attributes.

In this paper we introduce an approach enriching the description of selected attributes specifying as meta-data

a list of the “relevant values” for such attributes. Furthermore relevant values may be hierarchically collected

in a taxonomy. In this way, the user may exploit new meta-data in the interactive process of creating/reﬁning a

query. The same meta-data are also exploited by the system in the query rewriting/unfolding process in order

to ﬁlter the results showed to the user.

We conducted an evaluation of the strategy in an e-business context within the EU-IST SEWASIE project. The

evaluation proved the practicability of the approach for large value instances.

1 INTRODUCTION

Integration of data from multiple sources is one of the

main problems facing the database research commu-

nity. One of the most common approach for integrat-

ing information sources is to build a mediated schema

as synthesis of them. By holding all the data collected

in a common way, such mediated schema allows the

user to pose a query following a global perception.

The system answers translating the query into a set

of sub-queries for the involved sources by means of

automatic unfolding-rewriting operations taking into

account the mediated and the sources schemas. Re-

sults from sub-queries are then uniﬁed by exploiting

data reconciliation techniques.

Research on data integration has provided a set of

rich and well understood schema mediation languages

and systems, which may be classiﬁed as the global-as-

view (where the mediated schema is deﬁned as a set

of views over the data sources) and the local-as-view

(where the contents of data sources are described as

view over the mediated schema) formalisms (Halevy,

2003).

Following the GAV approach, we developed

MOMIS (Mediator envirOnment for Multiple Infor-

mation Sources) (Bergamaschi et al., 2001; Beneven-

tano et al., 2003), a framework to perform informa-

tion extraction and integration from both structured

and semi-structured data sources, plus a query man-

agement environment to take incoming queries and

process them through the exploitation of the mediated

schema.

Two kinds of users interact with MOMIS: the inte-

gration engineer and the end user (or client/web ser-

vice applications). The integration engineer is re-

sponsible for the integration process (the operation

is performed by means of the MOMIS - Ontology

Builder) giving rise to a Global Virtual View (GVV)

of selected information sources; the end user queries

the GVV classes (created by the integration engineer)

aiming at obtaining a uniﬁed answer from the in-

volved sources.

Several approaches emerged about the user sup-

porting in querying. Such approaches may be summa-

rized in three different schools (Broder et al., 2005):

(a) the search-centric schools that guided naviga-

tion is superﬂuous since users can satisfy all their

needs via simple queries; (b) the taxonomy navigation

Beneventano D., Bergamaschi S., Bruschi S., Guerra F., Orsini M. and Vincini M. (2006).

INSTANCES NAVIGATION FOR QUERYING INTEGRATED DATA FROM WEB-SITES.

In Proceedings of WEBIST 2006 - Second International Conference on Web Information Systems and Technologies - Internet Technology / Web

Interface and Applications, pages 46-53

DOI: 10.5220/0001247900460053

 SciTePress

school claims that users have difﬁculties expressing

informational needs; (c) the meta-data centric school

advocates the use of meta-data for large sets of results.

In this paper, we describe a method for support-

ing users in querying by providing meta-data about

attributes of integrated GVV classes. Our approach

aims at showing to the user semantic, synthesized

and meaningful information emerged directly from

the data. We claim such meta-data are necessary

for querying classes result of an integration process:

the end user typically does not know the contents

of the GVV classes, he simply deﬁnes his queries

on the basis of the names of classes and attributes.

Such labels may be generic: the synthesis opera-

tion narrows in few classes data ”semantically sim-

ilar” coming from different sources. Consequently

the name/description for a global class is often un-

speciﬁc, especially for web sources where the user is

highly involved in choosing the label for the elements

descriptions. For example the integration of two lo-

cal classes “T-Shirt” and “Trouser” could be

a unique Global Class called “Dress”. Such name

does not allow a user to know which speciﬁc kinds of

dresses are stored.

We proposed a partial solution to these issues

in (Beneventano et al., 2003) where a semantic an-

notation of all the Global Classes with respect to the

WordNet lexical database

provides each term with a

well- understood meaning.

Our goal is now enriching the description of se-

lected attributes specifying as meta-data a list of the

“relevant values” for such attributes. Furthermore rel-

evant values may be hierarchically collected in a tax-

onomy. In this way, the user may exploit new meta-

data in the interactive process of creating/reﬁning a

query. The same meta-data are also exploited by the

system in the query rewriting process in order to ﬁlter

the results showed to the user.

Exploiting such new kind of meta-data is an inter-

esting challenge: the literature about integration sys-

tems mainly focuses on creating/representing struc-

tures for heterogeneous data sources (Buneman et al.,

1997; Nestorov et al., 1997; Halevy, 2004). Only

recently, some techniques for combining data struc-

ture and data management were developed (Chaud-

huri et al., 2005). The work closest to our is the “Mal-

leable Schema” (Dong and Halevy, 2005), where a

middle point between a collection of schemas/DTDs

in a domain and a single strict schema for that do-

main is offered. In contrast with malleable schemas,

our approach models a domain with a ﬁxed semistruc-

tured model (ODM

) where meta-data derived from

extensional analysis are added.

Next section describes the MOMIS approach to

data integration, section 3 deﬁnes our technique to

http://wordnet.princeton.edu/

calculate relevant values for selected attributes, sec-

tion 3.2 shows the impact of relevant values in the

querying process and section 4 gives an example of

relevant values calculated for a real domain. Finally

section 5 sketches out some conclusions.

2 THE MOMIS APPROACH

The framework consists of a language and several

semi-automatic tools:

• The ODL

language is an object-oriented lan-

guage, with an underlying Description Logic; it is

derived from the standard ODMG. ODL

extends

ODL with the following relationships expressing

intra- and inter-schema knowledge for the source

schemas: SYN (synonym of), BT (broader terms),

NT (narrower terms) and RT (related terms). By

means of ODL

only one language is exploited

to describe both the sources (the input of the syn-

thesis process) and the GVV (the result of the

process). The translation of ODL

descriptions

into one of the Semantic Web standards such

as RDF, DAML+OIL, OWL is a straightforward

process. In fact, from a general perspective an

ODL

concept corresponds to a Class of the Se-

mantic Web standard, and ODL

relationships are

translated into properties.

• Information integration is performed in a semi-

automatic way, by exploiting the knowledge in

a Common Thesaurus (semi-automatically deﬁned

from the structural and lexical analysis of the infor-

mation sources) and ODL

descriptions of source

schemas with a combination of clustering tech-

niques and Description Logics. This integration

process (performed by means of the MOMIS - On-

tology Builder) gives rise to a GVV of the un-

derlying sources. The GVV consists of a set of

Global Classes, each of them made up of Global

Attributes. Mapping rules connect the GVV with

the original information sources and integrity con-

straints are speciﬁed to handle heterogeneity.

• The MOMIS Query Manager is the coordinated set

of functions which take an incoming query, decom-

pose the query according to the mapping of the

GVV onto the local data sources relevant for the

query, send the sub-queries to these data sources,

collect their answers, perform any residual ﬁlter-

ing as necessary, and ﬁnally deliver the answer

to the requesting user. The unfolding and rewrit-

ing process is based on the full disjunction oper-

ation (Galindo-Legaria, 1994) and it is described

with details in (Beneventano and Lenzerini, 2005).

INSTANCES NAVIGATION FOR QUERYING INTEGRATED DATA FROM WEB-SITES

2.1 The MOMIS Ontology Builder

The MOMIS integration process, shown in Figure 1,

has ﬁve phases:

Figure 1: Functional representantion of the MOMIS Ontol-

ogy builder.

1. Local source schemata extraction. Wrappers

analyze sources in order to extract (or generate

if the source is not structured) schemas. Such

schemas are then translated into the common lan-

guage ODL

2. Local source annotation with WordNet. The in-

tegration designer deﬁnes a meaning for each el-

ement of a local source schema, according to the

WordNet lexical ontology. A tool supports the in-

tegration designer: some WordNet synsets are sug-

gested for each source element.

3. Common thesaurus generation. Starting from

the annotated local schema, MOMIS constructs

a set of relationships describing inter- and in-

traschema knowledge about classes and attributes

of the source schemata.

4. GVV generation. The MOMIS methodology,

applied to the common thesaurus and the local

schemata descriptions, generates a global schema

and sets of mappings with local schemata. The

Global Schema is made up of a set of global

classes. Several global attributed belong to a global

class.

5. GVV annotation. Exploiting the annotated lo-

cal schemata and the mappings between local

and global schemata, the MOMIS system semi-

automatically assigns name and meaning to each

element of the global schema.

The Ontology Builder Tool supports the integration

designer in all the GVV generation process phases.

2.2 Local Source Schemata

Extraction

To enable MOMIS to manage web pages and data

sources, we need specialized software (wrappers) for

the construction of a semantically rich representations

of the information sources by means of a common

data model. A wrapper logically converts the source

data structure into an ODL

schema. The wrapper

architecture and interfaces are crucial, because wrap-

pers are the focal point for managing the diversity

of data sources. For conventional structured infor-

mation sources (e.g. relational databases), schema

description is always available and can be directly

translated. For web content, this is mainly available

in the form of HTML documents: such documents

do not separate data from presentation and are ill-

suited for being the target of database queries and

most other forms of automatic processing. This prob-

lem has been addressed by much work on so-called

Web wrappers, programs that extract the relevant in-

formation from HTML documents and translate it into

a more machine-friendly format such as XML. The

wrapping problem has been addressed by a substantial

amount of work: in our approach we used Lixto (Got-

tlob et al., 2004) to translate the website data in XML

format. The main feature of Lixto is its graphics in-

tuitive interface that interactively guides the wrapper

designer’s intervention. The lixto wrapper is coupled

with an XML wrapper, we developed, generating for

each XML representation of a web-site the related

XML Schema (an XSD ﬁle) and loads the XML data

into a relational database. In this way, we are able

to structure and query web-site sources. Our XML

wrapper is based on MS .net framework and auto-

matically updates the data extracted from the web by

means of script daemons into the relational database.

3 PROVIDING INFORMATION

ABOUT RELEVANT VALUES

OF ATTRIBUTES

A global class contains data collected from differ-

ent local sources by means of an integration process,

and consequently its name (and the associated anno-

tations) may not perfectly ﬁt in its contents. Thus,

a query written only on the basis of this information

name may be misleading. Moreover, ignoring the val-

ues assumed by a global attribute may generate mis-

taken queries: a user that does not know the granu-

larity of an attribute may write an exceedingly selec-

tive query, or a selection clause that does not really

produce any result because semantically improper for

that context. On the other hand, to know all the data

WEBIST 2006 - INTERNET TECHNOLOGY

collected from a global class is not possible for a user:

databases contain large amount of data which a user

can not deal with.

For these reasons, we present a technique to pro-

vide the end user with the knowledge of the “relevant

values” for global attributes selected by the integra-

tion designer in the GVV building phase.

Given an attribute At of a global class C,arele-

vant value RV for At is a pair RV = RV N, RV I

where RV N is the name of the relevant value and

RV I is a set of values assumed by At, i.e., RV I ⊆



(C).

A set of relevant values SRV is a set

SRV = {RV

,RV

, ..., RV

},n > 0, such

that



RV i∈SRV



The relevant values may be classiﬁed in a taxon-

omy by means of ISA relationships: RV

ISA RV

;

a taxonomy need to be consistent:ifRV

ISA RV

then RV I

⊆ RV I

A relevant value RV = RV N, RV I is represen-

tative of all the associated values RV I: at the level

of query management a condition At=RV N will be

transformed in the equivalent condition At IN RV I.

Such relevant values are calculated by means of

a semi-automatic process composed of the following

steps:

1. identiﬁcation of the relevant values: the set of rel-

evant values SRV is calculated by applying clus-

tering techniques to



There are different algorithms in literature to be ap-

plied for clustering values. For example in (Gib-

son et al., 2000) a proposal for clustering values

that cannot be naturally ordered by a metric (i.e.

categorical data) is described. Nevertheless, we

claim that one single algorithm may not work in

any domain. Therefore our goal is to develop and

propose to the user a pool of techniques for se-

lecting the most suitable method (or the combi-

nation of different methods) for the speciﬁc do-

main. In MOMIS, we developed a syntactic al-

gorithm particularly customized for dealing with

classiﬁcations of services and goods. It is a typical

topic of the e-commerce, where enterprises propose

their products by means of web-sites. In such sites,

products are often grouped by means of categories

called in different sites in different ways. Moreover

categories are typically collected in taxonomies

on the basis of speciﬁc criteria.We observed that

the names of these categories are often qualiﬁed

with multiple attributes in order to describe spe-

ciﬁc products. The proposed method exploits such

features by means of the “Contains” function that

shows if a single value for the selected attribute is

contained in another one. We choose this func-

tion because typically implemented in RDBMS

and anyhow easily developed. The list of rele-

vant values is obtained by a stemming process on

the



C elements and applying the “Contains”

function to the attribute values repeatedly until the

achievement of a ﬁxed point. In section 4 we show

an example of relevant values set obtained by ap-

plying the algorithm on four web-sites.

2. identiﬁcation of the name of a relevant value:

the name associated to the relevant value is typi-

cally the most general value among the collected

values, i.e. given RV = RV N, RV I, RV N is

the most general value of RV I. The name choice

may be result of the clustering algorithm applied

to identify the relevant values. Otherwise, the sys-

tem proposes a name that has to be conﬁrmed by

the user. The method proposed in MOMIS, based

on the “Contains” function, allows deﬁning for spe-

cial string domains a set of relevant values SRV =

{RV

,RV

, ..., RV

}, where in each RV I

there

is a “most general value” which can be used as

name of the relevant value RV

(see section 4).

3. hierarchy deﬁnition: Relevant attributes may be

exploited for summarizing categories and classiﬁ-

cations. In this context, data sources (e.g.: infor-

mation systems, web-sites) typically provide par-

tial/total hierarchies for supporting users in the

querying phase. Our goal is exploiting such orig-

inal hierarchies by applying to the SRV the hierar-

chical relationships being among the values RV I

belonging to the RV I (see section 4 for an exam-

ple). In addition, the MOMIS system provides a

graphical interface helping users in the manually

execution of this process.

3.1 Relevant Values Representation

The set of relevant values is represented according the

proposal of the Ontology Engineering and Patterns

Task Force in the Semantic Web Best Practices and

Deployment Working Group (N. Noy, 2005), where

ﬁve different approaches are suggested to represent

OWL classes as property values on the Semantic Web.

In particular, with reference to the ﬁrst representa-

tion , OWL classes are directly used to describe the

different relevant values belonging to the selected

global attribute A

. This assumption allows model-

ing an hierarchy of relevant values by means of the

rdfs:subClassOf property. According to this

approach, we represent each RV N as an OWL Class

and the set of values v

,j :1, ..., k of RV I as in-

stances of RV N ; ﬁnally each RV N (i.e. an OWL

class) is then generalized by a root OWL Class (called

as the A

name) that becomes the property value of

the A

attribute.

For example, referring to the example domain de-

scribed in Section 4 (Table 1), the Moulding relevant

value is modeled as follows:

INSTANCES NAVIGATION FOR QUERYING INTEGRATED DATA FROM WEB-SITES

<owl:Class rdf:ID="CategoryName">

<rdfs:subClassOf

rdf:resource=

"http://www.w3.org/2002/07/owl#Class"/>

</owl:Class>

<owl:Class rdf:ID="Moulding">

<rdfs:subClassOf

rdf:resource="#CategoryName"/>

</owl:Class>

<Moulding rdf:ID=

"Compression injection moulding" />

3.2 Querying Relevant Values

The MOMIS system provides the end user with

a graphical interface where the global classes, the

global attributes, the primary and foreign keys are

shown (see ﬁgure 2). This interface enables the end

user to write a query. The interface shows on the left

the complete GVV with an E/R like formalism and a

tree representation. By selecting a class, its WordNet

annotations (i.e. the synsets associated to the class)

are visualized on the bottom panel. Right on the top a

box allows inserting the query.

Figure 2: Screenshot of the MOMIS Query Manager.

On the basis of the knowledge about relevant val-

ues, the queries may exploit selection clauses ﬁtting

in the data:

1. the user is interested to the instances assuming

a speciﬁc known value: relevant values do not im-

prove the querying process, but, if an attribute value

is called in different sources in different ways, the

results may not include interesting instances.

2. the user queries an attribute with relevant val-

ues: the user expresses a query, by using the graph-

ical interface, containing a relevant value, i.e. a

condition of the form At = RVN. As stated be-

fore, this condition is equivalent to the condition

At IN RVI. The discussion about how this query

is executed is out of the scope of this paper; intu-

itively:

• In a naive approach the condition At=RVNis

rewritten into a local source L (for simplicity, we

suppose that the global attribute At corresponds

to the same local attribute At in the local sources

L)as



value∈RV I

At = value.

• On the other hand, if the function Contains is

executable/supported by the local source L (this

is a frequent case if the local source is a data-

base), the condition At = RVN may be rewrit-

ten into L as Contains(At,RVN) = true.

4 EVALUATION ON A REAL

DOMAIN

Within the EU SEWASIE project (IST-2001-34825),

we collected information about enterprises working

on the mechanical sector. Our goal was to integrate

information coming from specialized web sites.

4.1 Building the GVV and a

Relevant Attribute Set

Four portals containing data about italian companies

were analyzed:

• www.subforn.net: provides access to a data-

base containing more than 6.000 subcontractors of

eight italian regions. Companies are classiﬁed on

the basis of their production. Mechanichal and

mould sectors are divided into 53 different cate-

gories. For each category, several speciﬁc kinds of

production (almost 1000) are deﬁned.

• www.plasticaitalia.net: the plasticaitalia

database contains more than 12.000 italian compa-

nies. For each company, several kinds of produc-

tion are indicated: this classiﬁcation is based on a

three levels hierarchy specializing each kind of pro-

duction in more than 300 cases.

• www.tuttostampi.com: contains 4000 italian

companies categorized in 58 different kinds of ser-

vices.

• www.deformazione.it: more than 2000

companies are catalogued on the basis of 39 dif-

ferent sectors.

WEBIST 2006 - INTERNET TECHNOLOGY

By means of the MOMIS system, a GVV rep-

resentative of the four web sites was built. The

simple structure of the original information sources

generates a generic GVV composed of two main

global classes: one storing data about companies, the

second one containing all the production categories

(see ﬁgure 3 where a third table allows mapping

companies into categories).

Company(id, name, address, email, fax,

telephone number, country,

foundation year, ... )

Category(id, name)

List categories(company id, category id)

Figure 3: Some of the Global Classes for the mechanical

GVV.

According with this representation, it is very difﬁ-

cult for a user querying for companies producing on a

speciﬁc sector: the user does not have any idea about

the more than 1000 possible categories (the union of

all the different categories used in the four web-sites)

and indicating a speciﬁc category may be misleading:

similar categories may be called in different sources

in different ways (and then the user will have no result

or a partial result from his query) or the user may be

interested to a result with a larger granularity. For ex-

ample, if the user searches for companies producing

“mould”, the result will not include companies clas-

siﬁed as producing “Plastic castings”, that

could be interesting for the user. For these reasons,

we apply the “Contains” technique to the global at-

tribute name of the global class category. The re-

sult was a set of 355 relevant values. Figure 4 shows

the dimension of the obtained relevant values: 34%

of the relevant values (120) represents 80% (845) of

the instances of the selected attributes, while 235 rel-

evant values contain only one instance. Table 1 shows

some signiﬁcant relevant values and the instances be-

longing to them.

Figure 4: Attributes distribution in the relevant values set.

Table 1: An example of some relevant values.

Castings Castings, Casting zinc and its alloys,

Casting with pouring under pressure,

Cast iron casting, Steel casting, Casting

titanium and its alloys, Aluminum and

magnesium casting, Casting copper and

its alloys, Casting other metals, ...

Moulding Moulding, Compression injection

moulding, Insert moulding, Normal

moulding, Intrusion moulding, Dip

moulding, ...

Windings Windings,Filament winding reinforced

plastic, Transformer windings, Motor

windings, Coil windings

The high number of relevant attributes made up

of only one instance suggested to us to improve our

technique by considering some semantic knowledge

extracted from the original web-sites. E-commerce

web-sites classify their products on the basis of hi-

erarchical classiﬁcations. By automatically exploit-

ing these taxonomies with customized wrappers, it

is possible to build a relevant values set taking into

account the semantic grouping of services and goods

made in the original sources. In SEWASIE, the result

was the identiﬁcation of 135 relevant values. By ex-

ploiting again the original taxonomies, such relevant

values were then classiﬁed in a three levels hierarchy.

Figure 5 shows parts of the hierarchy: the ﬁrst level

divides the categories in two parts: the mechanical

sector and the plastic and rubber sector; then, each

sector is specialized in speciﬁc topics. For example,

the “mould

making” relevant value is contained in

the class “plastic

and rubber processes”

that is part of the “plastic

and rubber” sec-

tor. Moreover, the “mould

making” relevant value

stands for 54 values in the local sources, for example

large size moulds, castings, mould manufacturing, ...

Figure 6 shows the dimension and the number of ob-

tained relevant values.

4.2 Querying the GVV by Means of

the Relevant Attribute Set

We suppose that an end user is searching for enter-

prises by using the MOMIS system applied to this

domain. His query may assume different forms:

• The user is looking for enterprises without specify-

ing their kinds of production. The result is a set of

almost 30,000 enterprises and the related produc-

tions (several enterprises belong to more than one

category).

• The user is looking for enterprises belonging to a

speciﬁc category (e.g.:mould). The corresponding

query is:

INSTANCES NAVIGATION FOR QUERYING INTEGRATED DATA FROM WEB-SITES

Figure 5: Part of the hierarchy of the relevant values for the

category name attribute.

Figure 6: Attributes distribution in the relevant values set

for the SEWASIE project.

select

from Company C, List_categories L,

Category Cat

where C.id = L.company_id

and L.category_id = Cat.id

and Cat.name like ’mould’

The result is a set of 89 companies.

• The user is interested to enterprises working

on a category mould manufacturing by

exploiting the relevant values set (e.g.: the

moulding relevant value). In this case he has

to simply indicate the selected relevant value and

the selection clause of the query automatically

changes including as predicate all the instances

belonging to the relevant value. The choice of the

moulding relevant attribute generates a selection

clause

”... Cat.name in (’Moulding’,

’Compression injection moulding’,

’Insert moulding’, ...)”

. The result of this

query is 943 companies if the relevant values set is

calculated by the algorithm described in section 3,

1459 companies if relevant values set is calculated

by means of the “semantic” version (i.e., by using

the mould making value).

5 CONCLUSIONS AND FUTURE

WORK

In this paper we proposed a method to identify the

relevant values of selected global attributes. These

values may be provided to the end user (classiﬁed in a

hierarchy when it possible) in order to ease him hav-

ing the knowledge about a Global Class and writing a

query. There are few critical issues to be pointed out:

• the method is based on a data analysis. Working

with instances is a limit of such technique: if data

change, relevant values and the consequent hierar-

chy has to be updated. In speciﬁc contexts (data-

bases not frequently updated, e.g. databases for e-

commerce storing the products catalog for a com-

pany), data are almost static and then the calculus

has to be only occasionally re-done. Nevertheless,

the MOMIS wrappers were modiﬁed in order to pe-

riodically check the sources for verifying the rele-

vant values consistency.

• the relevant values identiﬁcation is a critical aspect:

the integration engineer has to deﬁne a set of rele-

vant values covering all the possible kinds of val-

ues, with a limited number of different values to be

easily visualized and known by the end user. Other-

wise, the end user does not know the Global Class

contents and does not ﬁnd the required information

to write a query.

• the method allows a user to write speciﬁc queries

having an organized knowledge of the sources con-

tents.

The future work will be addressed on three direc-

tions:

1. to improve the relevant values selection by propos-

ing to the user a multi-strategic approach for ob-

taining the most suitable relevant values set: for

this goal, we think of it is possible to use industrial

standards for classiﬁcation of services and goods

(e.g. UNSPSC, ecl@ss, NAICS, ...) inside speciﬁc

domains both for creating the most suitable set of

relevant values and for creating an automatic hier-

archy of these attributes;

2. to calculate relevant values of multiple global at-

tributes;

3. to improve the Query Manager graphical interface

allowing users to query the GVV without writing

any query.

ACKNOWLEDGMENTS

This work started in the EU SEWASIE project

(IST-2001-34825). Now the research activity

WEBIST 2006 - INTERNET TECHNOLOGY

continues within the Italian MIUR PRIN WIS-

DOM project (2004-2006). Further information at

http://www.dbgroup.unimo.it/wisdom.

REFERENCES

Beneventano, D., Bergamaschi, S., Guerra, F., and Vincini,

M. (2003). Synthesizing an integrated ontology. IEEE

Internet Computing Magazine, pages 42–51.

Beneventano, D. and Lenzerini, M. (2005). Final release

of the system prototype for query management. Se-

wasie, Deliverable D.3.5, Final Version, available at

http://www.dbgroup.unimo.it/pubs.html.

Bergamaschi, S., Castano, S., Beneventano, D., and

Vincini, M. (2001). Semantic integration of hetero-

geneous information sources. Data & Knowledge En-

gineering, Special Issue on Intelligent Information In-

tegration, 36(1):215–249.

Broder, A. Z., Maarek, Y. S., Bharat, K., Dumais, S. T.,

Papa, S., Pedersen, J., and Raghavan, P. (2005). Cur-

rent trends in the integration of searching and brows-

ing. In WWW (Special interest tracks and posters),

page 793.

Buneman, P., Davidson, S., Fernandez, M., and Suciu, D.

(1997). Adding structure to unstructured data. In

Proc. of ICDT 1997, pages 336–350, Delphi, Greece.

Chaudhuri, S., Ramakrishnan, R., and Weikum, G. (2005).

Integrating db and ir technologies: What is the sound

of one hand clapping? In Proceedings of the Second

Biennial Conference on Innovative Data Systems Re-

search, Asilomar, CA, USA, pages 1–12.

Dong, X. and Halevy, A. Y. (2005). Malleable schemas: A

preliminary report. In Proceedings of he Eight Inter-

national Workshop on the Web & Databases (WebDB

2005), Baltimore, Maryland, USA, pages 139–144.

Galindo-Legaria, C. A. (1994). Outerjoins as disjunctions.

In Snodgrass, R. T. and Winslett, M., editors, SIG-

MOD Conference, pages 348–358. ACM Press.

Gibson, D., Kleinberg, J., and Raghavan, P. (2000). Cluster-

ing categorical data: an approach based on dynamical

systems. VLDB Journal, 8(3-4):222–236.

Gottlob, G., Koch, C., Baumgartner, R., Herzog, M., and

Flesca, S. (2004). The lixto data extraction project -

back and forth between theory and practice. In Pro-

ceedings of the Twenty-third ACM SIGACT-SIGMOD-

SIGART Symposium on Principles of Database Sys-

tems, pages 1–12, Paris, France.

Halevy, A. (2003). Data integration: a status report. In Pro-

ceedings of the German Database Conference, BTW-

03, Leipzig.

Halevy, A. Y. (2004). Structures, semantics and statistics.

In Proceedings of the 30th International Conference

on VLDB, Toronto, Canada, pages 4–6.

N. Noy, M. Uschold, C. W. (2005). Representing classes

as property values on the semantic web. Seman-

tic Web Best Practices and Deployment Working

Group, part of the W3C Semantic Web Activity.

(http://www.w3.org/TR/swbp-classes-as-values).

Nestorov, S., Abiteboul, S., and Motwani, R. (1997). In-

ferring structure in semistructured data. SIGMOD

Record, 26(4):39–43.

INSTANCES NAVIGATION FOR QUERYING INTEGRATED DATA FROM WEB-SITES