Authors:
Qizhen Huang; Chaoliang Zhong and Jun Zhang
Affiliation:
Fujitsu R&D Center Co., LTD., China
Keyword(s):
Data Integration, Hierarchical Mediated Schema, Schema Clustering.
Related Ontology Subjects/Areas/Topics:
Coupling and Integrating Heterogeneous Data Sources; Databases and Information Systems Integration; Enterprise Application Integration; Enterprise Information Systems
Abstract:
In data integration, users access multiple data sources through a uniform interface. Yet querying data sources in which many domains coexist remains difficult, even when the sources are clustered into several domains, because users must write a separate query clause for each domain. Previous research has presented various data integration techniques, but nearly all of them either require the schemas of the data sources being integrated to belong to the same domain, or fail to address the case where several domains are sub-domains of a higher-level domain, in which a single, more abstract query clause for the upper domain can substitute for several less abstract query clauses for the lower domains. In this paper, we propose a graph-based approach for clustering schemas that ultimately exposes a hierarchical mediated schema forest to users, together with a query forwarding mechanism that transforms queries downward along the schema forest. Experimental results demonstrate that our schema clustering algorithm is effective in clustering the data sources into hierarchical schemas, that queries on the mediated schemas achieve answers with good accuracy, and that the cost to users of writing query clauses is reduced without losing query accuracy.