Authors:
Osama A. Mehdi; Hamidah Ibrahim and Lilly Suriani Affendey
Affiliation:
Faculty of Computer Science and Information Technology, Malaysia
Keyword(s):
Schema Matching, Instance based Schema Matching, Google Similarity, Regular Expression.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Business Analytics; Collaboration and e-Services; Data Analytics; Data Engineering; Data Integrity; Data Management and Quality; Data Management for Analytics; Databases and Data Security; e-Business; Enterprise Information Systems; Information and Systems Security; Information Integration; Integration/Interoperability; Knowledge Discovery and Information Retrieval; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Ontologies and the Semantic Web; Symbolic Systems
Abstract:
Schema matching is the task of identifying correspondences between attributes of different schemas. A variety of approaches have been proposed with the goal of achieving high-quality match results with respect to precision (P) and recall (R). However, these approaches fall short of that goal because most of them treat instances as strings regardless of their data types. As a consequence, matches go unidentified, especially for attributes with numeric instances, which further reduces the quality of the match results. Further effort is therefore needed to improve match quality. In this paper, we propose a framework for finding matches between schemas of semantically and syntactically related data. Since we fully exploit only the instances of the schemas for this task, we rely on strategies that combine the strengths of Google as a source of web semantics and regular expressions for pattern recognition. To demonstrate the accuracy of our framework, we conducted an experimental evaluation using real-world data sets. The results show that our framework finds 1-1 schema matches with high accuracy, in the range of 93% to 99% in terms of precision (P), recall (R), and F-measure (F).
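As an illustration of the evaluation measures named in the abstract, the following minimal Python sketch computes precision, recall, and F-measure for a set of predicted 1-1 attribute matches against a reference set. It uses the standard definitions of these measures; the function name `evaluate_matches` and the attribute names are hypothetical and are not taken from the paper or its data sets.

```python
def evaluate_matches(predicted, gold):
    """Return (precision, recall, f_measure) for two collections of attribute pairs.

    Standard definitions, assumed here rather than quoted from the paper:
      precision = correct predicted matches / all predicted matches
      recall    = correct predicted matches / all reference matches
      F-measure = harmonic mean of precision and recall
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical reference (gold) matches between two schemas.
gold = {
    ("cust_id", "customer_no"),
    ("phone", "tel"),
    ("zip", "postcode"),
    ("name", "full_name"),
}

# Hypothetical matcher output: three correct pairs and one wrong pair.
predicted = {
    ("cust_id", "customer_no"),
    ("phone", "tel"),
    ("zip", "postcode"),
    ("name", "tel"),
}

p, r, f = evaluate_matches(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")
```

Because each match is represented as an (attribute, attribute) pair, a set intersection is enough to count the correct 1-1 matches.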