represents the sum of the F-measure values obtained for
all metrics used, and $F_m$ represents the F-measure
value calculated for metric ‘m’:
$$w_m = \frac{1}{\sum F} \cdot F_m$$
5) Finally, the weights are presented to the user, who can
accept or modify them. If the user does not run the
Sampler, then either the user defines the weights
manually, or SASMINT averages the metrics (equal
weights for all), which is the only mechanism it can
use on its own, although it may not produce
desirable results.
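The weight calculation and the fallback described in the steps above can be sketched as follows. This is a minimal illustration, not SASMINT's actual code: the function names, the metric names, and the F-measure values are hypothetical, and the order in which user-supplied weights, Sampler results, and equal weights are preferred is an assumption drawn from the description above.

```python
from typing import Dict, Iterable, Optional


def equal_weights(metric_names: Iterable[str]) -> Dict[str, float]:
    """Fallback: average the metrics by giving each the same weight."""
    names = list(metric_names)
    if not names:
        raise ValueError("at least one metric is required")
    return {name: 1.0 / len(names) for name in names}


def weights_from_f_measures(f_measures: Dict[str, float]) -> Dict[str, float]:
    """Compute w_m = F_m / sum(F) for every metric m."""
    total = sum(f_measures.values())
    if total <= 0:
        # Assumption: if no metric scored above zero, fall back to averaging.
        return equal_weights(f_measures.keys())
    return {name: f / total for name, f in f_measures.items()}


def choose_weights(
    metric_names: Iterable[str],
    sampler_f_measures: Optional[Dict[str, float]] = None,
    user_weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Pick weights: user-defined first, then Sampler-derived, then equal."""
    if user_weights is not None:
        return dict(user_weights)
    if sampler_f_measures is not None:
        return weights_from_f_measures(sampler_f_measures)
    return equal_weights(metric_names)


# Hypothetical F-measures, as if produced by one Sampler run.
sample_f = {"levenshtein": 0.8, "jaccard": 0.6, "jaro": 0.6}
sampler_based = choose_weights(sample_f.keys(), sampler_f_measures=sample_f)
default_based = choose_weights(sample_f.keys())
```

With these illustrative values, the Sampler-derived weights are proportional to each metric's F-measure and sum to 1, while the default assigns every metric the same weight.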
5 CONCLUSION
In this paper, we introduced a semi-automatic schema
matching and integration tool called SASMINT and
explained how it uses linguistic techniques to
automatically resolve syntactic and semantic
heterogeneities between database schemas. To identify
the syntactic and semantic similarity between the
element names of two schemas, SASMINT, unlike other
schema matching efforts, combines different types of
string similarity and semantic similarity metrics
from NLP. A weighted sum and a recursive weighted
sum of these metrics are proposed to give more
accurate results. Furthermore, the Sampler
component of SASMINT helps users influence the
weights applied to these metrics. A number of
tests were carried out to measure the correctness of
the metrics, and the results are provided in this paper.
Further tests are planned to compare SASMINT
with other similar systems.
ICSOFT 2006 - INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES