
6 RELATED WORK
Several approaches to automatic schema matching
exist in the literature. Most of them focus on
discovering equivalence relationships (Doan et al.,
2002; Madhavan et al., 2001); some also identify
subsumption relationships (Bergamaschi et al., 1998)
and some intersection relationships (Hakimpour and
Geppert, 2002). However, these approaches discover
subsumption and intersection using external
knowledge, such as ontologies and thesauri, or user
knowledge. Our approach identifies equivalence,
subsumption, intersection and disjointness
relationships by examining only element metadata and
data instances, without any user intervention.
The work most related to ours is the one presented
in (Xu and Embley, 2003), where direct and indirect
matches between elements are discovered. Direct
matches are identified between equivalent elements,
and indirect matches are identified between
(a) subsuming elements, (b) boolean elements and
elements whose instances contain the boolean
elements’ names, and (c) elements whose instances can
be merged or split. These relationships are
discovered based on schema information, ontologies
and regular expressions defined to match the
instances of elements.
Our framework covers all the relationships of (Xu
and Embley, 2003) except for the last one (c), which
in some cases is similar to our disjointness
relationship. In the case of boolean elements, our
methodology replaces their true and false instances
with the element’s name and the concatenation of not
and the element’s name, respectively, since the
actual instances do not provide much information.
Therefore, if one element contains the name of a
boolean element in its instances, this relationship
will be identified. Our framework also identifies
intersecting elements, which are not considered in
(Xu and Embley, 2003).
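The replacement of boolean instances described above can be illustrated with a minimal sketch. The function names and the sample data are hypothetical and are not taken from the prototype tool; the substring test is only one possible way to check whether an instance contains an element's name.

```python
def normalise_boolean_instances(element_name, instances):
    """Replace True with the element name and False with 'not <name>'."""
    return [element_name if value else f"not {element_name}"
            for value in instances]


def mentions_element_name(element_name, other_instances):
    """Return True if any instance of another element contains the name.

    Assumption: a case-insensitive substring test stands in for whatever
    instance comparison the framework actually applies.
    """
    name = element_name.lower()
    return any(name in str(value).lower() for value in other_instances)


# Hypothetical data: a boolean element 'garage' and a free-text element.
garage = normalise_boolean_instances("garage", [True, False, True])
features = ["garage, pool", "garden"]
print(garage)
print(mentions_element_name("garage", features))
```

After the replacement, the boolean element carries instances that can be compared with the instances of other elements in the same way as any textual data.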
GLUE (Doan et al., 2002) is also similar to our
work. It proposes a bidirectional comparison of
schema elements, but it produces a single similarity
degree, which takes the lowest value when the
elements do not have any common instances and the
highest when the elements are equivalent. Therefore,
the semantic relationships described in this paper
cannot be discovered by this approach.
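The following sketch illustrates why a single similarity degree is not sufficient to separate the relationships. It is not GLUE's algorithm nor our implementation: the Jaccard coefficient is used here only as an assumed stand-in for a single symmetric score, and the exact-match thresholds in classify are a simplification that ignores metadata and noisy data.

```python
def bidirectional_overlap(a, b):
    """Return (fraction of a's instances found in b, fraction of b's in a)."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("both elements need at least one instance")
    common = a & b
    return len(common) / len(a), len(common) / len(b)


def jaccard(a, b):
    """Single symmetric score: 0 with no common instances, 1 when equal."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def classify(a, b):
    """Map the two directional degrees to a relationship (idealised)."""
    a_in_b, b_in_a = bidirectional_overlap(a, b)
    if a_in_b == 1.0 and b_in_a == 1.0:
        return "equivalence"
    if a_in_b == 1.0 or b_in_a == 1.0:
        return "subsumption"
    if a_in_b > 0.0:
        return "intersection"
    return "disjointness"


# Two hypothetical pairs with the same single score but different relationships.
pair_1 = ([1, 2, 3, 4], [1, 2])   # second element is contained in the first
pair_2 = ([1, 2, 3], [2, 3, 4])   # elements only partially overlap
for a, b in (pair_1, pair_2):
    print(jaccard(a, b), bidirectional_overlap(a, b), classify(a, b))
```

Both pairs obtain the same single score, while the two directional degrees differ and allow subsumption to be separated from intersection.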
7 CONCLUSIONS
In this paper, we have presented our approach for
automatically discovering semantic relationships
between schema elements. Based on a bidirectional
comparison of the elements’ metadata and instances,
and without any user or external knowledge, we are
able to discover equivalence, subsumption,
intersection, disjointness and incompatibility
relationships. We have shown our framework’s
architecture and described the components that we
have implemented in the prototype tool. Our
experimental results are promising, with a 66%
average precision and 75% average recall.
In the future, we are going to focus on the filtering
process, since low precision has been mainly caused
by incompatible pairs of elements that have not been
discarded. We can consider assigning weights to
modules based on their importance and reliability.
Precision can also be improved by detecting
automatically incremented elements and elements with
small domains. A brute-force module can assist in
this process, and it would impose only a small
overhead to exhaustively compare a small number of
instances.
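A possible form of such detection is sketched below. This is only an illustration of the idea and not part of the prototype tool: the function names, the assumption that automatically incremented elements are integers increasing in steps of 1, and the threshold of 5 distinct values for a small domain are all hypothetical choices.

```python
def is_auto_incremented(instances):
    """Heuristic: distinct integers that form a run with step 1 when sorted."""
    values = list(instances)
    if len(values) < 2:
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return False
    ordered = sorted(values)
    return all(nxt - cur == 1 for cur, nxt in zip(ordered, ordered[1:]))


def has_small_domain(instances, max_distinct=5):
    """Heuristic: the element takes at most max_distinct distinct values."""
    return len(set(instances)) <= max_distinct


# Hypothetical instance samples.
print(is_auto_incremented([3, 1, 2, 4]))
print(is_auto_incremented([1, 2, 4]))
print(has_small_domain(["M", "F", "M", "F"]))
```

Elements flagged by either check could then be handed to an exhaustive comparison, which stays cheap because the number of distinct instances involved is small.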
Additionally, in the future we are going to extend
our prototype tool with a graphical user interface,
which will permit the user to validate or reject the se-
mantic relationships identified by our methodology,
and a component which will integrate the input data
sources based on the validated relationships.
REFERENCES
Doan, A., Madhavan, J., Domingos, P., and Halevy, A. (2002).
Learning to map between ontologies on the Semantic Web.
In Proceedings of the World-Wide Web Conference
(WWW-02), pages 662–673.
Bergamaschi, S., Castano, S., di Vimercati, S., Montanari,
S., and Vincini, M. (1998). An intelligent approach
to information integration. In International
Conference on Formal Ontology in Information Systems
(FOIS’98), Italy, 1998, pages 253–267.
Hakimpour, F. and Geppert, A. (2002). Global schema
generation using formal ontologies. In Proceedings
of ER02, volume 2503 of LNCS, pages 307–321.
Springer-Verlag.
Kashyap, V. and Sheth, A. (1996). Semantic and schematic
similarities between database objects: a context-
based approach. VLDB Journal, 5(4):276–304.
Larson, J., Navathe, S., and Elmasri, R. (1989). A theory of
attribute equivalence in databases with application to
schema integration. IEEE Transactions on Software
Engineering, 15(4):449–463.
Madhavan, J., Bernstein, P. A., and Rahm, E. (2001).
Generic schema matching with Cupid. In Proc. 27th
VLDB Conference, pages 49–58.
Rizopoulos, N. (2003). Discovery of semantic relationships
between schema elements. Technical report, AutoMed
Project.
Xu, L. and Embley, D. W. (2003). Discovering direct and
indirect matches for schema elements. In 8th Interna-
tional Conference on Database Systems for Advanced
Applications (DASFAA ’03), Kyoto, Japan, March 26–
28, 2003, pages 39–46.
ICEIS 2004 - DATABASES AND INFORMATION SYSTEMS INTEGRATION