spondence discoveries lead to true knowledge about
the relationships between the actual schemas from
both databases. This relationship knowledge is
extremely reliable because it has been extracted from
the databases themselves, through the instance data.
As our knowledge of these relationships increases,
we can specify more exacting and detailed ICAs, re-
sulting in even more knowledge. As the ICAs be-
come more complex, and begin to rely increasingly on
schema-level information, the match functions must
grow in complexity accordingly. There are several
good match algorithms (Aslan and McLeod, 1999;
Chua et al., 2003; Castano et al., 2001; Gal et al.,
2003; Lawrence and Barker, 2001; Li and Clifton,
2000; Schmitt and Türker, 1998) that could be used
at this stage in the process.
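As a minimal illustration of what an instance-based match function can look like, the sketch below scores a pair of attributes by the Jaccard overlap of their distinct values. It is not one of the cited algorithms. The function names, the dictionary representation of a table, and the 0.5 threshold are all assumptions made for this example.

```python
def value_overlap(values_a, values_b):
    """Jaccard similarity between the distinct non-null values of two attributes."""
    set_a = {v for v in values_a if v is not None}
    set_b = {v for v in values_b if v is not None}
    if not set_a and not set_b:
        # No instance data on either side: report no evidence of a match.
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def candidate_correspondences(table_a, table_b, threshold=0.5):
    """Return (attr_a, attr_b, score) triples whose overlap meets the threshold.

    table_a and table_b are hypothetical stand-ins for component database
    tables: each maps an attribute name to a list of its instance values.
    """
    matches = []
    for attr_a, values_a in table_a.items():
        for attr_b, values_b in table_b.items():
            score = value_overlap(values_a, values_b)
            if score >= threshold:
                matches.append((attr_a, attr_b, score))
    # Highest-scoring pairs first; ties keep their discovery order.
    return sorted(matches, key=lambda m: m[2], reverse=True)
```

A simple value-set comparison of this kind uses only instance data. The more complex assertions described above would need match functions that also take schema-level information into account.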
2.4 The Result
The proposed correspondence investigation method
provides an initial discovery of inter-database
relationships. While the method is being applied,
information about the individual schemas of the
component databases is discovered along with initial
inter-schema relationships. Accepted ICAs provide
direct inter-schema relationship information. With
only a portion of the component databases' schemas
needing transformation, the information in the
accepted ICAs can be used to build this partial
integrated schema. The existence of a partially integrated
schema helps the integration specialist detect new
assertions (pICAs) as well as errors in the existing
ICAs. The result is enough information to make
approaches based on schema integration more viable
and efficient.
3 CONCLUSIONS
This paper has proposed a method for correspondence
investigation that neither suffers from a lack of
expert knowledge of the schemas involved nor
assumes that those schemas are well designed.
Implementation of this method will greatly reduce the
manual effort involved in the integration process, which
is currently heavily dependent upon such effort.
In the future, we plan to integrate existing match
functions developed in the artificial intelligence field with
our work. Sophisticated and efficient match functions
used in comparing instance data may give us more
correspondences than would otherwise be possible. We
would also like to develop a language to
formally specify the inter-correspondence assertions.
ACKNOWLEDGEMENTS
The work was supported in part by a grant from
AFOSR under Award No. FA9550-04-1-0102.
REFERENCES
Aslan, G. and McLeod, D. (1999). Semantic heterogeneity
resolution in federated databases by metadata
implantation and stepwise evolution. The VLDB Journal,
8:120–132.
Castano, S., De Antonellis, V., and De Capitani di
Vimercati, S. (2001). Global Viewing of Heterogeneous Data
Sources. IEEE Transactions on Knowledge and Data
Engineering, 13(2):277–297.
Chua, C. E. H., Chiang, R. H. L., and Lim, E. (2003).
Instance-based attribute identification in database in-
tegration. The VLDB Journal, 12:228–243.
Gal, A., Trombetta, A., Anaby-Tavor, A., and Montesi, D.
(2003). A model for schema integration in heterogeneous
databases. In Proceedings of the 7th International
Database Engineering and Applications Symposium
(IDEAS ’03), pages 2–11.
Lawrence, R. and Barker, K. (2001). Integrating relational
database schemas using a standardized dictionary. In
Proceedings of the 2001 ACM Symposium on Applied
Computing, pages 225–230.
Lenzerini, M. (2002). Data integration: A theoreti-
cal perspective. In Proceedings of the 21st ACM
SIGMOD-SIGACT-SIGART Symposium on Principles
of Database Systems, pages 233–246.
Li, W. and Clifton, C. (2000). SemInt: A Tool for
Identifying Attribute Correspondences in Heterogeneous
Databases Using Neural Networks. Data & Knowledge
Engineering, 33(1):49–84.
Parent, C. and Spaccapietra, S. (1998). Issues and Ap-
proaches of Database Integration. CACM, 41(5):166–
178.
Rahm, E. and Bernstein, P. A. (2001). A survey of ap-
proaches to automatic schema matching. The VLDB
Journal, 10:334–350.
Schmitt, I. and Türker, C. (1998). An incremental approach
to schema integration by refining extensional
relationships. In Proceedings of the 7th International
Conference on Information and Knowledge Management,
pages 322–330.
Yan, G., Ng, W. K., and Lim, E. (2002). Product Schema
Integration for Electronic Commerce - A Synonym
Approach. IEEE Transactions on Knowledge and Data
Engineering, 14(3):583–598.
Zhang, J. (1994). A formal specification model and its
application on multidatabase systems. In Proceedings of
the 1994 Conference of the Centre for Advanced
Studies on Collaborative Research, pages 76–89.
A METHOD FOR EARLY CORRESPONDENCE DISCOVERY USING INSTANCE DATA