The Figure 8 query executed faster on the partitioned
databases than on a single-source TPC-H database. The
difference arises because the embedded database in Unity
executes one of the joins on the client machine, rather
than the server performing all three joins. When all three
joins run on the server, the join of the large Lineitem
relation requires full use of the server's CPU. For the
partitioned databases, Unity is able to start the global
join once the smaller ResultSet from the Part relation is
received. This allows the embedded database in Unity to
work in parallel with the server, yielding an improved
execution time for this particular query.
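The overlap described above can be illustrated with a minimal sketch of a client-side hash join. It is not Unity's actual implementation: the class ClientJoinSketch, the row types Part and Lineitem, and the method hashJoin are hypothetical names, and in-memory lists stand in for the ResultSets returned by the servers. The sketch also assumes that partKey is unique in the smaller input.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class ClientJoinSketch {

    // Hypothetical row types standing in for rows read from JDBC ResultSets.
    static final class Part {
        final int partKey;
        final String name;
        Part(int partKey, String name) { this.partKey = partKey; this.name = name; }
    }

    static final class Lineitem {
        final int partKey;
        final int quantity;
        Lineitem(int partKey, int quantity) { this.partKey = partKey; this.quantity = quantity; }
    }

    // Joins the two subquery results on partKey; each output row is "partKey:name:quantity".
    static List<String> hashJoin(CompletableFuture<List<Part>> smallSide,
                                 CompletableFuture<List<Lineitem>> largeSide) {
        // Build phase: starts as soon as the smaller result has arrived,
        // while the larger subquery may still be running on its server.
        Map<Integer, String> nameByKey = new HashMap<>();
        for (Part p : smallSide.join()) {
            nameByKey.put(p.partKey, p.name);
        }
        // Probe phase: match each row of the larger result against the hash table.
        List<String> joined = new ArrayList<>();
        for (Lineitem l : largeSide.join()) {
            String name = nameByKey.get(l.partKey);
            if (name != null) {
                joined.add(l.partKey + ":" + name + ":" + l.quantity);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Both "subqueries" are launched concurrently; the lists simulate server results.
        CompletableFuture<List<Part>> parts = CompletableFuture.supplyAsync(
            () -> List.of(new Part(1, "bolt"), new Part(2, "nut")));
        CompletableFuture<List<Lineitem>> items = CompletableFuture.supplyAsync(
            () -> List.of(new Lineitem(1, 5), new Lineitem(3, 7), new Lineitem(2, 4)));
        System.out.println(hashJoin(parts, items)); // prints [1:bolt:5, 2:nut:4]
    }
}
```

Building the hash table on the smaller input keeps client memory use low and lets the build phase finish before the larger result is fully delivered.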
The results in Figure 9 show a longer execution time
for integration than for full execution on the single-source
TPC-H database. The increase is due to the time needed to
transport the entire Part relation (1.3 seconds) from the
Part database to the client in order to complete the global
join. The single-source query returns only the 76 tuples in
the final result to the client. Also, integrating partitioned
data located on one computer takes slightly longer than
integrating data partitioned across different computers,
because the partitions compete for the same resources on a
single computer.
5 CONCLUSION
The Unity JDBC driver provides an automatic and
scalable approach to integrate and then query multiple
data sources. The JDBC interface provides standard
methods to access the data. The ability to quickly re-
compute the global view allows for dynamic integra-
tion of a large number of databases. The lightweight
database engine embedded in the driver integrates the
data from multiple databases transparently. The addition
of the conceptual query language allows queries to be
specified on the global view without requiring an
understanding of the structure of each underlying schema.
Experimental results show that this approach adds
minimal overhead to the querying
process. In addition, the driver efficiently executes
the queries by identifying the subqueries to execute
on the servers and using the client to complete the
integration. This unique approach allows integration
to be more automated, scalable, and rapidly deployable.
Future work will investigate a more powerful
global query optimizer. By obtaining information
about the data sources including selectivity and re-
lation size, the global join strategy could be opti-
mized. In addition, strategies to effectively implement
GROUP BY queries will be examined.
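As a rough illustration of how such statistics might be used, the sketch below orders relations by estimated subquery result size (relation size multiplied by selectivity), so that the smallest result could be fetched and used as the build side first. This is one simple heuristic under stated assumptions, not the optimizer proposed here: the class JoinOrderSketch, the type SourceStats, the method joinOrder, and all row counts and selectivities are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class JoinOrderSketch {

    // Hypothetical per-source statistics; how a driver would obtain them is left open.
    static final class SourceStats {
        final String relation;
        final long rowCount;
        final double selectivity; // fraction of rows expected to pass the subquery's filters

        SourceStats(String relation, long rowCount, double selectivity) {
            this.relation = relation;
            this.rowCount = rowCount;
            this.selectivity = selectivity;
        }

        // Estimated number of rows the subquery returns to the client.
        double estimatedRows() {
            return rowCount * selectivity;
        }
    }

    // Orders relations from smallest to largest estimated subquery result.
    static List<String> joinOrder(List<SourceStats> sources) {
        return sources.stream()
            .sorted(Comparator.comparingDouble(SourceStats::estimatedRows))
            .map(s -> s.relation)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up numbers for illustration only.
        List<SourceStats> stats = List.of(
            new SourceStats("Lineitem", 6_000_000L, 0.5),
            new SourceStats("Part", 200_000L, 0.001),
            new SourceStats("Orders", 1_500_000L, 0.1));
        System.out.println(joinOrder(stats)); // prints [Part, Orders, Lineitem]
    }
}
```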
DYNAMIC DATABASE INTEGRATION IN A JDBC DRIVER