STATISTICS API: DBMS-INDEPENDENT ACCESS AND MANAGEMENT OF DBMS STATISTICS IN HETEROGENEOUS ENVIRONMENTS

Tobias Kraft, Bernhard Mitschang

2007

Abstract

Many of today's applications access not a single database but a multitude of databases running on different DBMSs. Federation technology is used to integrate these databases and to offer users a single query interface through which they can run queries that access tables stored in different remote databases. The optimizer of the federated DBMS therefore has to decide which portion of a query should be processed by the federated DBMS itself and which portion should be executed at the remote systems. To do so, it has to retrieve cost estimates for query fragments from the remote databases. The responses of these databases typically contain cost and cardinality estimates but no statistics about the data stored in them. However, statistics are optimization-critical information and the crucial factor for any kind of decision making in the optimizer of the federated DBMS. When this information is not available, optimization has to rely on imprecise heuristics, mostly based on default selectivities. To fill this gap, we propose Statistics API, a Java interface that provides DBMS-independent access to statistics data stored in databases running on different DBMSs. Statistics API also defines the data structures used for the statistics data returned by or passed to the interface. We have implemented this interface for the three prevailing commercial DBMSs: IBM DB2, Oracle, and Microsoft SQL Server. These implementations are available under the terms of the GNU Lesser General Public License (LGPL). This paper introduces the interface, i.e., the methods and data structures of Statistics API, and discusses some details of the three interface implementations.
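The abstract argues that a federated optimizer without access to remote statistics must fall back on default selectivities. The following is a minimal sketch of why a DBMS-independent statistics interface helps. All names in it (StatisticsAccess, ColumnStatistics, Bucket, InMemoryStatistics, estimateEqualityRows, estimateAtMostRows, the ORDERS table) are hypothetical. This is not the published Statistics API, whose methods and data structures are defined in the paper itself. With a distinct-value count and a histogram, the sketch estimates result sizes directly. It uses a uniformity assumption for equality predicates and linear interpolation inside a histogram bucket for range predicates; both are common textbook techniques, not necessarily the paper's. Without statistics, it falls back on the System R defaults of 1/10 for equality and 1/3 for range predicates.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; these names are NOT the published Statistics API. */
public class StatisticsSketch {

    /** Histogram bucket covering values in (lowExclusive, highInclusive]. */
    record Bucket(double lowExclusive, double highInclusive, long rowCount) {}

    /** DBMS-neutral column statistics (an illustrative subset only). */
    record ColumnStatistics(String table, String column, long distinctValues, List<Bucket> histogram) {}

    /** Hypothetical DBMS-independent access interface; one implementation per DBMS. */
    interface StatisticsAccess {
        long tableCardinality(String table);
        Optional<ColumnStatistics> columnStatistics(String table, String column);
    }

    /** In-memory stand-in for a DBMS-specific implementation that would read catalog statistics. */
    static final class InMemoryStatistics implements StatisticsAccess {
        private final Map<String, Long> cardinalities = new HashMap<>();
        private final Map<String, ColumnStatistics> columns = new HashMap<>();

        void putTable(String table, long rows) { cardinalities.put(table, rows); }

        void putColumn(ColumnStatistics cs) { columns.put(cs.table() + "." + cs.column(), cs); }

        @Override
        public long tableCardinality(String table) {
            return cardinalities.getOrDefault(table, 0L); // 0 if the table is unknown
        }

        @Override
        public Optional<ColumnStatistics> columnStatistics(String table, String column) {
            return Optional.ofNullable(columns.get(table + "." + column));
        }
    }

    // System R default selectivities, used when no column statistics exist.
    static final double DEFAULT_EQ_SELECTIVITY = 1.0 / 10.0;
    static final double DEFAULT_RANGE_SELECTIVITY = 1.0 / 3.0;

    /** Estimated rows for "column = constant". */
    static double estimateEqualityRows(StatisticsAccess stats, String table, String column) {
        double selectivity = stats.columnStatistics(table, column)
                .filter(cs -> cs.distinctValues() > 0)
                .map(cs -> 1.0 / cs.distinctValues()) // assumes uniformly distributed values
                .orElse(DEFAULT_EQ_SELECTIVITY);
        return stats.tableCardinality(table) * selectivity;
    }

    /** Estimated rows for "column <= value", interpolating linearly inside a bucket. */
    static double estimateAtMostRows(StatisticsAccess stats, String table, String column, double value) {
        List<Bucket> histogram = stats.columnStatistics(table, column)
                .map(ColumnStatistics::histogram)
                .orElse(List.of());
        if (histogram.isEmpty()) {
            return stats.tableCardinality(table) * DEFAULT_RANGE_SELECTIVITY;
        }
        double rows = 0.0;
        for (Bucket b : histogram) {
            if (value >= b.highInclusive()) {
                rows += b.rowCount(); // whole bucket qualifies
            } else if (value > b.lowExclusive()) {
                double fraction = (value - b.lowExclusive()) / (b.highInclusive() - b.lowExclusive());
                rows += fraction * b.rowCount(); // part of the bucket qualifies
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        InMemoryStatistics stats = new InMemoryStatistics();
        stats.putTable("ORDERS", 10_000);
        stats.putColumn(new ColumnStatistics("ORDERS", "AMOUNT", 500, List.of(
                new Bucket(0, 100, 5_000),
                new Bucket(100, 1_000, 4_000),
                new Bucket(1_000, 10_000, 1_000))));

        System.out.println(estimateEqualityRows(stats, "ORDERS", "AMOUNT"));    // uses 1/500
        System.out.println(estimateEqualityRows(stats, "ORDERS", "STATUS"));    // default 1/10
        System.out.println(estimateAtMostRows(stats, "ORDERS", "AMOUNT", 550)); // histogram: 7000.0
        System.out.println(estimateAtMostRows(stats, "ORDERS", "STATUS", 550)); // default 1/3
    }
}
```

The estimation functions depend only on the DBMS-neutral types. In this sketch, an implementation for DB2, Oracle, or SQL Server would translate its own catalog representation into those types, and the optimizer code would stay unchanged. For `AMOUNT <= 550`, the first bucket qualifies entirely (5,000 rows) and half of the second bucket qualifies (2,000 rows). The default-based estimate, by contrast, would be about 3,333 rows regardless of the data.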



Paper Citation


in Harvard Style

Kraft T. and Mitschang B. (2007). STATISTICS API: DBMS-INDEPENDENT ACCESS AND MANAGEMENT OF DBMS STATISTICS IN HETEROGENEOUS ENVIRONMENTS. In Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-972-8865-88-7, pages 5-12. DOI: 10.5220/0002365200050012


in BibTeX Style

@conference{iceis07,
author={Tobias Kraft and Bernhard Mitschang},
title={STATISTICS API: DBMS-INDEPENDENT ACCESS AND MANAGEMENT OF DBMS STATISTICS IN HETEROGENEOUS ENVIRONMENTS},
booktitle={Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2007},
pages={5-12},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002365200050012},
isbn={978-972-8865-88-7},
}


in EndNote Style

TY  - CONF
JO  - Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI  - STATISTICS API: DBMS-INDEPENDENT ACCESS AND MANAGEMENT OF DBMS STATISTICS IN HETEROGENEOUS ENVIRONMENTS
SN  - 978-972-8865-88-7
AU  - Kraft T.
AU  - Mitschang B.
PY  - 2007
SP  - 5
EP  - 12
DO  - 10.5220/0002365200050012
ER  - 