ONE-TO-MANY DATA TRANSFORMATION OPERATIONS - Optimization and Execution on an RDBMS

Paulo Carreira, Helena Galhardas, João Pereira, Andrzej Wichert

2007

Abstract

The optimization capabilities of RDBMSs make them attractive for executing data transformations that support ETL, data cleaning, and data integration activities. Although many useful data transformations can be expressed as relational queries, an important class cannot: transformations that produce several output tuples for a single input tuple. To address this limitation, a new operator, named data mapper, has been proposed as an extension of Relational Algebra for expressing one-to-many data transformations. In this paper, we study the feasibility of implementing the mapper operator as a primitive operator on an RDBMS. Data transformations expressed as combinations of standard relational operators and mappers can then be optimized, resulting in interesting performance gains.
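
To make the notion of a one-to-many transformation concrete, the following is a minimal Python sketch and not the paper's implementation. The names apply_mapper and split_into_installments, and the loan data, are hypothetical and chosen only for illustration. It shows a function that emits several output tuples per input tuple, and checks that a selection on an attribute the function copies unchanged gives the same result whether it is applied before or after the mapper, which is the kind of rewriting an optimizer could exploit.

```python
from typing import Callable, Iterable, Iterator, Tuple

Row = Tuple  # a tuple of attribute values


def apply_mapper(relation: Iterable[Row],
                 fn: Callable[[Row], Iterable[Row]]) -> Iterator[Row]:
    """Apply a one-to-many function to each input tuple and concatenate the outputs."""
    for row in relation:
        yield from fn(row)


def split_into_installments(row: Row) -> Iterator[Row]:
    """Hypothetical mapper function: expand one loan into one tuple per month."""
    loan_id, total, months = row
    base, remainder = divmod(total, months)
    for month in range(1, months + 1):
        # The last installment absorbs the rounding remainder.
        amount = base + (remainder if month == months else 0)
        yield (loan_id, month, amount)


# Hypothetical input relation: (loan_id, total_amount, number_of_months)
loans = [("L1", 100, 3), ("L2", 50, 1)]

payments = list(apply_mapper(loans, split_into_installments))

# Selection applied after the mapper ...
filtered_after = [p for p in payments if p[0] == "L1"]
# ... and the same selection pushed below the mapper. This is valid here
# because the mapper function copies loan_id to its output unchanged.
filtered_before = list(
    apply_mapper((l for l in loans if l[0] == "L1"), split_into_installments)
)

print(payments)
print(filtered_after == filtered_before)
```

In this sketch the three-month loan yields three output tuples and the one-month loan yields one, which a projection or selection alone cannot produce, since the number of output tuples depends on the data. Pushing the selection below the mapper means the function is never invoked for loans that would be filtered out anyway.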



Paper Citation


in Harvard Style

Carreira P., Galhardas H., Pereira J. and Wichert A. (2007). ONE-TO-MANY DATA TRANSFORMATION OPERATIONS - Optimization and Execution on an RDBMS. In Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-972-8865-88-7, pages 21-27. DOI: 10.5220/0002371900210027


in Bibtex Style

@conference{iceis07,
author={Paulo Carreira and Helena Galhardas and João Pereira and Andrzej Wichert},
title={ONE-TO-MANY DATA TRANSFORMATION OPERATIONS - Optimization and Execution on an RDBMS},
booktitle={Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2007},
pages={21-27},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002371900210027},
isbn={978-972-8865-88-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - ONE-TO-MANY DATA TRANSFORMATION OPERATIONS - Optimization and Execution on an RDBMS
SN - 978-972-8865-88-7
AU - Carreira P.
AU - Galhardas H.
AU - Pereira J.
AU - Wichert A.
PY - 2007
SP - 21
EP - 27
DO - 10.5220/0002371900210027