# ONE-TO-MANY DATA TRANSFORMATIONS - As Relational Operations

### Paulo Carreira

#### Abstract

Transforming data is a fundamental operation in data management activities like data integration, legacy data migration, data cleaning, and extract-transform-load processes for data warehousing. Since data often resides on relational databases, data transformations are often implemented as relational queries that aim at leveraging the optimization capabilities of most RDBMSs. However, due to the limited expressive power of Relational Algebra, several important classes of data transformations cannot be specified as SQL queries. In particular, SQL is unable to express data transformations that require the dynamic creation of several tuples for each tuple of the source relation. This paper proposes to address this class of data transformations, common in data management activities, by extending Relational Algebra with a new relational operator named data mapper. A starting contribution of this work consists of studying the formal aspects of the mapper operator focusing on its formal semantics and expressiveness. A further contribution consists of supporting a cost-based optimization of data transformations expressions combining mappers with standard relational operators. To that end, a set of algebraic rewriting rules and different physical execution algorithms are being developed.

#### References

- Amer-Yahia, S. and Cluet, S. (2004). A declarative approach to optimize bulk loading into databases. ACM Transactions of Database Systems, 29(2):233-281.
- Carreira, P. and Galhardas, H. (2004). Efficient development of data migration transformations. In ACM SIGMOD Int'l Conf. on the Managt. of Data.
- Carreira, P., Galhardas, H., Lopes, A., and Pereira, J. (2005a). Extending relational algebra to express oneCarreira, P., Galhardas, H., Pereira, J., and Lopes, A. (2005b). Data mapper: An operator for expressiong one-to-many data transformations. In 7th Int'l Conf. on Data Warehousing and Knowledge Discovery, DaWaK 7805, volume 3589 of LNCS. SpringerVerlag.
- Cluet, S., Delobel, C., SimÃ©on, J., and Smaga, K. (1998). Your mediators need data conversion! In ACM SIGMOD Int'l Conf. on the Managt. of Data.
- Cunningham, C., Graefe, G., and Galindo-Legaria, C. A. (2004). PIVOT and UNPIVOT: Optimization and Execution Strategies in an RDBMS. In Proceedings of the International Conference on Very Large Data Bases (VLDB'04), pages 998-1009. Morgan Kaufmann.
- Galhardas, H., Florescu, D., Shasha, D., and Simon, E. (2000). Ajax: An extensible data cleaning tool. ACM SIGMOD Int'l Conf. on Managt. of Data, 2(29).
- Galhardas, H., Florescu, D., Shasha, D., Simon, E., and Saita, C. A. (2001). Declarative data cleaning: Language, model, and algorithms. In Proc. of the Int'l Conf. on Very Large Data Bases (VLDB'01).
- Hellerstein, J. M. and Naughton, J. F. (1996). Query execution techniques for caching expensive methods. ACM SIGMOD Int'l Conf. on Managt. of Data.
- Lakshmanan, L. V. S., Sadri, F., and Subramanian, I. N. (1996). SchemaSQL - A Language for Querying and Restructuring Database Systems. In Proc. Int'l Conf. on Very Large Databases (VLDB'96), pages 239-250.
- Miller, R. J. (1998). Using schematically heterogeneous structures. Proc. of ACM SIGMOD Int'l Conf. on the Managt. of Data, 2(22):189-200.
- Milo, T. and Zhoar, S. (1998). Using schema matching to simplify heterogeneous data translation. In Proc. of the Int'l Conf. on Very Large Data Bases (VLDB'98).
- Papakonstantinou, Y., Garcia-Molina, H., and Ullman, J. (1996). MedMaker: A Mediator System Based on Declarative Specifications. In Proc. Int'l. Conf. on Data Engineering.
- Paredaens, J. (1978). On the expressive power of the relational algebra. Information Processing Letters, 7(2):107-111.
- Rahm, E. and Do, H.-H. (2000). Data Cleaning: Problems and current approaches. IEEE Bulletin of the Technical Comittee on Data Engineering, 24(4).
- Raman, V. and Hellerstein, J. M. (2001). Potter's Wheel: An Interactive Data Cleaning System. In Proc. of the Int'l Conf. on Very Large Data Bases (VLDB'01).
- Shu, N. C., Housel, B. C., and Lum, V. Y. (1975). CONVERT: A High Level Translation Definition Language for Data Conversion. Communic. of the ACM, 18(10):557-567.
- Shu, N. C., Housel, B. C., Taylor, R. W., Ghosh, S. P., and Lum, V. Y. (1977). EXPRESS: A Data EXtraction, Processing and REStructuring System. ACM Transactions on Database Systems, 2(2):134-174.
- Simitsis, A., Vassiliadis, P., and Sellis, T. K. (2005). Optimizing etl processes in data warehouses. In Proc. of the 21st Int'l Conf. on Data Engineering, ICDE 2005.
- van den Bercken, J., Dittrich, J. P., and Seeger, B. (2000). XXL: A prototype for a library of query processing algorithms. In Proc. of the ACM SIGMOD Int'l Conf. on Managt. of Data. ACM Press.
- Zhou, G., Hull, R., and King, R. (1996). Generating Data Integration Mediators That Use Materialization. Journal of Intelligent Information Systems, 6(2/3):199- 221.

#### Paper Citation

#### in Harvard Style

Carreira P. (2007). **ONE-TO-MANY DATA TRANSFORMATIONS - As Relational Operations** . In *Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS,* ISBN 978-972-8865-88-7, pages 503-507. DOI: 10.5220/0002405105030507

#### in Bibtex Style

@conference{iceis07,

author={Paulo Carreira},

title={ONE-TO-MANY DATA TRANSFORMATIONS - As Relational Operations},

booktitle={Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS,},

year={2007},

pages={503-507},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0002405105030507},

isbn={978-972-8865-88-7},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the Ninth International Conference on Enterprise Information Systems - Volume 1: ICEIS,

TI - ONE-TO-MANY DATA TRANSFORMATIONS - As Relational Operations

SN - 978-972-8865-88-7

AU - Carreira P.

PY - 2007

SP - 503

EP - 507

DO - 10.5220/0002405105030507