Authors: Paulo Carreira 1; Helena Galhardas 2; João Pereira 2 and Andrzej Wichert 2
Affiliations: 1 Faculty of Sciences, University of Lisbon, Portugal; 2 INESC-ID, Portugal
Keyword(s): Data Warehousing, Data Cleaning, Data Integration, ETL, Query Optimization.
Related Ontology Subjects/Areas/Topics: Coupling and Integrating Heterogeneous Data Sources; Data Warehouses and OLAP; Databases and Information Systems Integration; Enterprise Information Systems; Legacy Systems
Abstract:
The optimization capabilities of RDBMSs make them attractive for executing the data transformations that support ETL, data cleaning and data integration activities. However, although many useful data transformations can be expressed as relational queries, an important class cannot: transformations that produce several output tuples for a single input tuple.
To address this limitation, a new operator, named data mapper, has been proposed as an extension of Relational Algebra for expressing one-to-many data transformations. In this paper, we study the feasibility of implementing the mapper as a primitive operator in an RDBMS. Data transformations expressed as combinations of standard relational operators and mappers can then be optimized, resulting in performance gains.
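
As an illustration only, the idea of a one-to-many mapper can be sketched in Python. The relation (`Loan`), the mapper function (`split_into_installments`), the cap of 100 and the selection push-down shown here are all hypothetical; they are not taken from the paper and are meant only to show how one input tuple may yield zero, one or several output tuples, and why combining mappers with standard operators opens room for optimization.

```python
from typing import Callable, Iterable, Iterator, NamedTuple


class Loan(NamedTuple):
    account: str
    amount: int


class Installment(NamedTuple):
    account: str
    seq: int
    amount: int


def split_into_installments(loan: Loan, cap: int = 100) -> Iterator[Installment]:
    """Hypothetical mapper function: one output tuple per installment of at most `cap`."""
    remaining, seq = loan.amount, 1
    while remaining > 0:
        part = min(cap, remaining)
        yield Installment(loan.account, seq, part)
        remaining -= part
        seq += 1


def mapper(fn: Callable[[Loan], Iterable[Installment]],
           relation: Iterable[Loan]) -> Iterator[Installment]:
    """Apply `fn` to every input tuple and concatenate the resulting output tuples."""
    for row in relation:
        yield from fn(row)


def select(pred, relation):
    """Standard relational selection."""
    return (row for row in relation if pred(row))


loans = [Loan("A", 250), Loan("B", 80), Loan("C", 0)]

# Plan 1: run the mapper over every loan, then filter its output.
plan_late = list(select(lambda i: i.account == "A",
                        mapper(split_into_installments, loans)))

# Plan 2: filter first, so the mapper only sees the matching loans.
# This rewrite is valid here because `account` is copied unchanged to the output.
plan_early = list(mapper(split_into_installments,
                         select(lambda l: l.account == "A", loans)))
```

Both plans return the same three installments for account "A" (100, 100 and 50), but the second evaluates the mapper function on one input tuple instead of three. Whether a given rewrite is valid depends on the mapper functions involved, which is an assumption of this sketch rather than a statement of the paper's rules.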