Authors: Zineb El Akkaoui (1); Alejandro Vaisman (2) and Esteban Zimányi (2)
Affiliations: (1) SEEDS Team, INPT Lab, Rabat, Morocco; (2) CoDE Lab, Université Libre de Bruxelles, Brussels, Belgium
Keyword(s):
ETL processes, Data Integration Performance, Design Quality, Theoretical Validation, Empirical Validation.
Related Ontology Subjects/Areas/Topics: Data Warehouses and OLAP; Databases and Information Systems Integration; Enterprise Information Systems; Performance Evaluation and Benchmarking
Abstract:
The Extraction, Transformation and Loading (ETL) process is a crucial component of a data warehousing architecture. ETL processes are usually complex and time-consuming. Particularly important (although often overlooked) in ETL development is the design phase, since it affects the subsequent phases, i.e., implementation and execution. Addressing ETL quality at the design phase allows taking actions that can have a positive and low-cost impact on process efficiency. Using the well-known Briand et al. framework (a theoretical validation framework for system artifacts), we formally specify a set of internal metrics that we conjecture to be correlated with process efficiency. We also provide an empirical validation of this correlation, as well as an analysis of the metrics that have a stronger impact on efficiency. Although there are proposals in the literature addressing design quality in ETL, as far as we are aware, this is the first proposal aimed at using metrics over ETL models to predict the performance associated with these models.