From Tracking Lineage to Enhancing Data Quality and Auditing: Adding Provenance Support to Data Warehouses with ProvETL

Matheus Vieira, Thiago de Oliveira, Leandro Cicco, Daniel de Oliveira, Marcos Bedo

2024

Abstract

Business intelligence processes running over Data Warehouses (BIDW) rely heavily on high-quality, structured data to support decision-making and prescriptive analytics. In this study, we discuss coupling provenance mechanisms into the BIDW Extract-Transform-Load (ETL) stage to provide lineage tracking and data auditing, which (i) improves the debugging of data transformations and (ii) facilitates the issuing of data accountability reports and dashboards. These two features are particularly beneficial for BIDWs tailored to assist managers and counselors in universities and other educational institutions, where systematic auditing and the delineation of accountability depend on data quality and traceability. To validate the usefulness of provenance in this domain, we introduce ProvETL, a tool that extends a BIDW with provenance support, enabling the monitoring of user activities and data transformations as well as the compilation of an execution summary for each ETL task. ProvETL thus offers an additional BIDW analytical layer that allows visualizing data flows through provenance graphs. Exploring these graphs reveals details on data lineage and on the execution of transformations, from the insertion of input data into BIDW dimension tables to the final BIDW fact tables. We showcase ProvETL's capabilities in three real-world scenarios using a BIDW from our university: personnel admission, public information in paycheck reports, and staff dismissals. The results indicate that the solution helped spot poor-quality data in each evaluated scenario. ProvETL also promptly pinpointed the transformation summary, elapsed time, and responsible user for every data flow, while keeping the provenance collection overhead within milliseconds.
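The abstract describes recording, for each ETL task, who ran it, how long it took, and what it transformed, and then walking the resulting graph backward from a fact table to its sources. The following is a minimal Python sketch of that idea under stated assumptions: the record fields, the `ProvenanceStore` class, the `run_task` and `lineage` methods, and the table names (`staging.paycheck`, `dim_employee`, `fact_payroll`) are hypothetical and do not reflect ProvETL's actual schema or API. Lineage is modeled here as a simple table-level graph rather than following any specific provenance standard.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

Rows = list[dict]


@dataclass
class TaskRecord:
    """Execution summary of one ETL task (hypothetical schema, not ProvETL's)."""
    task: str
    user: str
    inputs: list[str]
    outputs: list[str]
    rows_in: int
    rows_out: int
    elapsed_ms: float


@dataclass
class ProvenanceStore:
    """In-memory provenance log; a production system would likely persist it."""
    records: list[TaskRecord] = field(default_factory=list)

    def run_task(self, name: str, func: Callable[[Rows], Rows], rows: Rows,
                 inputs: list[str], outputs: list[str], user: str) -> Rows:
        """Run one transformation, time it, and append its provenance record."""
        start = time.perf_counter()
        result = func(rows)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        self.records.append(TaskRecord(name, user, list(inputs), list(outputs),
                                       len(rows), len(result), elapsed_ms))
        return result

    def lineage(self, table: str) -> set[str]:
        """Return every upstream table that `table` was derived from, transitively."""
        upstream: set[str] = set()
        stack = [table]
        while stack:
            current = stack.pop()
            for rec in self.records:
                if current in rec.outputs:
                    for src in rec.inputs:
                        if src not in upstream:
                            upstream.add(src)
                            stack.append(src)
        return upstream


# Usage: a hypothetical two-step flow from a staging table to a fact table.
store = ProvenanceStore()
raw = [{"id": 1, "salary": 1000}, {"id": 2, "salary": None}]

clean = store.run_task(
    "clean_paycheck",
    lambda rows: [r for r in rows if r["salary"] is not None],  # drop incomplete rows
    raw, inputs=["staging.paycheck"], outputs=["dim_employee"], user="etl_admin")

facts = store.run_task(
    "load_fact",
    lambda rows: [{"emp_id": r["id"], "amount": r["salary"]} for r in rows],
    clean, inputs=["dim_employee"], outputs=["fact_payroll"], user="etl_admin")

# Print the per-task execution summary (elapsed time varies between runs).
for rec in store.records:
    print(rec.task, rec.user, rec.rows_in, "->", rec.rows_out, f"{rec.elapsed_ms:.3f} ms")
print(sorted(store.lineage("fact_payroll")))  # ['dim_employee', 'staging.paycheck']
```

In this sketch, comparing `rows_in` with `rows_out` for the cleaning task exposes the dropped record with a missing salary, which is the kind of signal a provenance layer can surface during a data-quality audit.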



Paper Citation


in Harvard Style

Vieira M., de Oliveira T., Cicco L., de Oliveira D. and Bedo M. (2024). From Tracking Lineage to Enhancing Data Quality and Auditing: Adding Provenance Support to Data Warehouses with ProvETL. In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-692-7, SciTePress, pages 313-320. DOI: 10.5220/0012634500003690


in Bibtex Style

@conference{iceis24,
author={Matheus Vieira and Thiago de Oliveira and Leandro Cicco and Daniel de Oliveira and Marcos Bedo},
title={From Tracking Lineage to Enhancing Data Quality and Auditing: Adding Provenance Support to Data Warehouses with ProvETL},
booktitle={Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2024},
pages={313-320},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012634500003690},
isbn={978-989-758-692-7},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - From Tracking Lineage to Enhancing Data Quality and Auditing: Adding Provenance Support to Data Warehouses with ProvETL
SN - 978-989-758-692-7
AU - Vieira M.
AU - de Oliveira T.
AU - Cicco L.
AU - de Oliveira D.
AU - Bedo M.
PY - 2024
SP - 313
EP - 320
DO - 10.5220/0012634500003690
PB - SciTePress