Paper

Authors: Matheus Vieira 1 ; Thiago de Oliveira 2 ; Leandro Cicco 2 ; Daniel de Oliveira 1 and Marcos Bedo 1

Affiliations: 1 Institute of Computing, Fluminense Federal University, Brazil ; 2 Information Technology Superintendence, Fluminense Federal University, Brazil

Keyword(s): Data Warehousing, ETL, Provenance, Data Quality, Business Intelligence.

Abstract: Business intelligence processes running over Data Warehouses (BIDW) heavily rely on quality, structured data to support decision-making and prescriptive analytics. In this study, we discuss the coupling of provenance mechanisms into the BIDW Extract-Transform-Load (ETL) stage to provide lineage tracking and data auditing, which (i) enhances the debugging of data transformation and (ii) facilitates issuing data accountability reports and dashboards. These two features are particularly beneficial for BIDWs tailored to assist managers and counselors in Universities and other educational institutions, as systematic auditing processes and accountability delineation depend on data quality and tracking. To validate the usefulness of provenance in this domain, we introduce the ProvETL tool that extends a BIDW with provenance support, enabling the monitoring of user activities and data transformations, along with the compilation of an execution summary for each ETL task. Accordingly, ProvETL offers an additional BIDW analytical layer that allows visualizing data flows through provenance graphs. The exploration of such graphs provides details on data lineage and the execution of transformations, spanning from the insertion of input data into BIDW dimensional tables to the final BIDW fact tables. We showcased ProvETL capabilities in three real-world scenarios using a BIDW from our University: personnel admission, public information in paycheck reports, and staff dismissals. The results indicate that the solution has contributed to spotting poor-quality data in each evaluated scenario. ProvETL also promptly pinpointed the transformation summary, elapsed time, and the attending user for every data flow, keeping the provenance collection overhead within milliseconds.
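
The per-task execution summary mentioned in the abstract (transformation, attending user, elapsed time) can be illustrated with a minimal Python sketch. This is not ProvETL's implementation, which the abstract does not detail; the names ProvenanceLog, ProvenanceRecord, and tracked, as well as the sample tasks and columns, are hypothetical and chosen only for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class ProvenanceRecord:
    """One execution summary for a single ETL task (hypothetical schema)."""
    task: str
    user: str
    rows_in: int
    rows_out: int
    elapsed_ms: float


@dataclass
class ProvenanceLog:
    """In-memory provenance store; a real tool would persist this."""
    records: List[ProvenanceRecord] = field(default_factory=list)

    def tracked(self, task: str, user: str):
        """Decorator that records who ran a transformation and how long it took."""
        def wrap(fn: Callable[[List[Row]], List[Row]]):
            def run(rows: List[Row]) -> List[Row]:
                start = time.perf_counter()
                out = fn(rows)
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                self.records.append(
                    ProvenanceRecord(task, user, len(rows), len(out), elapsed_ms)
                )
                return out
            return run
        return wrap

    def lineage(self) -> List[str]:
        """Task names in execution order, i.e. the path the data followed."""
        return [r.task for r in self.records]


log = ProvenanceLog()


@log.tracked("drop_missing_admission_date", user="etl_operator")
def drop_missing_admission_date(rows: List[Row]) -> List[Row]:
    # Discard staging rows that lack an admission date (poor-quality data).
    return [r for r in rows if r.get("admission_date")]


@log.tracked("load_fact_admission", user="etl_operator")
def load_fact_admission(rows: List[Row]) -> List[Row]:
    # Project the cleaned rows into the shape of a (hypothetical) fact table.
    return [
        {"staff_id": r["staff_id"], "admission_date": r["admission_date"]}
        for r in rows
    ]


staging = [
    {"staff_id": 1, "admission_date": "2020-03-01"},
    {"staff_id": 2, "admission_date": None},
]
facts = load_fact_admission(drop_missing_admission_date(staging))
```

In this sketch, comparing rows_in and rows_out for each recorded task is one simple way a provenance log could point to the step where poor-quality records were filtered out.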

CC BY-NC-ND 4.0

Paper citation in several formats:
Vieira, M.; de Oliveira, T.; Cicco, L.; de Oliveira, D. and Bedo, M. (2024). From Tracking Lineage to Enhancing Data Quality and Auditing: Adding Provenance Support to Data Warehouses with ProvETL. In Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-692-7; ISSN 2184-4992, SciTePress, pages 313-320. DOI: 10.5220/0012634500003690

@conference{iceis24,
author={Matheus Vieira and Thiago {de Oliveira} and Leandro Cicco and Daniel {de Oliveira} and Marcos Bedo},
title={From Tracking Lineage to Enhancing Data Quality and Auditing: Adding Provenance Support to Data Warehouses with ProvETL},
booktitle={Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2024},
pages={313-320},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012634500003690},
isbn={978-989-758-692-7},
issn={2184-4992},
}

TY - CONF

JO - Proceedings of the 26th International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - From Tracking Lineage to Enhancing Data Quality and Auditing: Adding Provenance Support to Data Warehouses with ProvETL
SN - 978-989-758-692-7
IS - 2184-4992
AU - Vieira, M.
AU - de Oliveira, T.
AU - Cicco, L.
AU - de Oliveira, D.
AU - Bedo, M.
PY - 2024
SP - 313
EP - 320
DO - 10.5220/0012634500003690
PB - SciTePress