Authors:
Óscar Oliveira
and
Bruno Oliveira
Affiliation:
CIICESI, School of Management and Technology, Porto Polytechnic, Rua do Curral, Felgueiras, Portugal
Keyword(s):
Data Quality, Data Reliability, Data Warehouse, Data Lake, Quality Indicator.
Abstract:
Data Warehouse (DW) and Data Lake (DL) systems are mature, widely used technologies for integrating data to support decision-making. They allow organizations to explore their operational data, which can be used to gain competitive advantage. However, the amount of data generated by humans has increased exponentially over the last 20 years. As a result, traditional data quality problems, which can compromise the use of analytical systems, have become more relevant due to the massive volume and heterogeneous formats of the data. This paper describes an approach for dealing with data quality. Using a case study, quality metrics are identified and used to define a reliability indicator, which makes it possible to identify poor-quality records and assess their impact on the data used to support enterprise analytics.