Authors:
Mario Mezzanzanica; Roberto Boselli; Mirko Cesarini and Fabio Mercorio
Affiliation:
University of Milan-Bicocca, Italy
Keyword(s):
Data Quality, Data Management, Cleansing Algorithms, Model-based Reasoning.
Related Ontology Subjects/Areas/Topics:
Data Engineering; Data Management and Quality; Data Management for Analytics; Data Structures and Data Management Algorithms; Information Quality
Abstract:
Data cleansing is growing in importance among both public and private organisations, mainly due to the large amount of data exploited to support decision-making processes. This paper aims to show how model-based verification algorithms (namely, model checking) can contribute to addressing data cleansing issues. It also introduces a new benchmark problem focusing on labour market dynamics.
The consistent evolution of the data is checked using a model defined on the basis of domain knowledge.
We then formally introduce the concept of a universal cleanser, i.e. an object that summarises the set of all cleansing actions for each feasible data inconsistency (according to a given consistency model), and provide an algorithm that synthesises it. The universal cleanser can be seen as a repository of corrective interventions useful for developing cleansing routines.
We applied our approach to a dataset derived from Italian labour market data, and made the whole dataset and the outcomes publicly available to the community, so that the results we present can be shared and compared with other techniques.
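The idea of a consistency model and a universal cleanser can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two-state career model, the event names `start` and `cease`, and the helper functions are hypothetical and are not taken from the paper. Plain dictionary lookups stand in for the model checker, and only single-event insertions are considered as corrective actions.

```python
# Toy consistency model (hypothetical, not the paper's model): a worker is
# either "unemployed" or "employed", and events move between the two states.
TRANSITIONS = {
    ("unemployed", "start"): "employed",
    ("employed", "cease"): "unemployed",
}
STATES = ("unemployed", "employed")
EVENTS = ("start", "cease")


def first_inconsistency(events, state="unemployed"):
    """Return (index, state) of the first event with no valid transition, or None."""
    for i, ev in enumerate(events):
        nxt = TRANSITIONS.get((state, ev))
        if nxt is None:
            return i, state
        state = nxt
    return None


def synthesise_cleanser():
    """Map each inconsistent (state, event) pair to the single events whose
    insertion before the offending event makes it consistent again."""
    cleanser = {}
    for s in STATES:
        for ev in EVENTS:
            if (s, ev) in TRANSITIONS:
                continue  # consistent pair, nothing to repair
            cleanser[(s, ev)] = [
                fix for fix in EVENTS
                if (s, fix) in TRANSITIONS
                and (TRANSITIONS[(s, fix)], ev) in TRANSITIONS
            ]
    return cleanser


def cleanse(events, cleanser, state="unemployed"):
    """Repair a sequence by inserting the first candidate fix at each inconsistency."""
    out = []
    for ev in events:
        if (state, ev) not in TRANSITIONS:
            fix = cleanser[(state, ev)][0]  # arbitrary choice among candidates
            out.append(fix)
            state = TRANSITIONS[(state, fix)]
        out.append(ev)
        state = TRANSITIONS[(state, ev)]
    return out


cleanser = synthesise_cleanser()
career = ["start", "start", "cease"]
print(first_inconsistency(career))   # (1, 'employed')
print(cleanse(career, cleanser))     # ['start', 'cease', 'start', 'cease']
```

In this sketch the cleanser is computed once from the model, independently of any dataset, and then reused as a lookup table by the cleansing routine, which mirrors the "repository of corrective interventions" reading given in the abstract.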