Improving Data Cleansing Accuracy - A Model-based Approach

Mario Mezzanzanica; Roberto Boselli; Mirko Cesarini; Fabio Mercorio

Topics: Data Management for Analytics; Information Quality

In Proceedings of the 3rd International Conference on Data Management Technologies and Applications (DATA), Volume 1, pages 189-201, 2014, Vienna, Austria

Authors: Mario Mezzanzanica ¹; Roberto Boselli ¹; Mirko Cesarini ¹ and Fabio Mercorio ²

Affiliations: ¹ University of Milan-Bicocca, Italy; ² University of Milano-Bicocca, Italy

Keyword(s): Data and Information Quality, Data Cleansing, Data Accuracy, Weakly-structured Data.

Related Ontology Subjects/Areas/Topics: Data Engineering; Data Management and Quality; Data Management for Analytics; Information Quality

Abstract: Research on data quality is growing in importance in both industrial and academic communities, as it aims at deriving knowledge (and then value) from data. Information systems generate large amounts of data useful for studying the dynamics of subjects' behaviours or phenomena over time, which makes data quality crucial for guaranteeing the believability of the overall knowledge discovery process. In such a scenario, data cleansing techniques, i.e., automatic methods to cleanse a dirty dataset, are paramount. However, when multiple cleansing alternatives are available, a policy is required for choosing between them. Policy design still relies on the experience of domain experts, which makes the automatic identification of accurate policies a significant issue. This paper extends the Universal Cleaning Process, enabling the automatic generation of an accurate cleansing policy derived from the dataset to be analysed. The proposed approach has been implemented and tested on an online benchmark dataset, a real-world instance of the labour market domain. Our preliminary results suggest that the approach contributes towards the generation of data-driven policies, significantly reducing the intervention required from domain experts for policy specification. Finally, the generated results have been made publicly available for download.

CC BY-NC-ND 4.0

Paper citation in several formats:

Mezzanzanica, M., Boselli, R., Cesarini, M. and Mercorio, F. (2014). Improving Data Cleansing Accuracy - A Model-based Approach. In Proceedings of 3rd International Conference on Data Management Technologies and Applications - DATA; ISBN 978-989-758-035-2; ISSN 2184-285X, SciTePress, pages 189-201. DOI: 10.5220/0005004901890201

@conference{data14,
author={Mario Mezzanzanica and Roberto Boselli and Mirko Cesarini and Fabio Mercorio},
title={Improving Data Cleansing Accuracy - A Model-based Approach},
booktitle={Proceedings of 3rd International Conference on Data Management Technologies and Applications - DATA},
year={2014},
pages={189-201},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005004901890201},
isbn={978-989-758-035-2},
issn={2184-285X},
}

TY - CONF
JO - Proceedings of 3rd International Conference on Data Management Technologies and Applications - DATA
TI - Improving Data Cleansing Accuracy - A Model-based Approach
SN - 978-989-758-035-2
IS - 2184-285X
AU - Mezzanzanica, M.
AU - Boselli, R.
AU - Cesarini, M.
AU - Mercorio, F.
PY - 2014
SP - 189
EP - 201
DO - 10.5220/0005004901890201
PB - SciTePress