Analysis of Data Quality Problem Taxonomies

Arturs Zogla, Inga Meirane, Edgars Salna

2015

Abstract

There are many reasons to maintain high-quality data in databases and other structured data sources. High-quality data ensures better discovery, automated data analysis, data mining, migration, and re-use. However, data can become corrupted through human error or faults in the data systems themselves. This paper analyses existing data quality problem taxonomies for structured textual data, together with several improvements to them. A new classification of data quality problems is proposed, along with a framework for detecting data errors both with and without data operator assistance.


Paper Citation


in Harvard Style

Zogla A., Meirane I. and Salna E. (2015). Analysis of Data Quality Problem Taxonomies. In Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-758-097-0, pages 445-450. DOI: 10.5220/0005462604450450


in Bibtex Style

@conference{iceis15,
author={Arturs Zogla and Inga Meirane and Edgars Salna},
title={Analysis of Data Quality Problem Taxonomies},
booktitle={Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2015},
pages={445-450},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005462604450450},
isbn={978-989-758-097-0},
}

