Analysis of Data Quality Problem Taxonomies
Arturs Zogla, Inga Meirane, Edgars Salna
2015
Abstract
There are many reasons to maintain high-quality data in databases and other structured data sources. High-quality data enables better discovery, automated data analysis, data mining, migration and re-use. However, data can become corrupted through human error or through faults in the data systems themselves. In this paper, existing data quality problem taxonomies for structured textual data, together with several proposed improvements to them, are analysed. A new classification of data quality problems is proposed, along with a framework for detecting data errors both with and without data operator assistance.
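The abstract does not spell out the proposed classification or framework, so the following is only a hypothetical illustration of the general idea of separating errors that can be detected automatically from suspected errors that need a data operator to confirm them. The problem labels ("missing value", "format violation", "possible duplicate"), the `Issue` class and the `detect_issues` function are illustrative assumptions, not the authors' taxonomy or method.

```python
import re
from dataclasses import dataclass


@dataclass
class Issue:
    row: int
    column: str
    problem: str          # hypothetical category label, not the paper's taxonomy
    needs_operator: bool  # True if a human operator must confirm the finding


def detect_issues(rows, required, patterns, dedup_key):
    """Scan a list of dict records for a few common data quality problems."""
    issues = []
    seen = {}  # normalised key -> index of the first row that produced it
    for i, row in enumerate(rows):
        # Missing mandatory values can be decided automatically.
        for col in required:
            if row.get(col) in (None, ""):
                issues.append(Issue(i, col, "missing value", False))
        # Syntax violations against a regular expression are also automatic.
        for col, pattern in patterns.items():
            value = row.get(col)
            if value not in (None, "") and not re.fullmatch(pattern, str(value)):
                issues.append(Issue(i, col, "format violation", False))
        # Rows that collide on a normalised key are only suspected duplicates,
        # so they are flagged for an operator instead of being merged.
        key = tuple(str(row.get(c, "")).strip().lower() for c in dedup_key)
        if key in seen:
            issues.append(Issue(i, ",".join(dedup_key),
                                f"possible duplicate of row {seen[key]}", True))
        else:
            seen[key] = i
    return issues


records = [
    {"name": "Anna Berzina", "email": "anna@example.com"},
    {"name": "anna berzina ", "email": "anna@example"},
    {"name": "", "email": "juris@example.com"},
]
found = detect_issues(records,
                      required=["name"],
                      patterns={"email": r"[^@\s]+@[^@\s]+\.[a-z]+"},
                      dedup_key=["name"])
for issue in found:
    print(issue)
```

Here the second record is reported twice: once automatically for its malformed e-mail address, and once as a possible duplicate of the first record that only an operator can confirm or reject.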
Paper Citation
in Harvard Style
Zogla A., Meirane I. and Salna E. (2015). Analysis of Data Quality Problem Taxonomies. In Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 2: ICEIS, ISBN 978-989-758-097-0, pages 445-450. DOI: 10.5220/0005462604450450
in BibTeX Style
@conference{iceis15,
author={Arturs Zogla and Inga Meirane and Edgars Salna},
title={Analysis of Data Quality Problem Taxonomies},
booktitle={Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 2: ICEIS},
year={2015},
pages={445-450},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0005462604450450},
isbn={978-989-758-097-0},
}