A FRAMEWORK FOR DATA CLEANING IN DATA WAREHOUSES
Taoxin Peng
2008
Abstract
Achieving high data quality in data warehouses is a persistent challenge, and data cleaning is a crucial part of meeting it. A set of methods and tools has been developed for this purpose. However, at least two questions remain open: how can the efficiency of data cleaning be improved, and how can its degree of automation be increased? This paper addresses these two questions by presenting a novel framework that provides an approach to managing data cleaning in data warehouses by focusing on the use of data quality dimensions and by decoupling the cleaning process into several sub-processes. An initial test run of the processes in the framework demonstrates that the approach is efficient and scalable for data cleaning in data warehouses.
Paper Citation
in Harvard Style
Peng T. (2008). A FRAMEWORK FOR DATA CLEANING IN DATA WAREHOUSES. In Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 978-989-8111-36-4, pages 473-478. DOI: 10.5220/0001706004730478
in Bibtex Style
@conference{iceis08,
author={Taoxin Peng},
title={A FRAMEWORK FOR DATA CLEANING IN DATA WAREHOUSES},
booktitle={Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2008},
pages={473-478},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0001706004730478},
isbn={978-989-8111-36-4},
}
in EndNote Style
TY - CONF
JO - Proceedings of the Tenth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - A FRAMEWORK FOR DATA CLEANING IN DATA WAREHOUSES
SN - 978-989-8111-36-4
AU - Peng T.
PY - 2008
SP - 473
EP - 478
DO - 10.5220/0001706004730478