Table 1: CityA_Patient.
No.  Last Name  First Name  Age  Address  Post      VST         VET
1    Cole       Liam        23   City A   Student   28-09-2003  Now
2    Gerrard    John        25   City A   Student   02-10-2001  Now
3    Small      Helen       23   City A   Student   26-09-2002  Now
4    Smith      William     24   City A   Student   01-10-2001  Now
5    Smith      Mary        28   City A   Student   12-10-2001  10-09-2005
Table 2: CityB_Patient.
No.  Last Name  First Name  Age  Address  Post      VST         VET
1    Cole       Liam        23   City B   Engineer  20-08-2007  Now
2    Gerrard    John        25   City B   Engineer  18-09-2005  Now
3    Small      Helen       23   City B   Student   28-09-2003  Now
4    Smith      William     24   City B   Student   08-10-2005  Now
5    Smith      Kirsty      30   City B   Engineer  10-10-2005  Now
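Tables 1 and 2 hold overlapping records drawn from two sources. As a minimal sketch of how such cross-source duplicates and their conflicting attributes might be detected, the Python below pairs records that share last name, first name and age. That matching rule and the names `match_key`, `cross_source_matches` and `unmatched` are illustrative assumptions, not the method of this paper.

```python
from typing import Dict, List, Tuple

# (last name, first name, age, address, post, VST, VET), transcribed from the tables.
Record = Tuple[str, str, int, str, str, str, str]

CITY_A: List[Record] = [
    ("Cole", "Liam", 23, "City A", "Student", "28-09-2003", "Now"),
    ("Gerrard", "John", 25, "City A", "Student", "02-10-2001", "Now"),
    ("Small", "Helen", 23, "City A", "Student", "26-09-2002", "Now"),
    ("Smith", "William", 24, "City A", "Student", "01-10-2001", "Now"),
    ("Smith", "Mary", 28, "City A", "Student", "12-10-2001", "10-09-2005"),
]

CITY_B: List[Record] = [
    ("Cole", "Liam", 23, "City B", "Engineer", "20-08-2007", "Now"),
    ("Gerrard", "John", 25, "City B", "Engineer", "18-09-2005", "Now"),
    ("Small", "Helen", 23, "City B", "Student", "28-09-2003", "Now"),
    ("Smith", "William", 24, "City B", "Student", "08-10-2005", "Now"),
    ("Smith", "Kirsty", 30, "City B", "Engineer", "10-10-2005", "Now"),
]

# Attributes compared once two records are judged to describe the same person.
COMPARED_FIELDS = ("address", "post", "vst", "vet")


def match_key(rec: Record) -> Tuple[str, str, int]:
    # Hypothetical matching rule: identical last name, first name and age.
    return (rec[0].lower(), rec[1].lower(), rec[2])


def cross_source_matches(
    source_a: List[Record], source_b: List[Record]
) -> Dict[Tuple[str, str, int], List[str]]:
    """Map each key found in both sources to the attributes whose values differ."""
    index_b = {match_key(rec): rec for rec in source_b}
    matches: Dict[Tuple[str, str, int], List[str]] = {}
    for rec_a in source_a:
        rec_b = index_b.get(match_key(rec_a))
        if rec_b is not None:
            matches[match_key(rec_a)] = [
                field
                for field, value_a, value_b in zip(COMPARED_FIELDS, rec_a[3:], rec_b[3:])
                if value_a != value_b
            ]
    return matches


def unmatched(source: List[Record], other: List[Record]) -> List[Record]:
    """Records in `source` that have no counterpart in `other`."""
    other_keys = {match_key(rec) for rec in other}
    return [rec for rec in source if match_key(rec) not in other_keys]


if __name__ == "__main__":
    for key, conflicts in cross_source_matches(CITY_A, CITY_B).items():
        print(key, conflicts)
    print("only in City A:", unmatched(CITY_A, CITY_B))
    print("only in City B:", unmatched(CITY_B, CITY_A))
```

On the two tables, four records pair up: every pair differs in Address and VST, Cole and Gerrard also differ in Post, and Smith Mary (City A) and Smith Kirsty (City B) have no counterpart. Exact matching is the simplest possible rule; sources with spelling variants would need an approximate comparison instead.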
data quality dimensions and decoupling the data
cleaning process into two sub-processes based on
different purposes. This framework retains the most
appealing characteristics of existing data cleaning
approaches and can improve the efficiency of data
cleaning in data warehouse applications.
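Decoupling the cleaning process by purpose can be pictured as grouping data quality checks into passes that run independently of one another. The sketch below assumes, purely for illustration, two passes and three checks chosen here (completeness, a plausible age range, and temporal consistency of VST and VET, reading "Now" as an open-ended period). The pass names, the checks, `run_pass` and the age bound are all hypothetical and do not reproduce the two sub-processes defined in this paper.

```python
from datetime import datetime
from typing import Callable, Dict, List, Tuple

# (last name, first name, age, address, post, VST, VET), as in Table 1.
Record = Tuple[str, str, int, str, str, str, str]
Check = Callable[[Record], bool]

CITY_A: List[Record] = [
    ("Cole", "Liam", 23, "City A", "Student", "28-09-2003", "Now"),
    ("Gerrard", "John", 25, "City A", "Student", "02-10-2001", "Now"),
    ("Small", "Helen", 23, "City A", "Student", "26-09-2002", "Now"),
    ("Smith", "William", 24, "City A", "Student", "01-10-2001", "Now"),
    ("Smith", "Mary", 28, "City A", "Student", "12-10-2001", "10-09-2005"),
]


def parse_date(text: str) -> datetime:
    # The tables write dates as DD-MM-YYYY.
    return datetime.strptime(text, "%d-%m-%Y")


def completeness(rec: Record) -> bool:
    """Every field holds a non-blank value."""
    return all(str(value).strip() for value in rec)


def plausible_age(rec: Record) -> bool:
    """Age is an integer inside a hypothetical plausible range."""
    age = rec[2]
    return isinstance(age, int) and 0 < age < 130


def temporal_consistency(rec: Record) -> bool:
    """VST parses and is not after VET; 'Now' means the period is still open."""
    vst, vet = rec[5], rec[6]
    try:
        start = parse_date(vst)
        return vet == "Now" or start <= parse_date(vet)
    except ValueError:  # an unparseable date fails the check
        return False


# Hypothetical split into two independently runnable passes.
PASSES: Dict[str, List[Check]] = {
    "pass_1_attribute_values": [completeness, plausible_age],
    "pass_2_temporal": [temporal_consistency],
}


def run_pass(records: List[Record], checks: List[Check]) -> Dict[str, List[Record]]:
    """Return, for each check in the pass, the records that fail it."""
    return {check.__name__: [r for r in records if not check(r)] for check in checks}


if __name__ == "__main__":
    for name, checks in PASSES.items():
        print(name, run_pass(CITY_A, checks))
```

For the rows of Table 1 every check reports an empty failure list. Because each pass only reads the records and shares no state with the other, either pass can be rerun, reordered or replaced on its own.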
This work suggests a number of further
investigations, including: a) examining additional
characteristics of data quality dimensions, in order to
develop detailed guidance for choosing a particular
data cleaning strategy in data warehouses; b)
developing a comprehensive data cleaning tool for
data warehouses based on the framework proposed in
this paper; and c) testing the framework by applying
it to larger, multiple data sources. A successful
outcome of such future work is expected to enhance
the performance of data cleaning systems in data
warehouses.
ICEIS 2008 - International Conference on Enterprise Information Systems