ATTENUATING THE EFFECT OF DATA ABNORMALITIES ON DATA WAREHOUSES

Anália Lourenço, Orlando Belo

2004

Abstract

Today’s informational entanglement makes adequate data management systems crucial. Data warehousing systems emerged with the specific mission of providing suitable content for data analysis by gathering, processing and maintaining all data elements considered valuable. Data analysis in general, and data mining and on-line analytical processing in particular, can achieve better, sharper results because data quality is finally taken into account. Source data must undergo intensive processing before it can be integrated into the data warehouse. Every data warehousing system includes extraction, transformation and loading processes, which handle all the preparation of data for integration into the data warehouse. Usually, data is inspected at several stages for data and schema problems, and every element that does not comply with the established rules is filtered out. This paper proposes an agent-based platform that not only ensures the traditional data flow but also tries to recover the filtered data when a data error occurs. The platform is intended to perform error monitoring and control automatically. Bad data is processed and, where possible, repaired by the agents, which then reintegrate it into the data warehouse’s regular flow. All data processing efforts are logged and afterwards mined in order to establish data error patterns. The results will enrich the wrappers’ knowledge about how to resolve abnormal situations. Over time, this evolution will enhance the data warehouse population process, enlarging the volume of integrated data and improving its quality and consistency.
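The flow described in the abstract (filter non-compliant records, attempt a repair, reintegrate repaired records, log every attempt, then look for error patterns) can be illustrated with a minimal sketch. It is not the authors' agent platform: the names `Rule`, `RepairLog` and `populate`, the two sample rules and the sample records are all hypothetical, and a simple frequency count stands in for the mining step.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional

Record = dict  # one source row, e.g. {"id": 1, "amount": 10.5}


@dataclass
class Rule:
    """A validation rule with an optional repair strategy (hypothetical)."""
    name: str
    check: Callable[[Record], bool]
    repair: Optional[Callable[[Record], Optional[Record]]] = None


@dataclass
class RepairLog:
    """Records every repair attempt so that it can be analysed later."""
    entries: list = field(default_factory=list)

    def add(self, rule_name: str, repaired: bool) -> None:
        self.entries.append((rule_name, repaired))

    def patterns(self) -> Counter:
        # Frequency of (rule, outcome) pairs; a stand-in for real mining.
        return Counter(self.entries)


def populate(records, rules, log):
    """Return (accepted, rejected); repaired records rejoin the accepted flow."""
    accepted, rejected = [], []
    for record in records:
        current = dict(record)
        for rule in rules:
            if rule.check(current):
                continue
            # The record broke a rule: try the rule's repair, if it has one.
            fixed = rule.repair(current) if rule.repair else None
            repaired = fixed is not None and rule.check(fixed)
            log.add(rule.name, repaired)
            if not repaired:
                rejected.append(record)
                break
            current = fixed
        else:
            # No rule rejected the record (possibly after repairs).
            accepted.append(current)
    return accepted, rejected


def parse_amount(record: Record) -> Optional[Record]:
    """Assumed repair: read a decimal-comma string such as "7,25" as a number."""
    try:
        value = float(str(record.get("amount")).replace(",", "."))
    except ValueError:
        return None
    return {**record, "amount": value}


rules = [
    Rule("amount_is_number",
         lambda r: isinstance(r.get("amount"), (int, float)),
         parse_amount),
    Rule("country_present", lambda r: bool(r.get("country"))),  # no repair
]

records = [
    {"id": 1, "amount": 10.5, "country": "PT"},
    {"id": 2, "amount": "7,25", "country": "PT"},
    {"id": 3, "amount": "n/a", "country": "PT"},
    {"id": 4, "amount": 3.0, "country": ""},
]

log = RepairLog()
accepted, rejected = populate(records, rules, log)
```

In this sketch the second record is repaired and rejoins the regular flow, while the third and fourth remain filtered. The counts returned by `log.patterns()` indicate which rules fail most often and how often a repair succeeds, which is the kind of feedback the abstract expects to feed back into the wrappers.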


Paper Citation


in Harvard Style

Lourenço A. and Belo O. (2004). ATTENUATING THE EFFECT OF DATA ABNORMALITIES ON DATA WAREHOUSES. In Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 1: ICEIS, ISBN 972-8865-00-7, pages 411-415. DOI: 10.5220/0002629604110415


in Bibtex Style

@conference{iceis04,
author={Anália Lourenço and Orlando Belo},
title={ATTENUATING THE EFFECT OF DATA ABNORMALITIES ON DATA WAREHOUSES},
booktitle={Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2004},
pages={411-415},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002629604110415},
isbn={972-8865-00-7},
}


in EndNote Style

TY - CONF
JO - Proceedings of the Sixth International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - ATTENUATING THE EFFECT OF DATA ABNORMALITIES ON DATA WAREHOUSES
SN - 972-8865-00-7
AU - Lourenço A.
AU - Belo O.
PY - 2004
SP - 411
EP - 415
DO - 10.5220/0002629604110415