
The software agents are able to balance the workload,
assigning specialised workers to certain tasks (such as
the recognition and repair of abnormal data formats).
Moreover, the system is intended to learn from past
experience, so that the cooperative work evolves over
time. To that end, each event is recorded in a database
that is mined from time to time. The results obtained
will be studied in order to better understand the
abnormal data formats that appear and thus keep the
knowledge bases of the wrappers and the solvers
up to date.
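The record-then-mine cycle described above can be illustrated with a
minimal sketch. It is not the platform's implementation: the event
fields, the in-memory log standing in for the database, and the
frequency threshold are all assumptions made for illustration, and the
"mining" step is reduced to a simple frequency count of abnormalities
that no solver repaired.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Hypothetical in-memory stand-in for the event database."""
    events: list = field(default_factory=list)

    def record(self, source, field_name, anomaly, repaired):
        # One event per detected abnormality (field layout is assumed).
        self.events.append((source, field_name, anomaly, repaired))

    def unresolved_by_type(self):
        # Count the abnormalities that no solver managed to repair.
        return Counter(a for _, _, a, repaired in self.events if not repaired)


def candidate_rules(log, min_count=2):
    """Anomaly types left unrepaired at least min_count times,
    most frequent first; these are candidates for new solver rules."""
    counts = log.unresolved_by_type()
    return [a for a, n in counts.most_common() if n >= min_count]


# Illustrative events only; the sources and anomaly names are invented.
log = EventLog()
log.record("sales.csv", "date", "day-month swapped", False)
log.record("sales.csv", "date", "day-month swapped", False)
log.record("crm.xml", "phone", "missing prefix", True)
log.record("crm.xml", "zip", "letters in code", False)
print(candidate_rules(log))
```

With these invented events, only the anomaly type that recurs
unrepaired passes the threshold, which is the kind of signal that
could prompt an update to a solver's knowledge base.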
This platform is a FIPA-compliant multi-agent system
built on JADE. It allows the agents to be spread
across multiple platforms and supports control of the
agents’ configuration via a remote GUI. Moreover, it
supports distinct communicative acts and allows the
creation of ontologies according to the application
area.
In the future, new rules for the identification and
resolution of abnormal data formats will be
integrated. Also, the other two platforms, concerning
the transformation and loading stages of the
process, will be deployed in a similar way.
ATTENUATING THE EFFECT OF DATA ABNORMALITIES ON DATA WAREHOUSES