Authors:
Jerzy W. Grzymala-Busse (1) and Witold J. Grzymala-Busse (2)
Affiliations:
(1) University of Kansas, United States; (2) Touchnet Information Systems, Inc., United States
Keyword(s):
Rough set theory, rule induction, MLEM2 algorithm, missing attribute values, lost values, attribute-concept values, "do not care" conditions.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Computational Intelligence; Evolutionary Computing; Knowledge Acquisition; Knowledge Discovery and Information Retrieval; Knowledge Engineering and Ontology Development; Knowledge-Based Systems; Machine Learning; Soft Computing; Symbolic Systems
Abstract:
This paper presents a new methodology to improve the quality of rule sets. We performed a series of data mining experiments on completely specified data sets. In these experiments we removed some specified attribute values, or, in other words, replaced them with symbols denoting missing attribute values, and used the resulting data for rule induction, while the original, complete data sets were used for testing. In our experiments we used the MLEM2 rule induction algorithm of the LERS data mining system, based on rough sets. Our approach to missing attribute values was based on rough set theory as well. The results of our experiments show that for some data sets and some interpretations of missing attribute values, the error rate was smaller than for the original, complete data sets. Thus, rule sets induced from some data sets may be improved by increasing the incompleteness of those data sets. It appears that when some attribute values are removed, the rule induction system, forced to induce rules from the remaining information, may induce better rule sets.
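The data preparation step described in the abstract (replacing some specified attribute values with a missing-value symbol, then training on the incomplete data and testing on the complete data) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function name `mask_values`, the toy data set, the uniform random choice of cells, and the use of "?" as the symbol for a lost value. The sketch does not implement MLEM2 or LERS, and it does not enforce any constraints the authors may have applied when removing values.

```python
import random


def mask_values(rows, fraction, symbol="?", seed=0):
    """Return a copy of rows with a fraction of attribute values replaced by symbol.

    Each row is a list whose last element is the decision value; the decision
    is never replaced. The function name and signature are hypothetical.
    """
    rng = random.Random(seed)
    masked = [list(row) for row in rows]
    # Collect every (row, column) position holding an attribute value,
    # excluding the last column (the decision).
    cells = [(r, c) for r, row in enumerate(masked) for c in range(len(row) - 1)]
    n_remove = round(fraction * len(cells))
    # Choose the cells to mask uniformly at random (an assumption).
    for r, c in rng.sample(cells, n_remove):
        masked[r][c] = symbol
    return masked


# Toy, completely specified data set (hypothetical): three attributes plus a decision.
complete = [
    ["high", "yes", "no", "flu"],
    ["normal", "no", "no", "healthy"],
    ["high", "no", "yes", "flu"],
    ["normal", "yes", "yes", "healthy"],
]

training = mask_values(complete, fraction=0.25, symbol="?")  # incomplete data for rule induction
testing = complete                                           # original, complete data for testing
```

Passing a different `symbol` would let the same helper mark values under another interpretation of missingness; which symbol corresponds to which interpretation is a convention of the rule induction system being used.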