Authors:
Patrick G. Clark (1) and Jerzy W. Grzymala-Busse (2)
Affiliations:
(1) University of Kansas, United States; (2) University of Kansas and University of Information Technology and Management, United States
Keyword(s):
Data Mining, Rough Set Theory, Probabilistic Approximations, MLEM2 Rule Induction Algorithm, Attribute-concept Values, "Do Not Care" Conditions.
Related Ontology Subjects/Areas/Topics:
Artificial Intelligence; Biomedical Engineering; Business Analytics; Cardiovascular Technologies; Computing and Telecommunications in Cardiology; Data Analytics; Data Engineering; Data Mining; Databases and Information Systems Integration; Datamining; Decision Support Systems; Decision Support Systems, Remote Data Analysis; Enterprise Information Systems; Health Engineering and Technology Applications; Health Information Systems; Knowledge Discovery and Information Retrieval; Knowledge-Based Systems; Sensor Networks; Signal Processing; Soft Computing; Symbolic Systems
Abstract:
In this paper we study the complexity of rule sets induced from incomplete data sets with two interpretations
of missing attribute values: attribute-concept values and “do not care” conditions. Experiments are conducted
on 176 data sets, using three kinds of probabilistic approximations (lower, middle and upper) and the MLEM2
rule induction system. The goal of our research is to determine which interpretation and approximation
produce the least complex rule sets. In our experiments, rule sets are smaller for attribute-concept
values for 12 of the 24 combinations of data set type and approximation, smaller for “do not care” conditions
for one combination, and for the remaining 11 combinations the difference is not statistically significant
(5% significance level). The total number of conditions is smaller for attribute-concept values for ten
combinations and smaller for “do not care” conditions for two combinations, while for the remaining 12
combinations the difference is not statistically significant. Thus, we may claim that attribute-concept values
are better than “do not care” conditions in terms of rule complexity.
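To make the two interpretations and the three approximations concrete, here is a minimal Python sketch. It assumes the characteristic-set definitions commonly used for incomplete data in rough set theory, not the authors' own implementation. Under these definitions a "do not care" condition (`*`) matches every value of the attribute, while an attribute-concept value (`-`) matches only the specified values that occur among cases of the same concept. The concept probabilistic approximation is taken to be the union of the characteristic sets K_A(x), for x in concept X, with Pr(X | K_A(x)) = |X ∩ K_A(x)| / |K_A(x)| ≥ α. The thresholds are α = 1 (lower), α = 0.5 (middle) and, as an assumed small positive value, α = 0.001 (upper). The function names, markers and toy data are hypothetical, and MLEM2 rule induction itself is not shown.

```python
from fractions import Fraction

# Missing-value markers (assumed notation for this sketch):
ATTRIBUTE_CONCEPT = "-"  # may be any value seen among cases of the same concept
DO_NOT_CARE = "*"        # may be any value of the attribute

# Thresholds for the three approximations (UPPER is an assumed small positive alpha).
LOWER, MIDDLE, UPPER = Fraction(1), Fraction(1, 2), Fraction(1, 1000)


def is_specified(value):
    return value not in (ATTRIBUTE_CONCEPT, DO_NOT_CARE)


def concept_values(rows, decisions, x, a):
    """V(x, a): specified values of attribute a among cases in the concept of case x."""
    return {row[a] for row, d in zip(rows, decisions)
            if d == decisions[x] and is_specified(row[a])}


def block(rows, decisions, a, v):
    """[(a, v)]: cases whose value of attribute a is, or may be, the specified value v."""
    cases = set()
    for y, row in enumerate(rows):
        if row[a] == v or row[a] == DO_NOT_CARE:
            cases.add(y)  # exact match, or "*" matching every value
        elif row[a] == ATTRIBUTE_CONCEPT and v in concept_values(rows, decisions, y, a):
            cases.add(y)  # "-" matches values occurring in the same concept
    return cases


def characteristic_set(rows, decisions, x):
    """K_A(x): intersection of the sets K(x, a) over all attributes a."""
    k = set(range(len(rows)))
    for a, value in enumerate(rows[x]):
        if is_specified(value):
            k &= block(rows, decisions, a, value)
        elif value == ATTRIBUTE_CONCEPT:
            values = concept_values(rows, decisions, x, a)
            if values:  # an empty V(x, a) leaves K(x, a) = U
                k &= set().union(*(block(rows, decisions, a, v) for v in values))
        # value == DO_NOT_CARE: K(x, a) = U, so k is unchanged
    return k


def probabilistic_approximation(rows, decisions, concept, alpha):
    """Union of K_A(x) for x in concept X with Pr(X | K_A(x)) >= alpha."""
    X = {y for y, d in enumerate(decisions) if d == concept}
    approx = set()
    for x in X:
        k = characteristic_set(rows, decisions, x)  # always contains x, so nonempty
        if Fraction(len(X & k), len(k)) >= alpha:
            approx |= k
    return approx


if __name__ == "__main__":
    # Hypothetical toy data: attributes (Temperature, Headache), decision Flu.
    decisions = ["yes", "yes", "no", "no", "yes"]
    for marker in (ATTRIBUTE_CONCEPT, DO_NOT_CARE):
        rows = [["high", "yes"], [marker, "yes"], ["normal", "no"],
                ["high", "no"], ["normal", marker]]
        for name, alpha in (("lower", LOWER), ("middle", MIDDLE), ("upper", UPPER)):
            print(marker, name, sorted(probabilistic_approximation(rows, decisions, "yes", alpha)))
```

On this toy data, the attribute-concept interpretation returns the concept Flu = yes, that is {0, 1, 4}, for all three thresholds. With "do not care" conditions, the middle and upper approximations of the same concept grow to {0, 1, 2, 4}. This illustrates how the interpretation of missing values changes the approximations from which rules are induced.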