# Complexity of Rule Sets Induced from Incomplete Data with Attribute-concept Values and "Do Not Care" Conditions

### Patrick G. Clark, Jerzy W. Grzymala-Busse

#### Abstract

In this paper we study the complexity of rule sets induced from incomplete data sets with two interpretations of missing attribute values: attribute-concept values and “do not care” conditions. Experiments are conducted on 176 data sets, using three kinds of probabilistic approximations (lower, middle and upper) and the MLEM2 rule induction system. The goal of our research is to determine the interpretation and approximation that produce the least complex rule sets. In our experimental results, the rule set size is smaller for attribute-concept values for 12 combinations of the type of data set and approximation; for one combination it is smaller for “do not care” conditions; and for the remaining 11 combinations the difference is statistically insignificant (5% significance level). The total number of conditions is smaller for attribute-concept values for ten combinations and smaller for “do not care” conditions for two combinations, while for the remaining 12 combinations the difference is statistically insignificant. Thus, we may claim that attribute-concept values are better than “do not care” conditions in terms of rule complexity.
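The two complexity measures named in the abstract, the size of the rule set and the total number of conditions, can be illustrated with a minimal Python sketch. The `Rule` class, the helper names, and the toy rules below are hypothetical and serve only as an illustration; they are not the output of MLEM2 and are not taken from the paper's data sets.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rule:
    """A rule of the form (a1, v1) & (a2, v2) & ... -> (d, w).

    Hypothetical representation, assumed for illustration only.
    """
    conditions: Tuple[Tuple[str, str], ...]  # attribute-value pairs on the left-hand side
    decision: Tuple[str, str]                # decision-value pair on the right-hand side


def rule_set_size(rules):
    """First complexity measure: the number of rules in the rule set."""
    return len(rules)


def total_conditions(rules):
    """Second complexity measure: conditions summed over all rules."""
    return sum(len(rule.conditions) for rule in rules)


# Toy rule sets (made up), standing in for rule sets induced under two
# different interpretations of missing attribute values.
rules_a = [
    Rule((("Temperature", "high"),), ("Flu", "yes")),
    Rule((("Headache", "yes"), ("Cough", "yes")), ("Flu", "yes")),
]
rules_b = [
    Rule((("Temperature", "high"), ("Cough", "yes")), ("Flu", "yes")),
    Rule((("Headache", "yes"), ("Cough", "yes")), ("Flu", "yes")),
    Rule((("Temperature", "normal"), ("Headache", "no")), ("Flu", "no")),
]

for name, rules in (("A", rules_a), ("B", rules_b)):
    print(name, rule_set_size(rules), total_conditions(rules))
```

Under both measures a smaller value means a less complex rule set, so in this toy comparison rule set A would be preferred. Whether such a difference is meaningful across many data sets is a separate question, which the paper addresses with significance testing at the 5% level.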

#### Paper Citation

#### in Harvard Style

Clark P. and Grzymala-Busse J. (2014). **Complexity of Rule Sets Induced from Incomplete Data with Attribute-concept Values and "Do Not Care" Conditions**. In *Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA*, ISBN 978-989-758-035-2, pages 56-63. DOI: 10.5220/0005003400560063

#### in Bibtex Style

@conference{data14,

author={Patrick G. Clark and Jerzy W. Grzymala-Busse},

title={Complexity of Rule Sets Induced from Incomplete Data with Attribute-concept Values and "Do Not Care" Conditions},

booktitle={Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA},

year={2014},

pages={56-63},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0005003400560063},

isbn={978-989-758-035-2},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of 3rd International Conference on Data Management Technologies and Applications - Volume 1: DATA

TI - Complexity of Rule Sets Induced from Incomplete Data with Attribute-concept Values and "Do Not Care" Conditions

SN - 978-989-758-035-2

AU - Clark P.

AU - Grzymala-Busse J.

PY - 2014

SP - 56

EP - 63

DO - 10.5220/0005003400560063