# Consistency of Incomplete Data

### Patrick G. Clark, Jerzy Grzymala-Busse

#### Abstract

In this paper we introduce the idea of consistency for incomplete data sets. Consistency is well known for completely specified data sets: a data set is consistent if any two cases that agree on all attribute values belong to the same concept. We generalize this definition to incomplete data sets using rough set theory. For incomplete data sets there are three definitions of consistency. We discuss two types of missing attribute values: lost values and "do not care" conditions. We illustrate the idea of consistency for incomplete data sets with experiments on many data sets with missing attribute values derived from five benchmark data sets. The results of this paper may be applied to increase the efficiency of mining incomplete data.
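The classical definition for completely specified data can be made concrete with a short check. The following is a minimal sketch only, not the authors' method: the function name `is_consistent`, the toy attribute names, and the sample cases are all hypothetical, and the sketch deliberately does not attempt the three definitions for incomplete data, which the abstract does not spell out.

```python
def is_consistent(cases, attributes, decision):
    """Check consistency of a completely specified data set.

    A data set is consistent if any two cases that agree on all
    attribute values also belong to the same concept (decision value).
    """
    seen = {}  # maps a tuple of attribute values to the first concept seen
    for case in cases:
        key = tuple(case[a] for a in attributes)
        concept = case[decision]
        # setdefault stores the concept on first sight, otherwise returns
        # the stored one; a mismatch means two conflicting cases exist.
        if seen.setdefault(key, concept) != concept:
            return False
    return True


# Hypothetical toy data: the first two cases agree on every attribute
# but belong to different concepts, so the full set is inconsistent.
cases = [
    {"Temperature": "high", "Headache": "yes", "Flu": "yes"},
    {"Temperature": "high", "Headache": "yes", "Flu": "no"},
    {"Temperature": "normal", "Headache": "no", "Flu": "no"},
]
attributes = ["Temperature", "Headache"]

full_set_consistent = is_consistent(cases, attributes, "Flu")
subset_consistent = is_consistent(cases[1:], attributes, "Flu")
```

Dropping the first case removes the conflict, so the remaining two cases form a consistent data set under this definition.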

#### Paper Citation

#### in Harvard Style

Clark P. and Grzymala-Busse J. (2013). **Consistency of Incomplete Data**. In *Proceedings of the 2nd International Conference on Data Technologies and Applications - Volume 1: DATA,* ISBN 978-989-8565-67-9, pages 80-87. DOI: 10.5220/0004490300800087

#### in Bibtex Style

@conference{data13,

author={Patrick G. Clark and Jerzy Grzymala-Busse},

title={Consistency of Incomplete Data},

booktitle={Proceedings of the 2nd International Conference on Data Technologies and Applications - Volume 1: DATA},

year={2013},

pages={80-87},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0004490300800087},

isbn={978-989-8565-67-9},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the 2nd International Conference on Data Technologies and Applications - Volume 1: DATA

TI - Consistency of Incomplete Data

SN - 978-989-8565-67-9

AU - Clark P.

AU - Grzymala-Busse J.

PY - 2013

SP - 80

EP - 87

DO - 10.5220/0004490300800087