Consistency of Incomplete Data

Patrick G. Clark¹ and Jerzy Grzymala-Busse¹,²

¹ Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, U.S.A.

² Institute of Computer Science, Polish Academy of Sciences, 01-237 Warsaw, Poland

Keywords:

Consistency, Incomplete Data, Missing Attribute Values, Rough Set Theory.

Abstract:

In this paper we introduce the idea of consistency for incomplete data sets. Consistency is well known for completely specified data sets, where a data set is consistent if any two cases with equal values of all attributes belong to the same concept. We generalize the definition of consistency to incomplete data sets using rough set theory. For incomplete data sets there exist three definitions of consistency. We discuss two types of missing attribute values: lost values and “do not care” conditions. We illustrate the idea of consistency for incomplete data sets using experiments on many data sets with missing attribute values derived from five benchmark data sets. Results of our paper may be applied to increase the efficiency of mining incomplete data.

1 INTRODUCTION

A complete data set, i.e., a data set in which all attribute values are specified, is consistent if any two cases with the same attribute values belong to the same concept (class). Another definition of consistency is based on rough set theory: a complete data set is consistent if for any concept its lower and upper approximations are equal (Pawlak, 1982; Pawlak, 1991). Consistency for incomplete data sets, i.e., data sets with some missing attribute values, has not been defined in the accessible literature.

The main objective of this paper is to study consistency for incomplete data sets using rough set theory. For incomplete data sets there exist three definitions of approximations, called singleton, subset and concept. Additionally, for complete data sets the idea of the approximation was generalized by introducing probabilistic approximations, with an additional parameter interpreted as a probability (Grzymala-Busse and Ziarko, 2003; Pawlak et al., 1988; Yao, 2007; Yao and Wong, 1992; Yao et al., 1990; Ziarko, 1993; Ziarko, 2008). Probabilistic approximations were extended to incomplete data sets by introducing singleton, subset and concept probabilistic approximations in (Grzymala-Busse, 2011). First results of experiments on probabilistic approximations were published in (Clark and Grzymala-Busse, 2011).

We discuss singleton, subset and concept consistency for incomplete data sets and study their basic properties. Additionally, we conducted experiments on five benchmark data sets converted to many incomplete data sets. In incomplete data sets, there are two kinds of missing attribute values: lost values and “do not care” conditions (Grzymala-Busse, 2003). Lost values indicate that the original attribute values were not known; this kind of missing attribute value is processed by taking into account only existing, specified attribute values. On the other hand, “do not care” conditions indicate that the actual value was one of the attribute values, and we assume that such a missing attribute value may be replaced by any attribute value.

2 COMPLETE DATA SETS

Our basic assumption is that the data sets are presented in the form of a decision table. An example of a decision table is shown in Table 1. Rows of the decision table represent cases, while columns are labeled by variables. The set of all cases is denoted by U. In Table 1, U = {1, 2, 3, 4, 5, 6, 7, 8}. Some variables are called attributes while one selected variable is called a decision and is denoted by d. The set of all attributes will be denoted by A. In Table 1, A = {Temperature, Headache, Cough} and d = Flu. For an attribute a and case x, a(x) denotes the value of the attribute a for case x. For example, Temperature(1) = high.

Clark P. and Grzymala-Busse J. Consistency of Incomplete Data. DOI: 10.5220/0004490300800087. In Proceedings of the 2nd International Conference on Data Technologies and Applications (DATA-2013), pages 80-87. ISBN: 978-989-8565-67-9. © 2013 SCITEPRESS (Science and Technology Publications, Lda.)

Table 1: A complete decision table.

        Attributes                          Decision
Case    Temperature    Headache    Cough    Flu
1       high           yes         yes      yes
2       normal         yes         no       yes
3       high           no          no       yes
4       high           no          no       yes
5       high           no          no       no
6       normal         yes         no       no
7       normal         yes         no       no
8       normal         no          yes      no

An important idea used in the analysis of data sets is a block of an attribute-value pair. Let (a, v) be an attribute-value pair. For complete data sets, i.e., data sets in which every attribute value is specified, a block of (a, v), denoted by [(a, v)], is the following set

\[ \{x \mid a(x) = v\}. \tag{1} \]

For Table 1, blocks of all attribute-value pairs are

[(Temperature, normal)] = {2, 6, 7, 8},

[(Temperature, high)] = {1, 3, 4, 5},

[(Headache, yes)] = {1, 2, 6, 7},

[(Headache, no)] = {3, 4, 5, 8},

[(Cough, yes)] = {1, 8},

[(Cough, no)] = {2, 3, 4, 5, 6, 7}.

A special block of a decision-value pair is called

a concept. In Table 1, the concepts are [(Flu, yes)] =

{1, 2, 3, 4} and [(Flu, no)] = {5, 6, 7, 8}.
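The block computation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `TABLE_1`, `blocks` and `BLOCKS` are hypothetical and do not come from the paper, and the decision Flu is simply stored as one more column so that concepts come out as blocks of decision-value pairs.

```python
from collections import defaultdict

# Table 1 from the text: case -> values of the attributes and of the decision Flu.
TABLE_1 = {
    1: {"Temperature": "high", "Headache": "yes", "Cough": "yes", "Flu": "yes"},
    2: {"Temperature": "normal", "Headache": "yes", "Cough": "no", "Flu": "yes"},
    3: {"Temperature": "high", "Headache": "no", "Cough": "no", "Flu": "yes"},
    4: {"Temperature": "high", "Headache": "no", "Cough": "no", "Flu": "yes"},
    5: {"Temperature": "high", "Headache": "no", "Cough": "no", "Flu": "no"},
    6: {"Temperature": "normal", "Headache": "yes", "Cough": "no", "Flu": "no"},
    7: {"Temperature": "normal", "Headache": "yes", "Cough": "no", "Flu": "no"},
    8: {"Temperature": "normal", "Headache": "no", "Cough": "yes", "Flu": "no"},
}


def blocks(table):
    """Return {(variable, value): set of cases x with variable(x) == value}."""
    result = defaultdict(set)
    for case, row in table.items():
        for variable, value in row.items():
            result[(variable, value)].add(case)
    return dict(result)


BLOCKS = blocks(TABLE_1)
print(sorted(BLOCKS[("Cough", "yes")]))  # [1, 8]
print(sorted(BLOCKS[("Flu", "yes")]))    # [1, 2, 3, 4]  (a concept)
```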

Let B be a subset of the set A of all attributes. Complete data sets are characterized by the indiscernibility relation IND(B) defined as follows: for any x, y ∈ U,

\[ (x, y) \in IND(B) \ \text{ if and only if } \ a(x) = a(y) \ \text{ for any } a \in B. \tag{2} \]

Obviously, IND(B) is an equivalence relation. The equivalence class of IND(B) containing x ∈ U will be denoted by $[x]_B$ and called a B-elementary set. A-elementary sets will be called elementary. We have

\[ [x]_B = \bigcap \{[(a, a(x))] \mid a \in B\}. \tag{3} \]

The set of all equivalence classes $[x]_B$, where x ∈ U, is a partition on U denoted by $B^*$. For Table 1, $A^* = \{\{1\}, \{2, 6, 7\}, \{3, 4, 5\}, \{8\}\}$. All members of $A^*$ are elementary sets.
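As a minimal sketch of how the partition into B-elementary sets could be computed, cases can be grouped by their tuple of values on the attributes in B. The names `ROWS` and `partition` are illustrative assumptions, and attributes are addressed by column index (0 = Temperature, 1 = Headache, 2 = Cough).

```python
# Attribute values of Table 1 as (Temperature, Headache, Cough) tuples.
ROWS = {
    1: ("high", "yes", "yes"),
    2: ("normal", "yes", "no"),
    3: ("high", "no", "no"),
    4: ("high", "no", "no"),
    5: ("high", "no", "no"),
    6: ("normal", "yes", "no"),
    7: ("normal", "yes", "no"),
    8: ("normal", "no", "yes"),
}


def partition(rows, attribute_indices):
    """Return the equivalence classes of IND(B), ordered by smallest member."""
    classes = {}
    for case, values in rows.items():
        # Two cases are indiscernible iff they agree on every attribute in B.
        key = tuple(values[i] for i in attribute_indices)
        classes.setdefault(key, set()).add(case)
    return sorted(classes.values(), key=min)


A_STAR = partition(ROWS, (0, 1, 2))
print([sorted(c) for c in A_STAR])  # [[1], [2, 6, 7], [3, 4, 5], [8]]
```

Passing a smaller index tuple, for example `(0,)` for B = {Temperature}, yields the coarser partition induced by that subset of attributes.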

The data set presented in Table 1 contains conflicting cases, for example cases 2 and 6: for any attribute a ∈ A, a(2) = a(6), yet cases 2 and 6 belong to two different concepts. A data set containing conflicting cases will be called inconsistent. We may recognize that a data set is inconsistent by comparing the partition $A^*$ with the partition of U into concepts: the data set is inconsistent if there exists an elementary set that is not a subset of any concept. For Table 1, the elementary set {2, 6, 7} is not a subset of either of the two concepts {1, 2, 3, 4} and {5, 6, 7, 8}. There exists yet another way to recognize inconsistency of data sets, based on the ideas of B-lower and B-upper approximations. Let X be a subset of U. The B-lower approximation of X, denoted by $\underline{appr}_B(X)$, is defined as follows

\[ \{x \mid x \in U,\ [x]_B \subseteq X\}. \tag{4} \]

The B-upper approximation of X, denoted by $\overline{appr}_B(X)$, is defined as follows

\[ \{x \mid x \in U,\ [x]_B \cap X \neq \emptyset\}. \tag{5} \]

For the data set from Table 1 and the concept [(Flu, yes)] = {1, 2, 3, 4} = X,

\[ \underline{appr}_A(X) = \{1\} \]

and

\[ \overline{appr}_A(X) = \{1, 2, 3, 4, 5, 6, 7\}. \]

A data set is inconsistent if and only if there exists a concept X for which $\underline{appr}_A(X) \neq \overline{appr}_A(X)$.
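The consistency test for complete data can be sketched directly from definitions (4) and (5). The helper names below (`elementary_set`, `lower`, `upper`, `concepts`, `is_consistent`) are hypothetical and chosen only for this illustration, which works with B = A on Table 1.

```python
# Table 1: attribute values (Temperature, Headache, Cough) and the decision Flu.
ATTRIBUTES = {
    1: ("high", "yes", "yes"), 2: ("normal", "yes", "no"),
    3: ("high", "no", "no"), 4: ("high", "no", "no"),
    5: ("high", "no", "no"), 6: ("normal", "yes", "no"),
    7: ("normal", "yes", "no"), 8: ("normal", "no", "yes"),
}
DECISION = {1: "yes", 2: "yes", 3: "yes", 4: "yes",
            5: "no", 6: "no", 7: "no", 8: "no"}


def elementary_set(x):
    """[x]_A: all cases with the same attribute values as x."""
    return {y for y in ATTRIBUTES if ATTRIBUTES[y] == ATTRIBUTES[x]}


def lower(X):
    """A-lower approximation (4): cases whose elementary set lies inside X."""
    return {x for x in ATTRIBUTES if elementary_set(x) <= X}


def upper(X):
    """A-upper approximation (5): cases whose elementary set meets X."""
    return {x for x in ATTRIBUTES if elementary_set(x) & X}


def concepts():
    """Blocks of decision-value pairs, keyed by decision value."""
    result = {}
    for case, value in DECISION.items():
        result.setdefault(value, set()).add(case)
    return result


def is_consistent():
    """Consistent iff lower and upper approximations agree for every concept."""
    return all(lower(X) == upper(X) for X in concepts().values())


X = concepts()["yes"]
print(sorted(lower(X)))  # [1]
print(sorted(upper(X)))  # [1, 2, 3, 4, 5, 6, 7]
print(is_consistent())   # False
```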

3 INCOMPLETE DATA SETS

An example of an incomplete data set is presented in Table 2. Some attribute values are missing. Such values are denoted either by ?, denoting a lost value (the original attribute value was not known, so we use only existing, specified attribute values), or by ∗, denoting a “do not care” condition (we assume that the missing attribute value may be replaced by any attribute value).

For incomplete decision tables the definition of a block of an attribute-value pair is modified (Grzymala-Busse, 2003; Grzymala-Busse, 2004a; Grzymala-Busse, 2004b).

• If for an attribute a there exists a case x such that a(x) = ?, i.e., the corresponding value is lost, then the case x should not be included in any block [(a, v)] for any value v of attribute a,

• If for an attribute a there exists a case x such that the corresponding value is a “do not care” condition, i.e., a(x) = ∗, then the case x should be included in blocks [(a, v)] for all specified values v of attribute a.

Table 2: An incomplete decision table.

        Attributes                          Decision
Case    Temperature    Headache    Cough    Flu
1       ∗              yes         yes      yes
2       normal         ?           no       yes
3       ?              no          ∗        yes
4       high           no          no       yes
5       high           ∗           ?        no
6       normal         yes         ∗        no
7       normal         yes         no       no
8       normal         ?           yes      no

Thus, for Table 2, the blocks of all attribute-value

pairs are

[(Temperature, normal)] = {1, 2, 6, 7, 8},

[(Temperature, high)] = {1, 4, 5},

[(Headache, yes)] = {1, 5, 6, 7},

[(Headache, no)] = {3, 4, 5},

[(Cough, yes)] = {1, 3, 6, 8},

[(Cough, no)] = {2, 3, 4, 6, 7}.
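The two rules for lost values and “do not care” conditions can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name `incomplete_blocks` is hypothetical, and the ASCII character `*` stands in for the ∗ symbol used in Table 2.

```python
LOST = "?"
DO_NOT_CARE = "*"  # ASCII stand-in for the symbol used in Table 2

TABLE_2 = {
    1: {"Temperature": "*", "Headache": "yes", "Cough": "yes"},
    2: {"Temperature": "normal", "Headache": "?", "Cough": "no"},
    3: {"Temperature": "?", "Headache": "no", "Cough": "*"},
    4: {"Temperature": "high", "Headache": "no", "Cough": "no"},
    5: {"Temperature": "high", "Headache": "*", "Cough": "?"},
    6: {"Temperature": "normal", "Headache": "yes", "Cough": "*"},
    7: {"Temperature": "normal", "Headache": "yes", "Cough": "no"},
    8: {"Temperature": "normal", "Headache": "?", "Cough": "yes"},
}


def incomplete_blocks(table):
    """Blocks [(a, v)] for a table with lost values and "do not care" conditions."""
    # Collect the specified values of every attribute.
    domains = {}
    for row in table.values():
        for attribute, value in row.items():
            if value not in (LOST, DO_NOT_CARE):
                domains.setdefault(attribute, set()).add(value)

    result = {(a, v): set() for a, values in domains.items() for v in values}
    for case, row in table.items():
        for attribute, value in row.items():
            if value == LOST:
                continue  # a lost value puts the case in no block of this attribute
            if value == DO_NOT_CARE:
                # a "do not care" condition puts the case in every block
                for specified in domains.get(attribute, ()):
                    result[(attribute, specified)].add(case)
            else:
                result[(attribute, value)].add(case)
    return result


BLOCKS_2 = incomplete_blocks(TABLE_2)
print(sorted(BLOCKS_2[("Headache", "yes")]))  # [1, 5, 6, 7]
```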

Let B be a subset of the set A of all attributes. For a case x ∈ U the characteristic set $K_B(x)$ is defined as the intersection of the sets K(x, a), for all a ∈ B, where the set K(x, a) is defined in the following way:

• If a(x) is specified, then K(x, a) is the block [(a, a(x))] of attribute a and its value a(x),

• If a(x) = ? or a(x) = ∗, then the set K(x, a) = U.

The characteristic set $K_B(x)$ may be interpreted as the set of cases that are indistinguishable from x using all attributes from B, with a given interpretation of missing attribute values. Thus, $K_A(x)$ is the set of all cases that cannot be distinguished from x using all attributes.

For Table 2 and B = A,

$K_A(1) = U \cap \{1, 5, 6, 7\} \cap \{1, 3, 6, 8\} = \{1, 6\}$,

$K_A(2) = \{1, 2, 6, 7, 8\} \cap U \cap \{2, 3, 4, 6, 7\} = \{2, 6, 7\}$,

$K_A(3) = U \cap \{3, 4, 5\} \cap U = \{3, 4, 5\}$,

$K_A(4) = \{1, 4, 5\} \cap \{3, 4, 5\} \cap \{2, 3, 4, 6, 7\} = \{4\}$,

$K_A(5) = \{1, 4, 5\} \cap U \cap U = \{1, 4, 5\}$,

$K_A(6) = \{1, 2, 6, 7, 8\} \cap \{1, 5, 6, 7\} \cap U = \{1, 6, 7\}$,

$K_A(7) = \{1, 2, 6, 7, 8\} \cap \{1, 5, 6, 7\} \cap \{2, 3, 4, 6, 7\} = \{6, 7\}$, and

$K_A(8) = \{1, 2, 6, 7, 8\} \cap U \cap \{1, 3, 6, 8\} = \{1, 6, 8\}$.
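A small sketch of the characteristic-set computation follows. It assumes Table 2 stored as tuples, with `*` as an ASCII stand-in for the “do not care” symbol; the names `block`, `characteristic_set` and `K` are hypothetical.

```python
# Table 2 as (Temperature, Headache, Cough); "?" is a lost value and
# "*" stands in for the "do not care" condition.
TABLE_2 = {
    1: ("*", "yes", "yes"), 2: ("normal", "?", "no"),
    3: ("?", "no", "*"), 4: ("high", "no", "no"),
    5: ("high", "*", "?"), 6: ("normal", "yes", "*"),
    7: ("normal", "yes", "no"), 8: ("normal", "?", "yes"),
}


def block(table, a, v):
    """[(a, v)]: cases whose value on attribute a is v or a "do not care"."""
    return {y for y, row in table.items() if row[a] in (v, "*")}


def characteristic_set(table, x, attributes):
    """K_B(x): intersection of K(x, a) over all attributes a in B."""
    result = set(table)  # start from U
    for a in attributes:
        v = table[x][a]
        if v not in ("?", "*"):
            result &= block(table, a, v)
        # otherwise K(x, a) = U, which leaves the intersection unchanged
    return result


K = {x: characteristic_set(TABLE_2, x, (0, 1, 2)) for x in TABLE_2}
print(sorted(K[1]))  # [1, 6]
print(sorted(K[5]))  # [1, 4, 5]
```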

For incomplete data sets there exist three distinct definitions of approximations. Let X be a subset of U. The B-singleton lower approximation of X, denoted by $\underline{appr}^{singleton}_B(X)$, is defined as follows

\[ \{x \mid x \in U,\ K_B(x) \subseteq X\}. \tag{6} \]

The singleton lower approximations were studied in many papers, see, e.g., (Grzymala-Busse, 2003; Grzymala-Busse, 2004b; Kryszkiewicz, 1995; Kryszkiewicz, 1999; Lin, 1989; Lin, 1992; Slowinski and Vanderpooten, 2000; Stefanowski and Tsoukias, 1999; Stefanowski and Tsoukias, 2001; Yao, 1998).

The B-singleton upper approximation of X, denoted by $\overline{appr}^{singleton}_B(X)$, is defined as follows

\[ \{x \mid x \in U,\ K_B(x) \cap X \neq \emptyset\}. \tag{7} \]

The singleton upper approximations, like singleton lower approximations, were also studied in many papers, e.g., (Grzymala-Busse, 2003; Grzymala-Busse, 2004b; Kryszkiewicz, 1995; Kryszkiewicz, 1999; Slowinski and Vanderpooten, 2000; Stefanowski and Tsoukias, 1999; Stefanowski and Tsoukias, 2001; Yao, 1998).

The B-subset lower approximation of X, denoted by $\underline{appr}^{subset}_B(X)$, is defined as follows

\[ \cup \{K_B(x) \mid x \in U,\ K_B(x) \subseteq X\}. \tag{8} \]

The subset lower approximations were introduced in (Grzymala-Busse, 2003; Grzymala-Busse, 2004b).

The B-subset upper approximation of X, denoted by $\overline{appr}^{subset}_B(X)$, is defined as follows

\[ \cup \{K_B(x) \mid x \in U,\ K_B(x) \cap X \neq \emptyset\}. \tag{9} \]

The subset upper approximations were introduced in (Grzymala-Busse, 2003; Grzymala-Busse, 2004b).

The B-concept lower approximation of X, denoted by $\underline{appr}^{concept}_B(X)$, is defined as follows

\[ \cup \{K_B(x) \mid x \in X,\ K_B(x) \subseteq X\}. \tag{10} \]

The concept lower approximations were introduced in (Grzymala-Busse, 2003; Grzymala-Busse, 2004b).

The B-concept upper approximation of X, denoted by $\overline{appr}^{concept}_B(X)$, is defined as follows

\[ \cup \{K_B(x) \mid x \in X,\ K_B(x) \cap X \neq \emptyset\} = \cup \{K_B(x) \mid x \in X\}. \tag{11} \]

The concept upper approximations were studied in (Grzymala-Busse, 2003; Grzymala-Busse, 2004b; Lin, 1992).

For Table 2 and X = {1, 2, 3, 4}, all A-singleton, A-subset and A-concept approximations are:

$\underline{appr}^{singleton}_A(X) = \{4\}$,

$\overline{appr}^{singleton}_A(X) = \{1, 2, 3, 4, 5, 6, 8\}$,

$\underline{appr}^{subset}_A(X) = \{4\}$,

$\overline{appr}^{subset}_A(X) = U$,

$\underline{appr}^{concept}_A(X) = \{4\}$,

$\overline{appr}^{concept}_A(X) = \{1, 2, 3, 4, 5, 6, 7\}$.
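The six approximations of definitions (6) to (11) can be checked with a short sketch. The characteristic sets are copied from the values computed for Table 2 in the text; the function names are hypothetical and only serve this illustration.

```python
# Characteristic sets K_A(x) for Table 2, as listed in the text.
K = {
    1: {1, 6}, 2: {2, 6, 7}, 3: {3, 4, 5}, 4: {4},
    5: {1, 4, 5}, 6: {1, 6, 7}, 7: {6, 7}, 8: {1, 6, 8},
}
U = set(K)


def union(sets):
    """Union of an iterable of sets (empty set if there are none)."""
    return set().union(*sets)


def singleton_lower(X):
    return {x for x in U if K[x] <= X}                # (6)


def singleton_upper(X):
    return {x for x in U if K[x] & X}                 # (7)


def subset_lower(X):
    return union(K[x] for x in U if K[x] <= X)        # (8)


def subset_upper(X):
    return union(K[x] for x in U if K[x] & X)         # (9)


def concept_lower(X):
    return union(K[x] for x in X if K[x] <= X)        # (10)


def concept_upper(X):
    return union(K[x] for x in X)                     # (11)


X = {1, 2, 3, 4}
print(sorted(singleton_upper(X)))  # [1, 2, 3, 4, 5, 6, 8]
print(sorted(concept_upper(X)))    # [1, 2, 3, 4, 5, 6, 7]
```

The only difference between the subset and concept variants is the range of x: all of U for the subset approximations, only the members of X for the concept approximations.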

Figure 1: Bankruptcy data set with lost values. (Plot: number of distinct approximations versus percentage of missing attribute values, 0 to 35%; curves for the subset, singleton and concept approximations of concepts 1 and 2.)

Figure 2: Bankruptcy data set with “do not care” conditions. (Plot: number of distinct approximations versus percentage of missing attribute values, 0 to 35%; curves for the subset, singleton and concept approximations of concepts 1 and 2.)

Figure 3: Breast cancer data set with lost values. (Plot: number of distinct approximations versus percentage of missing attribute values, 0 to 40%; curves for the subset, singleton and concept approximations of the concepts recurrence-events and no-recurrence-events.)

Figure 4: Breast cancer data set with “do not care” conditions. (Plot: number of distinct approximations versus percentage of missing attribute values, 0 to 40%; curves for the subset, singleton and concept approximations of the concepts recurrence-events and no-recurrence-events.)

Figure 5: Echocardiogram data set with lost values. (Plot: number of distinct approximations versus percentage of missing attribute values, 0 to 40%; curves for the subset, singleton and concept approximations of concepts 1 and 0.)

4 PROBABILISTIC APPROXIMATIONS

By analogy with lower and upper approximations defined using characteristic sets, we will introduce three kinds of probabilistic approximations: singleton, subset and concept. Again, let B be a subset of the attribute set A and X be a subset of U.

A B-singleton probabilistic approximation of X with the threshold α, 0 < α ≤ 1, denoted by $appr^{singleton}_{\alpha,B}(X)$, is defined as follows

\[ \{x \mid x \in U,\ Pr(X \mid K_B(x)) \geq \alpha\}, \tag{12} \]

where $Pr(X \mid K_B(x)) = \frac{|X \cap K_B(x)|}{|K_B(x)|}$ is the conditional probability of X given $K_B(x)$ and |Y| denotes the cardinality of set Y.
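As a worked instance of this conditional probability, take Table 2 with X = {1, 2, 3, 4} and the characteristic sets $K_A(1) = \{1, 6\}$, $K_A(3) = \{3, 4, 5\}$ and $K_A(7) = \{6, 7\}$ computed in Section 3:

```latex
\begin{align*}
Pr(X \mid K_A(1)) &= \frac{|\{1, 2, 3, 4\} \cap \{1, 6\}|}{|\{1, 6\}|} = \frac{1}{2}, \\
Pr(X \mid K_A(3)) &= \frac{|\{1, 2, 3, 4\} \cap \{3, 4, 5\}|}{|\{3, 4, 5\}|} = \frac{2}{3}, \\
Pr(X \mid K_A(7)) &= \frac{|\{1, 2, 3, 4\} \cap \{6, 7\}|}{|\{6, 7\}|} = 0.
\end{align*}
```

Hence case 1 belongs to the A-singleton probabilistic approximation of X exactly when α ≤ 1/2, case 3 exactly when α ≤ 2/3, and case 7 for no admissible α.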

A B-subset probabilistic approximation of the set X with the threshold α, 0 < α ≤ 1, denoted by $appr^{subset}_{\alpha,B}(X)$, is defined as follows

\[ \cup \{K_B(x) \mid x \in U,\ Pr(X \mid K_B(x)) \geq \alpha\}. \tag{13} \]

ConsistencyofIncompleteData

0 5 10 15 20 25 30 35 40

Number of distinct approximations

Percentage of missing attribute values

1, subset

0, subset

1, singleton

0, singleton

1, concept

0, concept

Figure 6: Echocardiogram data set with “do not care” con-

ditions.

0 10 20 30 40 50 60

Number of distinct approximations

Percentage of missing attribute values

yes, subset

no, subset

yes, singleton

no, singleton

yes, concept

no, concept

Figure 7: Hepatitis data set with lost values.

100

120

140

0 10 20 30 40 50 60

Number of distinct approximations

Percentage of missing attribute values

yes, subset

no, subset

yes, singleton

no, singleton

yes, concept

no, concept

Figure 8: Hepatitis data set with “do not care” conditions.

A B-concept probabilistic approximation of the set X with the threshold α, 0 < α ≤ 1, denoted by $appr^{concept}_{\alpha,B}(X)$, is defined as follows

\[ \cup \{K_B(x) \mid x \in X,\ Pr(X \mid K_B(x)) \geq \alpha\}. \tag{14} \]

Let type ∈ {singleton, subset, concept}. Note that

\[ appr^{type}_{1,B}(X) = \underline{appr}^{type}_B(X) \tag{15} \]

and for the smallest possible positive α (in our experiments such α = 0.001)

\[ appr^{type}_{\alpha,B}(X) = \overline{appr}^{type}_B(X). \tag{16} \]

For Table 2, all distinct A-singleton, A-subset and A-concept approximations of the set X = {1, 2, 3, 4} are

$appr^{singleton}_{0.333,A}(X) = \{1, 2, 3, 4, 5, 6, 8\}$,

$appr^{singleton}_{0.5,A}(X) = \{1, 3, 4, 5\}$,

$appr^{singleton}_{0.667,A}(X) = \{3, 4, 5\}$,

$appr^{singleton}_{1,A}(X) = \{4\}$,

$appr^{subset}_{0.333,A}(X) = U$,

$appr^{subset}_{0.5,A}(X) = \{1, 3, 4, 5, 6\}$,

$appr^{subset}_{0.667,A}(X) = \{1, 3, 4, 5\}$,

$appr^{subset}_{1,A}(X) = \{4\}$,

$appr^{concept}_{0.333,A}(X) = \{1, 2, 3, 4, 5, 6, 7\}$,

$appr^{concept}_{0.5,A}(X) = \{1, 3, 4, 5, 6\}$,

$appr^{concept}_{0.667,A}(X) = \{3, 4, 5\}$,

$appr^{concept}_{1,A}(X) = \{4\}$.
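The probabilistic approximations listed above can be reproduced with a short sketch. Two assumptions are made here: the thresholds printed as 0.333 and 0.667 are read as the exact fractions 1/3 and 2/3 (with the literal value 0.667 the ratio 2/3 would fall just below the threshold), and the names `pr` and `probabilistic` are hypothetical. Exact rational arithmetic avoids floating-point comparisons at the threshold.

```python
from fractions import Fraction

# Characteristic sets K_A(x) for Table 2, as listed in the text.
K = {
    1: {1, 6}, 2: {2, 6, 7}, 3: {3, 4, 5}, 4: {4},
    5: {1, 4, 5}, 6: {1, 6, 7}, 7: {6, 7}, 8: {1, 6, 8},
}
U = set(K)


def pr(X, x):
    """Conditional probability Pr(X | K(x)) = |X & K(x)| / |K(x)|."""
    return Fraction(len(X & K[x]), len(K[x]))


def probabilistic(kind, X, alpha):
    """Singleton (12), subset (13) or concept (14) probabilistic approximation."""
    # Concept approximations range over x in X, the other two over x in U.
    domain = X if kind == "concept" else U
    chosen = [x for x in domain if pr(X, x) >= alpha]
    if kind == "singleton":
        return set(chosen)
    return set().union(*(K[x] for x in chosen))


X = {1, 2, 3, 4}
print(sorted(probabilistic("singleton", X, Fraction(1, 2))))  # [1, 3, 4, 5]
print(sorted(probabilistic("subset", X, Fraction(2, 3))))     # [1, 3, 4, 5]
print(sorted(probabilistic("concept", X, Fraction(1, 3))))    # [1, 2, 3, 4, 5, 6, 7]
```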

5 CONSISTENCY

Let X be a concept of the incomplete data set, i.e., a block of a decision-value pair. Let B be a subset of the set A of all attributes. Let type ∈ {singleton, subset, concept}. The concept X is B-type consistent in the data set if and only if for any α > 0

\[ appr^{type}_{\alpha,B}(X) = X. \tag{17} \]

If the concept X is A-type consistent in the data set then it will be called type consistent.

For completely specified data sets, if $\underline{B}(X) = X$ or $\overline{B}(X) = X$ then $\underline{B}(X) = X = \overline{B}(X)$, where $\underline{B}(X)$ and $\overline{B}(X)$ denote the B-lower and B-upper approximations of X. An analogous result is not true for incomplete data sets.

A data set will be called B-type consistent when for any concept X the number of all distinct B-type probabilistic approximations of X is equal to one. If the data set is A-type consistent then it will be called type consistent.
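One way to test these definitions computationally is sketched below. It relies on an observation that is not stated in the text: an approximation can only change when α passes one of the values Pr(X | K(x)), so it suffices to evaluate α at the distinct positive values of that ratio together with α = 1. All function names are hypothetical.

```python
from fractions import Fraction

# Characteristic sets K_A(x) for Table 2, as listed in the text.
K_A = {
    1: {1, 6}, 2: {2, 6, 7}, 3: {3, 4, 5}, 4: {4},
    5: {1, 4, 5}, 6: {1, 6, 7}, 7: {6, 7}, 8: {1, 6, 8},
}


def pr(X, K, x):
    return Fraction(len(X & K[x]), len(K[x]))


def approximation(kind, X, K, alpha):
    """Probabilistic approximation of the given kind, returned as a frozenset."""
    domain = X if kind == "concept" else set(K)
    chosen = [x for x in domain if pr(X, K, x) >= alpha]
    if kind == "singleton":
        return frozenset(chosen)
    return frozenset().union(*(K[x] for x in chosen))


def distinct_approximations(kind, X, K):
    """All distinct approximations of X as alpha ranges over (0, 1]."""
    # Candidate thresholds: every positive value of Pr(X | K(x)), plus 1.
    alphas = ({pr(X, K, x) for x in K} - {Fraction(0)}) | {Fraction(1)}
    return {approximation(kind, X, K, a) for a in alphas}


def is_type_consistent(kind, X, K):
    """Definition (17): every probabilistic approximation of X equals X."""
    return distinct_approximations(kind, X, K) == {frozenset(X)}


X = {1, 2, 3, 4}
for kind in ("singleton", "subset", "concept"):
    print(kind, len(distinct_approximations(kind, X, K_A)),
          is_type_consistent(kind, X, K_A))
```

For Table 2 and X = {1, 2, 3, 4} each type has four distinct approximations, matching the lists in Section 4, so the concept is not consistent for any of the three types.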

DATA2013-2ndInternationalConferenceonDataManagementTechnologiesandApplications

6 EXPERIMENTS

We conducted experiments on five benchmark data sets, taken from the University of California at Irvine Machine Learning Repository, see Table 3. For every data set, a family of new data sets was derived, with randomly placed lost values, starting from 0% and with the number of lost values gradually increasing, in increments of 5%, until in the process of adding lost values all attribute values of some case became missing. When that happened, we made two additional random attempts and, if there was still a case with all attribute values missing, the process of deriving new data sets with lost values was terminated. For every derived data set with lost values we created a corresponding data set with “do not care” conditions by replacing all lost values by “do not care” conditions. Our objective was to test, for every data set from such a family, how many distinct singleton, subset and concept probabilistic approximations may be created when the α parameter changes from 0.001 to 1.
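The derivation procedure described above can be sketched as follows. Several details are assumptions made only for this illustration, since the text does not specify them: lost values are sampled afresh from the original complete data at each percentage (rather than added to the previous derived data set), the number of lost values is rounded to the nearest integer, and all names (`with_lost_values`, `derive_family`, `to_do_not_care`) are hypothetical. Table 1 serves as a stand-in for a benchmark data set.

```python
import random

# Stand-in complete data set: the attribute values of Table 1.
ROWS = [
    ("high", "yes", "yes"), ("normal", "yes", "no"), ("high", "no", "no"),
    ("high", "no", "no"), ("high", "no", "no"), ("normal", "yes", "no"),
    ("normal", "yes", "no"), ("normal", "no", "yes"),
]


def with_lost_values(rows, percent, rng):
    """Copy of rows with percent% of all attribute values replaced by "?"."""
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    count = round(len(cells) * percent / 100)  # rounding is an assumption
    result = [list(row) for row in rows]
    for i, j in rng.sample(cells, count):
        result[i][j] = "?"
    return result


def has_empty_case(rows):
    """True if some case has every attribute value missing."""
    return any(all(value == "?" for value in row) for row in rows)


def derive_family(rows, rng, step=5, attempts=3):
    """Data sets with 0%, 5%, 10%, ... lost values, keyed by percentage.

    Each percentage gets one attempt plus two additional ones; the process
    stops when all attempts leave a case with no specified value.
    """
    family = {0: [list(row) for row in rows]}
    percent = step
    while percent < 100:
        for _ in range(attempts):
            candidate = with_lost_values(rows, percent, rng)
            if not has_empty_case(candidate):
                family[percent] = candidate
                break
        else:
            break  # every attempt failed: terminate the derivation
        percent += step
    return family


def to_do_not_care(rows):
    """Corresponding data set with lost values replaced by "*"."""
    return [["*" if value == "?" else value for value in row] for row in rows]


FAMILY = derive_family(ROWS, random.Random(0))
print(sorted(FAMILY))  # percentages for which a data set was derived
```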

For every data set, the numbers of distinct singleton, subset and concept probabilistic approximations were recorded for all concepts, see Figures 1–14. For example, in Figure 1, “1, subset” means the concept labeled in the data set by “1” combined with the subset probabilistic approximation. It is clear that for data sets with “do not care” conditions all three numbers were larger than for the corresponding data sets with lost values.

All eight data sets created from the bankruptcy data set, with the percentage of lost values starting at 0% and ending at 35%, are singleton consistent (and hence subset and concept consistent). Additionally, the bankruptcy data sets with 0% and 5% of “do not care” conditions are also singleton consistent. On the other hand, all 19 data sets created from the breast cancer data set, i.e., the original data set, 9 data sets with lost values and 9 data sets with “do not care” conditions, are singleton inconsistent.

All data sets with up to 35% of lost values, derived from the echocardiogram data set, are singleton consistent. The data set derived from the echocardiogram data set with 40% of lost values is an example of a data set with one concept (labeled by “1”) that is concept consistent but not singleton consistent. Four data sets derived from the echocardiogram data set, with 0, 5, 10 and 15% of “do not care” conditions, are singleton consistent.

For the data sets derived from the hepatitis data set, both the data set with lost values and the data set with “do not care” conditions, with 5% of missing attribute values, are singleton consistent. Moreover, for the concept “yes”, data sets with lost values derived from the hepatitis data set, up to 60%, are concept consistent. For data sets derived from the wine recognition data set, the concept “3” is singleton consistent with up to 5% of lost values and with up to 5% of “do not care” conditions.

Table 3: Data sets used for experiments.

                    Number of
Data set            cases    attributes    concepts
Bankruptcy          66       5             2
Breast cancer       277      9             2
Echocardiogram      74       7             2
Hepatitis           155      19            2
Wine recognition    178      13            3

Figure 9: Singleton probabilistic approximations for wine recognition data set with lost values. (Plot: number of distinct approximations versus percentage of missing attribute values, 0 to 60%; curves for concepts 1, 2 and 3.)

Figure 10: Subset probabilistic approximations for wine recognition data set with lost values. (Plot: number of distinct approximations versus percentage of missing attribute values, 0 to 60%; curves for concepts 1, 2 and 3.)

Figure 11: Concept probabilistic approximations for wine recognition data set with lost values. (Plot: number of distinct approximations versus percentage of missing attribute values, 0 to 60%; curves for concepts 1, 2 and 3.)

Figure 12: Singleton probabilistic approximations for wine recognition data set with “do not care” conditions. (Plot: number of distinct approximations versus percentage of missing attribute values, 0 to 60%; curves for concepts 1, 2 and 3.)

Figure 13: Subset probabilistic approximations for wine recognition data set with “do not care” conditions. (Plot: number of distinct approximations versus percentage of missing attribute values, 0 to 60%; curves for concepts 1, 2 and 3.)

Figure 14: Concept probabilistic approximations for wine recognition data set with “do not care” conditions. (Plot: number of distinct approximations versus percentage of missing attribute values, 0 to 60%; curves for concepts 1, 2 and 3.)

7 CONCLUSIONS

In this paper we introduced the idea of consistency for incomplete data sets. Consistency is defined for a single concept. For a given type of approximation, it is possible that some concepts are consistent while other concepts are not. Consistency of a concept depends on the type of approximation. A concept may be subset and concept consistent while it is singleton inconsistent.

Results of our experiments show that, for a given data set (or concept), more of the derived data sets are consistent, for all types of consistency, when they are affected by lost values than when they are affected by “do not care” conditions. For some data sets all derived incomplete data sets, for all concepts, are singleton consistent; for some data sets only some concepts are singleton consistent; and for some data sets no concept is singleton consistent.

REFERENCES

Clark, P. G. and Grzymala-Busse, J. W. (2011). Experiments on probabilistic approximations. In Proceedings of the 2011 IEEE International Conference on Granular Computing, pages 144–149.

Grzymala-Busse, J. W. (2003). Rough set strategies to data with missing attribute values. In Workshop Notes, Foundations and New Directions of Data Mining, in conjunction with the 3-rd International Conference on Data Mining, pages 56–63.

Grzymala-Busse, J. W. (2004a). Characteristic relations for incomplete data: A generalization of the indiscernibility relation. In Proceedings of the Fourth International Conference on Rough Sets and Current Trends in Computing, pages 244–253.

Grzymala-Busse, J. W. (2004b). Data with missing attribute values: Generalization of indiscernibility relation and rule induction. Transactions on Rough Sets, 1:78–95.

Grzymala-Busse, J. W. (2011). Generalized parameterized approximations. In Proceedings of the RSKT 2011, the 6-th International Conference on Rough Sets and Knowledge Technology, pages 136–145.

Grzymala-Busse, J. W. and Ziarko, W. (2003). Data mining based on rough sets. In Wang, J., editor, Data Mining: Opportunities and Challenges, pages 142–173. Idea Group Publ., Hershey, PA.

Kryszkiewicz, M. (1995). Rough set approach to incomplete information systems. In Proceedings of the Second Annual Joint Conference on Information Sciences, pages 194–197.

Kryszkiewicz, M. (1999). Rules in incomplete information systems. Information Sciences, 113(3-4):271–292.

Lin, T. Y. (1989). Neighborhood systems and approximation in database and knowledge base systems. In Proceedings of the ISMIS-89, the Fourth International Symposium on Methodologies of Intelligent Systems, pages 75–86.

Lin, T. Y. (1992). Topological and fuzzy rough sets. In Slowinski, R., editor, Intelligent Decision Support. Handbook of Applications and Advances of the Rough Sets Theory, pages 287–304. Kluwer Academic Publishers, Dordrecht, Boston, London.

Pawlak, Z. (1982). Rough sets. International Journal of Computer and Information Sciences, 11:341–356.

Pawlak, Z. (1991). Rough Sets. Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht, Boston, London.

Pawlak, Z., Wong, S. K. M., and Ziarko, W. (1988). Rough sets: probabilistic versus deterministic approach. International Journal of Man-Machine Studies, 29:81–95.

Slowinski, R. and Vanderpooten, D. (2000). A generalized definition of rough approximations based on similarity. IEEE Transactions on Knowledge and Data Engineering, 12:331–336.

Stefanowski, J. and Tsoukias, A. (1999). On the extension of rough sets under incomplete information. In Proceedings of the RSFDGrC’1999, 7th International Workshop on New Directions in Rough Sets, Data Mining, and Granular-Soft Computing, pages 73–81.

Stefanowski, J. and Tsoukias, A. (2001). Incomplete information tables and rough classification. Computational Intelligence, 17(3):545–566.

Yao, Y. Y. (1998). Relational interpretations of neighborhood operators and rough set approximation operators. Information Sciences, 111:239–259.

Yao, Y. Y. (2007). Decision-theoretic rough set models. In Proceedings of the Second International Conference on Rough Sets and Knowledge Technology, pages 1–12.

Yao, Y. Y. and Wong, S. K. M. (1992). A decision theoretic framework for approximate concepts. International Journal of Man-Machine Studies, 37:793–809.

Yao, Y. Y., Wong, S. K. M., and Lingras, P. (1990). A decision-theoretic rough set model. In Proceedings of the 5th International Symposium on Methodologies for Intelligent Systems, pages 388–395.

Ziarko, W. (1993). Variable precision rough set model. Journal of Computer and System Sciences, 46(1):39–59.

Ziarko, W. (2008). Probabilistic approach to rough sets. International Journal of Approximate Reasoning, 49:272–284.

ConsistencyofIncompleteData