V (6, Skills) = {low, high}, V (7, Education) =
{elementary, secondary} and V (8, Experience) =
{low, high}.
For the data set from Table 1 the blocks of
attribute-value pairs are:
[(Education, elementary)] = {2, 5, 7, 8},
[(Education, secondary)] = {2, 3, 6, 7},
[(Education, higher)] = {1, 2, 4},
[(Skills, low)] = {4, 6, 7, 8},
[(Skills, high)] = {1, 2, 3, 4, 5, 6, 8},
[(Experience, low)] = {1, 2, 5, 8},
[(Experience, high)] = {1, 3, 4, 6, 7, 8}.
For a case x ∈ U and B ⊆ A, the characteristic set K_B(x) is defined as the intersection of the sets K(x, a) for all a ∈ B, where the set K(x, a) is defined in the following way:
• If a(x) is specified, then K(x, a) is the block [(a, a(x))] of attribute a and its value a(x),
• If a(x) = −, then K(x, a) is the union of all blocks of attribute-value pairs (a, v), where v ∈ V(x, a), if V(x, a) is nonempty; if V(x, a) is empty, K(x, a) = U,
• If a(x) = ∗, then K(x, a) = U, where U is the set of all cases.
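As an illustration, the three cases of this definition can be sketched in Python on a small hypothetical table (the table, the attribute name, and the V(x, a) set below are invented for illustration and are not Table 1 from the text):

```python
# A minimal sketch of characteristic sets K_B(x) for incomplete data.
# '*' denotes a "do not care" value, '-' an attribute-concept value
# with a known value set V(x, a). Toy data, not Table 1.
U = [1, 2, 3, 4]

table = {                        # hypothetical incomplete decision table
    1: {'Color': 'red'},
    2: {'Color': '*'},
    3: {'Color': '-'},
    4: {'Color': 'blue'},
}
V = {(3, 'Color'): {'red'}}      # hypothetical V(x, a) for the '-' entry

# Collect the specified values of each attribute.
values = {}
for row in table.values():
    for a, v in row.items():
        if v not in ('*', '-'):
            values.setdefault(a, set()).add(v)

# Build the blocks [(a, v)]: a '*' case joins every block of attribute a,
# and a '-' case joins the blocks for the values in V(x, a).
blocks = {(a, v): set() for a in values for v in values[a]}
for x, row in table.items():
    for a, v in row.items():
        if v == '*':
            for w in values[a]:
                blocks[(a, w)].add(x)
        elif v == '-':
            for w in V.get((x, a), set()):
                blocks[(a, w)].add(x)
        else:
            blocks[(a, v)].add(x)

def K(x, a):
    """K(x, a), following the three cases of the definition above."""
    v = table[x][a]
    if v == '*':
        return set(U)
    if v == '-':
        vals = V.get((x, a), set())
        if not vals:
            return set(U)
        return set().union(*(blocks[(a, w)] for w in vals))
    return blocks[(a, v)]

def K_B(x, B):
    """Characteristic set: intersection of K(x, a) over a in B."""
    result = set(U)
    for a in B:
        result &= K(x, a)
    return result

print(K_B(2, ['Color']))  # '*' case: the whole universe U
```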
For Table 1 and B = A,
K_A(1) = {1, 2, 4},
K_A(2) = {1, 2, 5, 8},
K_A(3) = {3, 6},
K_A(4) = {1, 4},
K_A(5) = {2, 5, 8},
K_A(6) = {3, 6, 7},
K_A(7) = {6, 7, 8},
K_A(8) = {2, 5, 7, 8}.
Note that for incomplete data there are a few possible ways to define approximations (Grzymala-Busse, 2003). We used concept approximations (Grzymala-Busse, 2011), since our previous experiments indicated that such approximations are the most efficient (Grzymala-Busse, 2011). A B-concept lower approximation of the concept X is defined as follows:
B̲X = ∪{K_B(x) | x ∈ X, K_B(x) ⊆ X},
while a B-concept upper approximation of the concept X is defined by:
B̄X = ∪{K_B(x) | x ∈ X, K_B(x) ∩ X ≠ ∅} = ∪{K_B(x) | x ∈ X}.
For Table 1, the A-concept lower and A-concept upper approximations of the concept {5, 6, 7, 8} are:
A̲{5, 6, 7, 8} = {6, 7, 8},
Ā{5, 6, 7, 8} = {2, 3, 5, 6, 7, 8}.
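As a check, both approximations can be reproduced directly from the characteristic sets K_A(x) listed above; a minimal Python sketch (K abbreviates K_A):

```python
# Characteristic sets K_A(x) for Table 1, as listed in the text.
K = {1: {1, 2, 4}, 2: {1, 2, 5, 8}, 3: {3, 6}, 4: {1, 4},
     5: {2, 5, 8}, 6: {3, 6, 7}, 7: {6, 7, 8}, 8: {2, 5, 7, 8}}
X = {5, 6, 7, 8}  # the concept

# Lower approximation: union of K(x) for x in X with K(x) a subset of X.
lower = set().union(*(K[x] for x in X if K[x] <= X))
# Upper approximation: union of K(x) over all x in X.
upper = set().union(*(K[x] for x in X))

print(sorted(lower))  # [6, 7, 8]
print(sorted(upper))  # [2, 3, 5, 6, 7, 8]
```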
3 PROBABILISTIC
APPROXIMATIONS
For completely specified data sets, a probabilistic approximation is defined as follows:
appr_α(X) = ∪{[x] | x ∈ U, P(X | [x]) ≥ α},
where α is a parameter, 0 < α ≤ 1, see (Grzymala-Busse,
2011; Grzymala-Busse and Ziarko, 2003; Pawlak
et al., 1988; Wong and Ziarko, 1986; Yao, 2008;
Ziarko, 1993). Additionally, for simplicity, the elementary sets [x]_A are denoted by [x]. For a discussion of how this definition is related to the variable precision asymmetric rough sets, see (Clark and Grzymala-Busse, 2011; Grzymala-Busse, 2011).
Note that if α = 1, the probabilistic approximation becomes the standard lower approximation, and if α is small and close to 0 (0.001 in our experiments), the same definition describes the standard upper approximation.
For incomplete data sets, a B-concept probabilistic approximation is defined by the following formula (Grzymala-Busse, 2011):
∪{K_B(x) | x ∈ X, Pr(X | K_B(x)) ≥ α}.
For simplicity, we will denote K_A(x) by K(x), and the A-concept probabilistic approximation will be called simply a probabilistic approximation.
For Table 1 and the concept X = [(Productivity, low)] = {5, 6, 7, 8}, there exist three distinct probabilistic approximations:
appr_1.0({5, 6, 7, 8}) = {6, 7, 8},
appr_0.75({5, 6, 7, 8}) = {2, 5, 6, 7, 8},
and
appr_0.001({5, 6, 7, 8}) = {2, 3, 5, 6, 7, 8}.
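These three approximations follow mechanically from the characteristic sets, since Pr(X | K(x)) = |X ∩ K(x)| / |K(x)|; a minimal Python sketch (K abbreviates K_A):

```python
# Characteristic sets K_A(x) for Table 1, as listed in the text.
K = {1: {1, 2, 4}, 2: {1, 2, 5, 8}, 3: {3, 6}, 4: {1, 4},
     5: {2, 5, 8}, 6: {3, 6, 7}, 7: {6, 7, 8}, 8: {2, 5, 7, 8}}
X = {5, 6, 7, 8}  # concept [(Productivity, low)]

def appr(alpha):
    """Union of K(x), x in X, with Pr(X | K(x)) >= alpha."""
    return set().union(*(K[x] for x in X
                         if len(X & K[x]) / len(K[x]) >= alpha))

print(sorted(appr(1.0)))    # [6, 7, 8]
print(sorted(appr(0.75)))   # [2, 5, 6, 7, 8]
print(sorted(appr(0.001)))  # [2, 3, 5, 6, 7, 8]
```

Here the conditional probabilities are Pr(X | K(5)) = Pr(X | K(6)) = 2/3, Pr(X | K(7)) = 1, and Pr(X | K(8)) = 3/4, which explains the three thresholds at which the approximation changes.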
The special probabilistic approximation with the parameter α = 0.5 will be called a middle approximation.
4 EXPERIMENTS
Our experiments are based on eight data sets available
from the University of California at Irvine Machine
Learning Repository, see Table 2.
For every data set, a set of templates was created by incrementally replacing a percentage of the existing specified attribute values with attribute-concept values, in 5% increments. Thus, we started each series of experiments with no attribute-concept values,
DATA 2014 - 3rd International Conference on Data Management Technologies and Applications