sions, just one row is kept with the same values of
conditional attributes, and a single decision is attached
to this row: the most common decision for rows
from the group. The third approach is well known
in rough set theory (Pawlak, 1991; Skowron and
Rauszer, 1992) and is called the generalized decision
– GD approach. In this case, an inconsistent decision
table is transformed into a table with many-valued
decisions, and after that each set of decisions is
encoded by a number (decision) such that equal sets
are encoded by equal numbers and different sets are
encoded by different numbers.
In the literature, problems connected with multi-label
data are often considered for classification (the
multi-label classification problem) (Clare and
King, 2001; Comité et al., 2003; Loza Mencía and
Fürnkranz, 2008; Tsoumakas and Katakis, 2007;
Tsoumakas et al., 2010; Zhou et al., 2012). However,
the aim of this paper is to study decision trees
for multi-label decision tables for knowledge
representation. The decision tree is used as the model
to represent such knowledge, and the goal is to
compare the decision tree structure for different
heuristics and different approaches to the
representation of inconsistent decision tables.
In this paper, a greedy heuristic, 'misclassification
error', is introduced for decision tables with
many-valued decisions. Its performance is compared
with the heuristic 'number of boundary subtables'
from (Azad et al., 2013). Data sets from the UCI ML
Repository as well as the KEEL repository
(Alcalá-Fdez et al., 2009) are used. The advantages
of the KEEL data sets are twofold: (1) they contain
a large amount of data, and (2) they are real-life
examples of decision tables with many-valued
decisions. The heuristic 'number of boundary
subtables' is expensive in terms of time and memory
requirements. Hence, it is essential to design a new
heuristic which gives equal or better results with
lower time complexity and memory requirements.
Finally, results are presented which show that the
use of the MCD and, especially, the MVD approach can
reduce the complexity of trees in comparison with the
GD approach. The goal is to find all decisions in the
case of GD, a fixed decision in the case of MCD, and
an arbitrary decision in the case of MVD for a
particular row. That means we are moving from a
highly restricted decision constraint to a less
restricted one. Hence we usually get a less complex
tree in the last case than in the others. This
comparison is crucial for knowledge representation,
since we can obtain useful knowledge in the form of
less complicated decision trees.
This paper consists of five sections. Section 2
contains the main definitions. In Sect. 3, the greedy
algorithm for the construction of decision trees is
presented. Section 4 contains the results of
experiments, and Sect. 5 concludes the paper.
2 MAIN DEFINITIONS
A decision table is a rectangular table T filled by
non-negative integers. Columns of this table are
labeled with conditional attributes f_1, ..., f_n.
If attribute values are strings, we have to encode
them as non-negative integers. In addition,
real-valued data have to be discretized to fit this
format. Rows of the table are pairwise different, and
each row is labeled with a natural number (decision)
which is interpreted as a value of the decision
attribute. To distinguish it from a decision table
with many-valued decisions, we sometimes call it a
decision table with one-valued decisions.
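As an illustration only (the concrete representation is not fixed by the paper), such a table can be stored as a list of pairs: a tuple of attribute values f_1, ..., f_n and the attached decision. The values below are invented for the example; two entries share the same attribute values but carry different decisions, which is the kind of repetition the following paragraphs deal with.

```python
# Hypothetical decision table with one-valued decisions:
# each entry is ((f1, f2, f3), decision), all values
# non-negative integers as required by the format.
table = [
    ((0, 1, 0), 1),
    ((0, 1, 0), 2),  # same attribute values, a different decision
    ((1, 0, 1), 1),
    ((1, 1, 0), 3),
]
```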
It is possible that T is inconsistent, i.e., contains
equal rows with different decisions. The table T can
also contain equal rows with equal decisions. The
most frequent decision attached to rows from a group
of rows in a decision table T with one-valued
decisions is called the most common decision for this
group of rows. For the approach called most common
decision – MCD, we transform the inconsistent
decision table T into a consistent decision table
T_MCD with one-valued decisions. Instead of a group
of equal rows with different decisions, we consider
one row from the group and attach to this row the
most common decision for the considered group of rows.
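The MCD transformation described above can be sketched as follows; the function name and the pair-based table representation are assumptions for the example, not part of the paper.

```python
from collections import Counter

def to_mcd(rows):
    """Sketch of the MCD transform: group rows with equal attribute
    values and keep one row per group, labeled with the group's most
    common decision (ties broken by first occurrence here)."""
    groups = {}
    for attrs, dec in rows:
        groups.setdefault(attrs, []).append(dec)
    return [(attrs, Counter(decs).most_common(1)[0][0])
            for attrs, decs in groups.items()]
```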
For the approach called generalized decision – GD, we
transform the inconsistent decision table T into a
consistent decision table T_GD with one-valued
decisions. Instead of a group of equal rows with
different decisions, we consider one row from the
group and attach to this row the set of all decisions
for rows from the group. Then, instead of a set of
decisions, we attach to each row a code of this set –
a natural number such that the codes of equal sets
are equal and the codes of different sets are
different.
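A minimal sketch of the GD transform, under the same assumed pair-based representation: groups with the same decision set receive the same natural-number code, and distinct sets receive distinct codes.

```python
def to_gd(rows):
    """Sketch of the GD transform: one row per group of equal
    attribute rows, labeled with a code for the set of all decisions
    in the group; equal sets get equal codes, different sets get
    different codes."""
    groups = {}
    for attrs, dec in rows:
        groups.setdefault(attrs, set()).add(dec)
    codes = {}  # frozenset of decisions -> natural-number code
    result = []
    for attrs, decs in groups.items():
        code = codes.setdefault(frozenset(decs), len(codes) + 1)
        result.append((attrs, code))
    return result
```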
For the approach called many-valued decisions – MVD,
we transform the inconsistent decision table T into a
decision table T_MVD with many-valued decisions.
Instead of a group of equal rows with different
decisions, we consider one row from the group and
attach to this row the set of all decisions for rows
from the group (Moshkov and Zielosko, 2011).
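The MVD transform is the same grouping step without the encoding: each kept row is labeled directly with the set of decisions of its group. Again, the function name and table representation are illustrative assumptions.

```python
def to_mvd(rows):
    """Sketch of the MVD transform: one row per group of equal
    attribute rows, labeled with the set of all decisions attached
    to rows of the group."""
    groups = {}
    for attrs, dec in rows:
        groups.setdefault(attrs, set()).add(dec)
    return list(groups.items())
```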
Note that each decision table with one-valued
decisions can also be interpreted as a decision table
with many-valued decisions. In such a table, each row
is labeled with a set of decisions which has exactly
one element.