3 GREEDY APPROACH
Let I be an impurity function. We now describe a greedy algorithm V_I which, for a given decision table T, constructs a decision tree V_I(T) for the table T.
Step 1. Construct a tree consisting of a single node
labeled with the table T and proceed to the second
step.
Suppose t ≥ 1 steps have already been made. The tree obtained at step t will be denoted by G.
Step (t + 1). If no node of the tree G is labeled with a table, then we denote by V_I(T) the tree G. The work of the algorithm V_I is completed.
Otherwise, we choose a node v in the tree G which is labeled with a subtable Θ of the table T. If rt(Θ) = 0, then instead of Θ we mark the node v with the common decision for Θ and proceed to step (t + 2). Let rt(Θ) > 0. Then for each f_i ∈ E(Θ) we compute the value I(Θ, f_i). We mark the node v with the attribute f_{i_0}, where i_0 is the minimum i ∈ {1, ..., m} for which I(Θ, f_i) has the minimum value. For each δ ∈ E(Θ, f_{i_0}), we add to the tree G the node v(δ), mark this node with the subtable Θ(f_{i_0}, δ), draw the edge from v to v(δ), and mark this edge with δ. Proceed to step (t + 2).
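A minimal Python sketch of this greedy procedure is given below. It is an illustration under our own assumptions: a table is a list of rows whose last entry is the decision, and the helper names (common_decision, split, greedy_tree) as well as the recursive formulation are ours rather than the paper's; any impurity function of the form I(Θ, f_i) can be supplied as the impurity argument.

```python
from collections import Counter

def common_decision(table):
    # Most frequent decision (last column) among the rows of the table
    return Counter(row[-1] for row in table).most_common(1)[0][0]

def is_degenerate(table):
    # rt(table) = 0: all rows carry the same decision
    return len({row[-1] for row in table}) <= 1

def split(table, i):
    # Subtables table(f_i, a) for every value a of attribute f_i occurring in the table
    subtables = {}
    for row in table:
        subtables.setdefault(row[i], []).append(row)
    return subtables

def greedy_tree(table, impurity):
    # Sketch of V_I: impurity(table, i) plays the role of I(Θ, f_i)
    if is_degenerate(table):
        return common_decision(table)            # leaf labeled with the common decision
    m = len(table[0]) - 1                        # number of conditional attributes
    candidates = [i for i in range(m)            # E(Θ): attributes not constant on the table
                  if len({row[i] for row in table}) > 1]
    i0 = min(candidates, key=lambda i: (impurity(table, i), i))   # minimal index on ties
    return (i0, {a: greedy_tree(sub, impurity)   # edge labeled a leads to the subtree for Θ(f_i0, a)
                 for a, sub in split(table, i0).items()})
```

Ties are broken toward the smallest attribute index, matching the choice of i_0 in the step description above.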
4 DYNAMIC PROGRAMMING APPROACH
In this section, we describe a dynamic programming algorithm which, for a monotone cost function ψ and a decision table T, finds the minimum cost (relative to the cost function ψ) of a decision tree for T.
Consider an algorithm for the construction of a graph ∆(T). The nodes of ∆(T) are some separable subtables of the table T. During each step we process one node and mark it with the symbol *. We start with the graph that consists of the single node T and finish when all nodes of the graph are processed.
Let the algorithm have already performed p steps. We now describe step number (p + 1). If all nodes are processed, then the work of the algorithm is finished, and the resulting graph is ∆(T). Otherwise, choose a node (table) Θ that has not been processed yet. If rt(Θ) = 0, label the considered node with the common decision for Θ, mark it with the symbol * and proceed to step number (p + 2). Let rt(Θ) > 0. For each f_i ∈ E(Θ), draw a bundle of edges from the node Θ (this bundle of edges will be called the f_i-bundle). Let E(Θ, f_i) = {a_1, ..., a_t}. Then draw t edges from Θ and label these edges with the pairs (f_i, a_1), ..., (f_i, a_t), respectively. These edges enter the nodes Θ(f_i, a_1), ..., Θ(f_i, a_t). If some of the nodes Θ(f_i, a_1), ..., Θ(f_i, a_t) are not present in the graph, then add these nodes to the graph. Mark the node Θ with the symbol * and proceed to step number (p + 2).
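The construction of ∆(T) can be sketched in Python as follows. The representation (a node is a tuple of rows, and graph maps every node to its bundles) and the function name build_delta are our own illustrative choices, not notation from the paper.

```python
def build_delta(table):
    # Construct the DAG ∆(T): nodes are the separable subtables reachable from T.
    # graph[node] maps each attribute index i in E(node) to its f_i-bundle,
    # i.e. a dict {a: child node representing node(f_i, a)}.
    root = tuple(table)
    graph = {}
    to_process = [root]                      # nodes not yet marked with *
    while to_process:
        node = to_process.pop()
        if node in graph:
            continue                         # node already processed
        if len({row[-1] for row in node}) <= 1:
            graph[node] = {}                 # terminal node: rt(node) = 0
            continue
        bundles = {}
        m = len(node[0]) - 1
        for i in range(m):
            values = {row[i] for row in node}
            if len(values) <= 1:
                continue                     # f_i is constant on node, so i is not in E(node)
            bundle = {}
            for a in values:
                child = tuple(row for row in node if row[i] == a)
                bundle[a] = child
                if child not in graph:
                    to_process.append(child) # add the new node to the graph
            bundles[i] = bundle
        graph[node] = bundles
    return root, graph
```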
Let ψ be a monotone cost function given by the pair ψ_0, F. We now describe a procedure that attaches a number to each node of ∆(T). We attach the number ψ_0 to each terminal node of ∆(T).
Consider a nonterminal node Θ and a bundle of edges that starts in this node. Let these edges be labeled with the pairs (f_i, a_1), ..., (f_i, a_t) and enter the nodes Θ(f_i, a_1), ..., Θ(f_i, a_t), to which the numbers ψ_1, ..., ψ_t are already attached. Then we attach to the considered bundle the number F(N(Θ), ψ_1, ..., ψ_t). Among the numbers attached to the bundles starting in Θ, we choose the minimum number and attach it to the node Θ.
We stop when a number is attached to the node T in the graph ∆(T). One can show that this number is the minimum cost (relative to the cost function ψ) of a decision tree for T.
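Assuming the graph representation returned by build_delta in the previous sketch, the number-attaching procedure can be written as a memoized traversal of ∆(T). Passing the costs of a bundle to F as a list, and the concrete choice ψ_0 = 1, F(N(Θ), ψ_1, ..., ψ_t) = 1 + ψ_1 + ... + ψ_t shown for the number of nodes, are our own illustrative assumptions.

```python
def min_cost(root, graph, psi0, F):
    # Attach to every node of ∆(T) the minimum cost of a decision tree for it.
    # graph is the node -> bundles mapping produced by build_delta;
    # psi0 is attached to terminal nodes, and F(n_rows, costs) combines the
    # costs already attached to the nodes entered by one bundle of edges.
    cost = {}

    def visit(node):
        if node in cost:
            return cost[node]
        bundles = graph[node]
        if not bundles:                      # terminal node: rt = 0
            cost[node] = psi0
        else:
            # one number per f_i-bundle; the node receives the minimum over its bundles
            cost[node] = min(
                F(len(node), [visit(child) for child in bundle.values()])
                for bundle in bundles.values()
            )
        return cost[node]

    visit(root)
    return cost[root], cost

# Example (assumed instance of a monotone cost function): minimum number of nodes.
# root, graph = build_delta(table)
# best, _ = min_cost(root, graph, psi0=1, F=lambda n, costs: 1 + sum(costs))
```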
5 EXPERIMENTAL RESULTS
Different impurity functions give us different greedy
algorithms. The following experiments compare av-
erage depth, number of nodes and depth of decision
trees built by these algorithms with the minimum av-
erage depth, minimum number of nodes and mini-
mum depth calculated by the dynamic programming
algorithm.
The data sets were taken from the UCI Machine Learning Repository (Frank and Asuncion, 2010). Experiments with data sets that are not from the UCI Machine Learning Repository give similar results.
Each data set is represented as a table containing several input columns and an output (decision) column. Some data sets contain index columns that take a unique value for each row. Such columns were removed. Some tables contained rows with identical values in all columns, possibly except the decision column. In this case, each group of identical rows was replaced with a single row with the common values in all input columns and the most common value in the decision column. Some tables contained missing values. Each such value was replaced with the most common value in the corresponding column.
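A small pandas sketch of this preprocessing is given below. The use of pandas, the column handling, and the assumption that the decision is stored in a column named "decision" are ours and are not stated in the paper; the sketch fills missing values before merging duplicate rows purely for convenience.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, decision: str = "decision") -> pd.DataFrame:
    # Clean a data set along the lines described above (illustrative sketch).
    inputs = [c for c in df.columns if c != decision]

    # Remove index-like columns that take a unique value for each row.
    index_like = [c for c in inputs if df[c].nunique() == len(df)]
    df = df.drop(columns=index_like)
    inputs = [c for c in inputs if c not in index_like]

    # Replace each missing value with the most common value of its column.
    for c in df.columns:
        df[c] = df[c].fillna(df[c].mode().iloc[0])

    # Replace each group of rows with identical input values by a single row
    # carrying the most common value in the decision column.
    return (df.groupby(inputs, as_index=False)[decision]
              .agg(lambda s: s.mode().iloc[0]))
```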
Tables 1–3 show the results of experiments with 24 data sets and three cost functions: average depth, number of nodes, and depth, respectively. Each row contains the data set name, the minimum cost of a decision tree (min cost) calculated with the dynamic programming algorithm (see column Opt), and infor-