[Figure 6: Calculated lower bound and the exact VC-dimension of univariate decision trees for datasets with 3 (a) and 4 (b) input features. Only the internal nodes are shown.]
4.2 Complexity Control using VC-Dimension Bounds
In this section, to show that our VC-dimension bounds are useful, we use them for complexity control in decision trees. Complexity in decision trees can be controlled in two ways: we can control the complexity of each decision node by selecting the appropriate model for that node (Yıldız and Alpaydın, 2001), or we can control the overall complexity of the decision tree via pruning. Since this paper covers only discrete univariate trees, we take the second approach and use the VC-dimension bounds found in the previous section for pruning.
When we prune a node using SRM (SRMprune), we first find the VC generalization error of the subtree using Equation 2, where V is the VC-dimension and E_t is the training error of the subtree. Then, we find the training error of the node as if it were a leaf node. Since the VC-dimension of a leaf node is 1, we can also find the generalization error of the tree as if it were pruned. If the generalization error of the leaf node is smaller than the generalization error of the subtree, we prune the subtree; otherwise we keep it. We compare SRM-based pruning with CVprune, where we compare, on a separate validation set, the performance of the subtree with that of a leaf replacing it. For the sake of generality, we also include the results of trees before any pruning is applied (NOprune).
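A minimal sketch of the SRMprune decision rule is given below. Equation 2 is not reproduced here; vc_generalization_error assumes a Cherkassky-Mulier style VC bound with parameters a_1 and a_2 and a confidence parameter nu, which may differ from the exact form of Equation 2, n is assumed to be the number of training instances on which the errors are measured, and all function and variable names are illustrative rather than taken from our implementation.

```python
import math

def vc_generalization_error(train_error, vc_dim, n, a1=0.1, a2=2.0, nu=0.05):
    # Pessimistic generalization error from a VC-style bound.
    # Assumes a Cherkassky-Mulier type bound with parameters a1, a2 and a
    # confidence parameter nu; the exact form of Equation 2 may differ.
    eps = a1 * (vc_dim * (math.log(a2 * n / vc_dim) + 1) - math.log(nu / 4)) / n
    return train_error + (eps / 2) * (1 + math.sqrt(1 + 4 * train_error / eps))

def srm_prune_decision(subtree_train_error, subtree_vc_dim, leaf_train_error, n):
    # Replace the subtree with a leaf if the bound for the leaf
    # (VC-dimension 1) is smaller than the bound for the subtree.
    err_subtree = vc_generalization_error(subtree_train_error, subtree_vc_dim, n)
    err_leaf = vc_generalization_error(leaf_train_error, 1, n)
    return err_leaf < err_subtree
```

Under such a bound, a subtree is penalized in proportion to its VC-dimension relative to the amount of training data, so complex subtrees supported by few instances are the first candidates for pruning.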
We use a total of 11 data sets: 9 of them (artificial, krvskp, monks, mushroom, promoters, spect, tictactoe, titanic, and vote) are from the UCI repository (Blake and Merz, 2000) and 2 of them (acceptors and donors) are bioinformatics datasets.
Table 1: The average and standard deviations of error rates
of decision trees generated using NOprune, CVprune, and
SRMprune.
Set NOprune CVprune SRMprune
Acc 17.1 ± 1.6 15.5 ± 2.3 15.6 ± 1.6
Art 0.0 ± 0.0 0.5 ± 1.4 0.0 ± 0.0
Don 8.0 ± 1.1 7.1 ± 1.1 6.7 ± 1.1
Krv 0.3 ± 0.3 1.2 ± 0.7 0.6 ± 0.4
Mon 4.2 ± 5.9 10.0 ± 7.6 4.2 ± 5.9
Mus 0.0 ± 0.0 0.0 ± 0.1 0.0 ± 0.0
Pro 23.6 ± 12.5 24.7 ± 12.9 20.6 ± 12.3
Spe 25.4 ± 7.9 20.9 ± 3.6 22.1 ± 7.1
Tic 14.2 ± 3.8 18.5 ± 4.2 14.2 ± 3.8
Tit 21.0 ± 1.7 21.5 ± 2.1 22.6 ± 2.1
Vot 6.3 ± 3.6 4.4 ± 2.9 3.9 ± 3.4
Table 2: The average and standard deviations of tree
complexities of decision trees generated using NOprune,
CVprune, and SRMprune.
Set NOprune CVprune SRMprune
Acc 1015 ± 29 55 ± 42 838 ± 31
Art 16 ± 0 15 ± 2 16 ± 0
Don 1489 ± 32 145 ± 35 910 ± 74
Krv 138 ± 6 80 ± 13 122 ± 9
Mon 121 ± 50 57 ± 17 121 ± 50
Mus 43 ± 0 41 ± 4 43 ± 0
Pro 48 ± 5 13 ± 6 39 ± 3
Spe 165 ± 9 5 ± 10 60 ± 16
Tic 437 ± 31 123 ± 25 436 ± 31
Tit 32 ± 1 16 ± 4 5 ± 2
Vot 89 ± 8 9 ± 8 23 ± 8
We use 10×10 fold cross-validation to generate training and test sets. For CVprune, 20 percent of the training data is put aside as the pruning set. For SRMprune, we did a grid-search on a_1 and a_2 using cross-validation and used a_1 = 0.1 and a_2 = 2.0.
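The evaluation protocol can be illustrated with the following sketch. It uses scikit-learn's DecisionTreeClassifier with cost-complexity pruning and a placeholder dataset as stand-ins for our univariate trees and CVprune (they are not the methods used in this paper); the sketch only shows the 10×10 cross-validation layout and the 20 percent pruning-set split.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer   # placeholder dataset
from sklearn.model_selection import RepeatedStratifiedKFold, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

errors = []
# 10x10 fold cross-validation: 10 repetitions of 10-fold CV
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_te, y_te = X[test_idx], y[test_idx]

    # Hold out 20% of the training data as the pruning (validation) set
    X_grow, X_prune, y_grow, y_prune = train_test_split(
        X_tr, y_tr, test_size=0.2, stratify=y_tr, random_state=0)

    # Grow the tree on the growing set, then pick the pruning level that
    # performs best on the pruning set (cost-complexity pruning stand-in)
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
        X_grow, y_grow)
    best = max(
        (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_grow, y_grow)
         for a in path.ccp_alphas if a >= 0.0),
        key=lambda t: t.score(X_prune, y_prune))
    errors.append(1.0 - best.score(X_te, y_te))

print(f"CVprune-style error: {np.mean(errors):.3f} +/- {np.std(errors):.3f}")
```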
Tables 1 and 2 show the average and standard deviations of the error rates and tree complexities, respectively, of the decision trees generated using NOprune, CVprune, and SRMprune. On four datasets (artificial, monks, mushroom, and tictactoe) there is no need to prune, i.e., pruning decreases performance; in these cases, CVprune prunes the trees aggressively by sacrificing accuracy, whereas SRMprune does not prune and matches the best performance, obtained by NOprune.
On five datasets (acceptors, donors, spect, promoters, and vote), pruning helps, i.e., pruning reduces both the error rate and the tree complexity as needed. Among these, CVprune is better than SRMprune on two datasets, whereas SRMprune is better than CVprune on three datasets.
On two datasets (titanic and krvskp), both