algorithm over the training set, we report the average in-sample accuracy improvement as the difference between the accuracy of the most accurate tree out of 100 runs and that of the most accurate tree in the initial generation. Table 4 presents these in-depth direct comparisons at depth 2 and Table 5 at depth 3. Result tables for depths 4 and 5 are provided in the Appendix. `Time` is the total time in seconds for population initialization plus GA execution.
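As an illustration, the following minimal sketch shows how this metric can be computed once per-run accuracies are collected; the array layout and the values in `main` are hypothetical placeholders, not the paper's actual data structures or results:

```java
// Average in-sample accuracy improvement over a set of GA runs:
// for each run, take the accuracy of the best tree found by the GA minus
// the accuracy of the best tree in the initial generation, then average.
public class InSampleImprovement {
    // runs[i][0] = best initial-generation accuracy, runs[i][1] = best GA accuracy
    static double averageImprovement(double[][] runs) {
        double sum = 0.0;
        for (double[] run : runs) {
            sum += run[1] - run[0];
        }
        return sum / runs.length;
    }

    public static void main(String[] args) {
        double[][] runs = { {0.62, 0.71}, {0.60, 0.70} }; // illustrative values only
        System.out.println(averageImprovement(runs));     // prints approximately 0.095
    }
}
```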
The GA is coded in Java (version 1.8.0) and run in the Eclipse IDE (version 4.14.0). All computations are performed on a PC with an Intel Core i7-8550U CPU at 1.8 GHz and 8 GB of RAM.
In Table 4, at depth 2, 5 out of 6 datasets show that the GA with the CART-based initial population (MIX_V2) is stronger than CART, with a nominal improvement over CART of about 1% to 10% depending on the dataset. On some datasets, the GA with the random initial population (MIX_V1) is also stronger than pure CART, but in general the GA with the MIX_V2 population performs best, ahead of both CART and the GA with MIX_V1.
The improvement achieved by the GA with MIX_V2 increases at higher depths. Moreover, at higher depths, the GA with the MIX_V2 population is always the best one. In particular, at depth 5 (see Appendix 2), the GA with MIX_V2 is the best method on all 6 datasets.
For more detail, consider the results on the Avila dataset at depths 2, 3, 4, and 5. As the trees get deeper, out-of-sample accuracies increase, although the in-sample accuracy improvements are almost the same at all depths. The more crucial point is the difference between MIX_V1 and MIX_V2: at all depths, MIX_V2 always outperforms CART and MIX_V1. Only on the Wine dataset at depth 2 do MIX_V1 and MIX_V2 fail to outperform CART; at higher depths, they beat the CART performance as well.
These results show that our GA can outperform CART on all datasets from depth 3 onwards. On the other hand, from the mean in-sample perspective, the GA with MIX_V1 shows the higher average in-sample accuracy improvement. This metric captures the effect of the GA on the initial population over the training set: the MIX_V1 initial population contains only random trees, whose initial accuracies on the training set are very low, and after running the GA the accuracy of these random trees increases dramatically. This shows that our GA works well at increasing the initial fitness of the given problem within a reasonable execution time for small and medium-sized datasets.
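To make the distinction between the two population variants concrete, here is a minimal sketch of how they could be assembled; the `Tree` placeholder and the generator stubs are hypothetical stand-ins for the paper's actual routines, not its implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class InitialPopulation {
    interface Tree {}                       // placeholder for the decision-tree type

    // Hypothetical generators standing in for the paper's actual routines.
    static Tree randomTree() { return new Tree() {}; }
    static List<Tree> cartSubtrees() { return new ArrayList<>(); } // stub: would return CART-derived trees

    // MIX_V1: random trees only; MIX_V2: CART-derived subtrees plus random fill.
    static List<Tree> buildPopulation(boolean includeCart, int size) {
        List<Tree> population = new ArrayList<>();
        if (includeCart) {                      // MIX_V2 seeds the population with CART solutions
            population.addAll(cartSubtrees());
        }
        while (population.size() < size) {      // the remainder (all of MIX_V1) is random
            population.add(randomTree());
        }
        return population;
    }
}
```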
We conclude that the GA finds trees with higher prediction accuracy than the greedy CART algorithm in a reasonable time for small and medium-sized datasets. For large datasets (n > 20,000), the execution time increases, especially when the population size grows along with the data size. Nevertheless, across all dataset sizes we observe at least a 1% accuracy improvement, which is crucial in classification problems: the GA improves the performance of the given initial population and finds trees with better accuracies.
4 CONCLUSIONS
In this article, we describe and evaluate an evolutionary algorithm, a GA, for decision tree induction. Because of the disadvantages of greedy approaches, heuristics can be combined with them to improve their performance; in this work, a genetic algorithm is combined with CART. We also use a random initial population in order to compare it against including CART subtrees in the initial population. In the GA, crossover is applied to all parents, and mutation is applied only to a given proportion of the children produced by crossover; in mutation, a randomly chosen node is mutated.
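A minimal sketch of one generation under this scheme follows; the random pairing of parents and the `Tree` operator signatures are assumptions made for illustration, not the paper's exact implementation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class GaGeneration {
    // Hypothetical tree interface exposing the two operators used below.
    interface Tree {
        Tree crossover(Tree other);        // combine material from two parents
        Tree mutateRandomNode(Random rng); // alter one randomly chosen node
    }

    static final Random RNG = new Random();

    // One generation: crossover is applied to all parents, and only a given
    // proportion (mutationRate) of the resulting children are mutated, each
    // at a single randomly chosen node.
    static List<Tree> nextGeneration(List<Tree> parents, double mutationRate) {
        List<Tree> pool = new ArrayList<>(parents);
        Collections.shuffle(pool, RNG);    // simple random pairing (an assumption)
        List<Tree> children = new ArrayList<>();
        for (int i = 0; i + 1 < pool.size(); i += 2) {
            Tree child = pool.get(i).crossover(pool.get(i + 1));
            if (RNG.nextDouble() < mutationRate) {
                child = child.mutateRandomNode(RNG);
            }
            children.add(child);
        }
        return children;
    }
}
```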
Results show that the GA improves the performance of the trees given in the initial population. However, if the initial population contains only random trees, the GA usually cannot outperform the CART solution; when we include CART solutions in the population, we can improve their performance and outperform CART. The results show that the MIX_V2 initial population is better than the MIX_V1 population on 5 out of 6 datasets at depths 2 and 3, and MIX_V2 is always the better one at depths 4 and 5.
For future work, additional steps will be applied to the presented heuristic to obtain further improvements, and other heuristic methods will be tried in order to construct more accurate trees. For example, in the presented GA, additional operations and improvement moves will be tested and selected according to their contribution. We will also consider pruning steps in the GA implementation and will limit parameters such as minbucket and the complexity parameter, which are used in constructing CART. Additionally, we will generate different initial population mixtures with the help of different decision tree induction strategies and compare their performance with the proposed ones.