significant differences between standard GP and the lazy evaluation GP using 10 instances (p < 0.001).
4.2 Evolution Time Analysis
Figure 6: Average total running times of the whole evolution process in milliseconds for all settings. LE = lazy evaluation; inst = number of instances used in the evaluation process.
The running times show a clear lead for the lazy evaluation GPs over standard GP. The slowest lazy evaluation GP, the one using 10 instances in the evaluation process, took on average 4495.92 milliseconds, which is only 62.6% of the average total running time of standard GP (7182.97 milliseconds).
The running times do not shrink in proportion to the theoretical saving from the reduced number of evaluated instances, but they are still consistently smaller.
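To make these numbers concrete, the following short Python sketch reproduces the reported speedup and contrasts it with the theoretical per-evaluation saving; the training-set size N used here is a hypothetical placeholder, not a value taken from the experiments.

    # Reported average total running times in milliseconds.
    standard_ms = 7182.97
    lazy_10_ms = 4495.92

    # Measured ratio: lazy evaluation with 10 instances needs
    # about 62.6% of standard GP's total running time.
    measured_ratio = lazy_10_ms / standard_ms
    print(f"measured ratio: {measured_ratio:.3f}")  # -> 0.626

    # Theoretical evaluation-cost ratio: if standard GP scores every
    # tree on all N training instances while lazy evaluation uses
    # only 10, the cost of the evaluation step alone shrinks to 10/N.
    # N = 500 is a made-up placeholder.
    N = 500
    theoretical_ratio = 10 / N
    print(f"theoretical ratio: {theoretical_ratio:.3f}")  # -> 0.020

    # The gap between the two ratios comes from the parts of the
    # evolution (selection, crossover, mutation, bookkeeping) whose
    # cost does not depend on the number of evaluated instances.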
5 CONCLUSIONS
We proposed a lazy evaluation approach for the genetic programming process of building classification decision trees, which dynamically selects the instances used for fitness evaluation.
The results of the first experiments show that this approach has great potential and should be explored further. Not only did all of the lazy evaluation GPs take less processing time to finish the whole evolution process than standard GP, but some settings (those with more instances in the evaluation process) also returned comparable results in accuracy and average F-score. One of the lazy evaluation settings included in the experiment (with 10 instances in evaluation) even returned better results than standard GP. This can be attributed to the changing environment of the GP, which helps prevent the solutions from overfitting, and to the weighting process that gives more importance (a higher chance of being included in the evaluation process) to instances that are harder to classify.
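As an illustration only, the following minimal Python sketch shows one way such weighted, dynamic instance selection could be implemented; the function names, the sampling call, and the multiplicative weight update are our assumptions rather than the exact scheme used in the experiments.

    import random

    def select_instances(weights, k):
        # Sample k instance indices; harder-to-classify instances
        # (higher weight) are more likely to be chosen.
        indices = list(range(len(weights)))
        return random.choices(indices, weights=weights, k=k)

    def lazy_fitness(tree, instances, labels, weights, k=10):
        # Evaluate a candidate tree on a small weighted sample
        # instead of the full training set (lazy evaluation).
        sample = select_instances(weights, k)
        correct = 0
        for i in sample:
            if tree.classify(instances[i]) == labels[i]:
                correct += 1
            else:
                # Misclassified instances gain weight, so they are
                # more likely to reappear in future evaluations;
                # the factor 1.1 is a hypothetical choice.
                weights[i] *= 1.1
        return correct / k

Because each fitness call sees a different sample, the selection environment keeps changing between generations, which is the property credited above with reducing overfitting.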
We plan to research lazy evaluation further, testing the importance of the tournament size and exploring the effect of the number of evaluation instances. In addition, an implementation of parallel lazy evaluation is already under way, which should be directly comparable to parallel GP for decision tree creation.