6 CONCLUSIONS
The goal of this paper was to investigate the influence of
data standardization on the accuracy of GP classification.
To this end, three scenarios were implemented and tested
using six different standardization methods on ten datasets.
The three scenarios differ in population size and in the
maximum number of generations: scenario I uses small
settings and scenario III uses the largest settings.
The results of the simulations showed that GP with data
standardization can achieve higher accuracy rates than GP
without data standardization. More specifically, with
standardization methods, GP managed to achieve higher
accuracy with fewer iterations and a smaller population
size. The best results were obtained with the Min-Max and
Vector methods, whereas the Manhattan and Z-Score methods
achieved the worst accuracy results. Based on the three
scenarios, it can be inferred that data standardization
improves the classification accuracy of the generated
GP trees.
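To make the four methods named above concrete, the following is a minimal Python sketch of their commonly used textbook definitions. It is an illustration under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, each method is assumed to be applied independently to every feature column, Z-Score is assumed to use the population standard deviation, and the exact formulas used in this paper may differ in detail. The other two of the six tested methods are not named in this section and are therefore omitted.

```python
import math
import statistics


def min_max(column):
    """Min-Max: rescale values linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # A constant column has no spread; map it to zeros to avoid dividing by zero.
    return [(x - lo) / span if span else 0.0 for x in column]


def z_score(column):
    """Z-Score: subtract the mean, divide by the (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # assumption: population, not sample, deviation
    return [(x - mean) / std if std else 0.0 for x in column]


def vector(column):
    """Vector: divide each value by the Euclidean (L2) norm of the column."""
    norm = math.sqrt(sum(x * x for x in column))
    return [x / norm if norm else 0.0 for x in column]


def manhattan(column):
    """Manhattan: divide each value by the L1 norm (sum of absolute values)."""
    norm = sum(abs(x) for x in column)
    return [x / norm if norm else 0.0 for x in column]


def standardize(rows, method):
    """Apply `method` to every feature column of a row-major dataset."""
    columns = [method(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]
```

For example, `standardize(rows, min_max)` would be called on the feature columns only, with the class label kept aside, before the data is passed to the GP classifier.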
Our future work includes testing the effect of other GP
parameters in combination with data standardization, and
testing the use of GP on specific real-world problems with
and without data standardization.
ACKNOWLEDGEMENTS
This work has been supported in part by the Ministerio
español de Economía y Competitividad under projects
TIN2014-56494-C4-3-P (UGR-EPHEMECH),
TIN2017-85727-C4-2-P (UGR-DeepBio) and
SPIP2017-02116.
The Influence of Input Data Standardization Methods on the Prediction Accuracy of Genetic Programming Generated Classifiers