The Influence of Input Data Standardization Methods on the Prediction Accuracy of Genetic Programming Generated Classifiers

Amaal R. Al Shorman, Hossam Faris, Pedro A. Castillo, J. J. Merelo, Nailah Al-Madi

Abstract

Genetic programming (GP) is a powerful classification technique. It is interpretable, and it can dynamically build very complex expressions that maximize or minimize a given fitness function. It is able to model very complex problems in the areas of Machine Learning, Data Mining and Pattern Recognition. Nevertheless, GP has a high computational cost. On the other hand, data standardization is one of the most important pre-processing steps in machine learning. The purpose of this step is to bring all input features to a common scale so that they contribute equally to the model. The objective of this paper is to investigate the influence of input data standardization methods on GP and how they affect its prediction accuracy. Six different methods of input data standardization were evaluated in order to determine which one yields the most accurate results at the lowest computational cost. The simulations were carried out on ten benchmark datasets under three different scenarios (varying the population size and the number of generations). The results showed that the computational efficiency of GP is greatly enhanced when it is coupled with certain standardization methods, specifically the Min-Max method for scenario I and the Vector method for scenarios II and III. In contrast, the Manhattan and Z-Score methods had the worst results in all three scenarios.
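As an illustration of what per-feature standardization involves, the following is a minimal sketch of the four methods named in the abstract. The abstract does not give the paper's exact formulas, so the definitions below are the common textbook ones and should be read as assumptions; the function names (`min_max`, `z_score`, `vector_norm`, `manhattan_norm`, `standardize`) are hypothetical and chosen only for this example.

```python
import math


def min_max(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0:
        # Constant feature: map every value to 0.0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


def z_score(column):
    """Center on the mean and divide by the population standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


def vector_norm(column):
    """Divide by the Euclidean (L2) norm of the column (assumed definition)."""
    norm = math.sqrt(sum(x * x for x in column))
    if norm == 0:
        return [0.0 for _ in column]
    return [x / norm for x in column]


def manhattan_norm(column):
    """Divide by the sum of absolute values, the L1 norm (assumed definition)."""
    total = sum(abs(x) for x in column)
    if total == 0:
        return [0.0 for _ in column]
    return [x / total for x in column]


def standardize(rows, method):
    """Apply `method` to every feature column of a row-major dataset."""
    columns = [list(col) for col in zip(*rows)]
    scaled = [method(col) for col in columns]
    return [list(row) for row in zip(*scaled)]


if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    for method in (min_max, z_score, vector_norm, manhattan_norm):
        print(method.__name__, standardize(data, method))
```

Each function works on a single feature column, so features measured on different scales are brought to a comparable range before the data reaches the classifier.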


Paper Citation


in Harvard Style

Shorman A., Faris H., Castillo P., Merelo J. and Al-Madi N. (2018). The Influence of Input Data Standardization Methods on the Prediction Accuracy of Genetic Programming Generated Classifiers. In Proceedings of the 10th International Joint Conference on Computational Intelligence - Volume 1: IJCCI, ISBN 978-989-758-327-8, pages 79-85. DOI: 10.5220/0006959000790085


in Bibtex Style

@conference{ijcci18,
author={Amaal R. Al Shorman and Hossam Faris and Pedro A. Castillo and J. J. Merelo and Nailah Al-Madi},
title={The Influence of Input Data Standardization Methods on the Prediction Accuracy of Genetic Programming Generated Classifiers},
booktitle={Proceedings of the 10th International Joint Conference on Computational Intelligence - Volume 1: IJCCI},
year={2018},
pages={79-85},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0006959000790085},
isbn={978-989-758-327-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 10th International Joint Conference on Computational Intelligence - Volume 1: IJCCI
TI - The Influence of Input Data Standardization Methods on the Prediction Accuracy of Genetic Programming Generated Classifiers
SN - 978-989-758-327-8
AU - Shorman A.
AU - Faris H.
AU - Castillo P.
AU - Merelo J.
AU - Al-Madi N.
PY - 2018
SP - 79
EP - 85
DO - 10.5220/0006959000790085