Automated Feature Engineering for AutoML Using Genetic Algorithms

Kevin Shi, Sherif Saad

2023

Abstract

Automated machine learning (AutoML) automates the creation of machine learning pipelines and models. The ability to create a machine learning pipeline automatically would allow users without machine learning knowledge to build and use machine learning systems. However, many AutoML tools offer little or no automated feature engineering support. We develop an approach that augments existing AutoML tools with automated feature generation and selection. The method uses feature generators guided by a genetic algorithm to generate and select features as part of the AutoML model selection process. We show that this approach improves AutoML model performance in 77% of all tested cases, with up to 78% error reduction. Our approach explores how existing AutoML tools can be augmented with more automated steps to improve the performance of the generated machine learning pipeline.
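To make the idea of generators guided by a genetic algorithm concrete, the following is a minimal sketch and not the authors' implementation. It assumes two hypothetical feature generators (pairwise product and pairwise sum), encodes each individual as a bitmask over the candidate features, and uses a cheap stand-in fitness (best absolute correlation with the target, minus a small penalty per selected feature). In the approach the abstract describes, fitness would presumably come from the AutoML tool's model performance instead. All function names and parameters here are illustrative.

```python
import random
from itertools import combinations
from statistics import mean


def generate_candidates(rows):
    """Hypothetical generators: original columns plus pairwise products and sums."""
    n = len(rows[0])
    cands = {f"x{i}": [r[i] for r in rows] for i in range(n)}
    for i, j in combinations(range(n), 2):
        cands[f"x{i}*x{j}"] = [r[i] * r[j] for r in rows]
        cands[f"x{i}+x{j}"] = [r[i] + r[j] for r in rows]
    return cands


def abs_corr(a, b):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def fitness(mask, names, cands, y):
    """Stand-in fitness (assumption): a real setup would score an AutoML run."""
    chosen = [name for name, bit in zip(names, mask) if bit]
    if not chosen:
        return 0.0
    best = max(abs_corr(cands[name], y) for name in chosen)
    return best - 0.01 * len(chosen)  # small penalty favours compact feature sets


def evolve(cands, y, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve feature bitmasks; returns (selected feature names, fitness)."""
    rng = random.Random(seed)
    names = sorted(cands)

    def score(mask):
        return fitness(mask, names, cands, y)

    pop = [[rng.randint(0, 1) for _ in names] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(names))  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return [name for name, bit in zip(names, best) if bit], score(best)


# Toy data: the target is an interaction that no original column captures alone.
data_rng = random.Random(42)
rows = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
y = [r[0] * r[1] for r in rows]

candidates = generate_candidates(rows)
selected, score = evolve(candidates, y)
print(selected, round(score, 3))
```

On this toy data the target is the product of the first two columns, so a search of this kind should favour masks that include the generated feature x0*x1. The per-feature penalty is one simple way to keep the selected set small; other choices are equally plausible.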

Paper Citation


in Harvard Style

Shi K. and Saad S. (2023). Automated Feature Engineering for AutoML Using Genetic Algorithms. In Proceedings of the 20th International Conference on Security and Cryptography - Volume 1: SECRYPT; ISBN 978-989-758-666-8, SciTePress, pages 450-459. DOI: 10.5220/0012090400003555


in Bibtex Style

@conference{secrypt23,
author={Kevin Shi and Sherif Saad},
title={Automated Feature Engineering for AutoML Using Genetic Algorithms},
booktitle={Proceedings of the 20th International Conference on Security and Cryptography - Volume 1: SECRYPT},
year={2023},
pages={450-459},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0012090400003555},
isbn={978-989-758-666-8},
}


in EndNote Style

TY - CONF
JO - Proceedings of the 20th International Conference on Security and Cryptography - Volume 1: SECRYPT
TI - Automated Feature Engineering for AutoML Using Genetic Algorithms
SN - 978-989-758-666-8
AU - Shi K.
AU - Saad S.
PY - 2023
SP - 450
EP - 459
DO - 10.5220/0012090400003555
PB - SciTePress