Author:
Mohamed Hosni
Affiliation:
MOSI Research Team, ENSAM, University Moulay Ismail of Meknes, Meknes, Morocco
Keyword(s):
Categorical Data, Encoder, Software Effort Estimation, Ensemble Effort Estimation.
Abstract:
Planning, controlling, and monitoring a software project rely primarily on estimates of the software development effort. These estimates are usually produced during the early stages of the software life cycle. At this stage, the available information about the software product is largely categorical, and only a few numerical data points are available. Therefore, building an accurate effort estimator begins with determining how to process the categorical data that characterizes the software project. This paper examines how categorical data in software development effort estimation (SDEE) datasets can be handled through encoding techniques. Four encoders were used in this study: the one-hot encoder, label encoder, count encoder, and target encoder. Four well-known machine learning (ML) estimators and a homogeneous ensemble were evaluated. The empirical analysis was conducted on four datasets. The datasets generated by the one-hot encoder appeared to be suitable for the ML estimators, as they yielded more accurate estimates. The ensemble, which combined four variants of the same technique, each trained on a dataset generated by a different encoder, performed as well as or better than the single ML estimation technique. The overall results are promising and pave the way for a new approach to handling categorical data in SDEE datasets.
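The four encoders named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the toy `languages` and `efforts` values, and the choice of an unsmoothed per-category mean for target encoding are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def one_hot_encode(values):
    """One binary column per category (columns in sorted category order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def label_encode(values):
    """Map each category to an integer index (sorted category order)."""
    mapping = {c: i for i, c in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]

def count_encode(values):
    """Replace each category with its number of occurrences."""
    counts = Counter(values)
    return [counts[v] for v in values]

def target_encode(values, targets):
    """Replace each category with the mean target observed for it.

    Assumption: plain mean without smoothing. In practice the means
    should be computed on training data only to avoid target leakage.
    """
    sums = defaultdict(float)
    counts = Counter(values)
    for v, t in zip(values, targets):
        sums[v] += t
    return [sums[v] / counts[v] for v in values]

# Hypothetical toy project data: a categorical attribute and the effort target.
languages = ["Java", "C", "Java", "Python"]
efforts = [100.0, 300.0, 200.0, 50.0]

one_hot = one_hot_encode(languages)
labels = label_encode(languages)
counts = count_encode(languages)
target = target_encode(languages, efforts)
```

Each function yields a different numerical view of the same categorical attribute, which is what allows several variants of one estimator to be trained on differently encoded versions of a dataset.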