encoders used in this study. Further experiments are
required to reach a final conclusion on the best encoding
technique.
(RQ2): The results show that the single techniques
and the homogeneous ensemble developed in this
study achieve similar predictive accuracy.
In certain cases, the single KNN technique outperforms
the ensemble, regardless of the encoder used to
process the dataset.
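The comparison behind this finding can be illustrated with a minimal sketch. It assumes a bagging-style homogeneous ensemble of KNN models, one-hot and ordinal encoders, leave-one-out evaluation, and mean absolute error; the study's actual ensemble construction, encoders, and metrics may differ. The toy projects and all function names are hypothetical and are not data or code from the study.

```python
import random
from statistics import mean

# Hypothetical toy projects: (size in function points, language, effort in person-hours).
PROJECTS = [
    (100, "java", 520), (120, "java", 610), (300, "cobol", 2100),
    (280, "cobol", 1950), (150, "python", 600), (170, "python", 690),
    (90, "java", 480), (320, "cobol", 2300), (160, "python", 640),
    (110, "java", 560),
]
LANGS = sorted({lang for _, lang, _ in PROJECTS})

def one_hot(size, lang):
    """Scaled size plus one binary indicator per language."""
    return [size / 100.0] + [1.0 if lang == l else 0.0 for l in LANGS]

def ordinal(size, lang):
    """Scaled size plus a single integer code for the language."""
    return [size / 100.0, float(LANGS.index(lang))]

def knn_predict(train, x, k=3):
    """Single technique: mean effort of the k nearest rows (Euclidean distance)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    nearest = sorted(train, key=lambda row: dist(row[0], x))[:k]
    return mean([y for _, y in nearest])

def bagged_knn_predict(train, x, n_models=15, k=3, seed=0):
    """Assumed homogeneous ensemble: average of KNN models on bootstrap samples."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]
        preds.append(knn_predict(sample, x, k))
    return mean(preds)

def loo_mae(encode, predict):
    """Leave-one-out mean absolute error for one encoder/predictor pair."""
    rows = [(encode(s, l), e) for s, l, e in PROJECTS]
    errors = []
    for i, (x, y) in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        errors.append(abs(predict(train, x) - y))
    return mean(errors)

# One error figure per (encoder, technique) combination.
RESULTS = {
    (enc.__name__, pred.__name__): loo_mae(enc, pred)
    for enc in (one_hot, ordinal)
    for pred in (knn_predict, bagged_knn_predict)
}
for key, mae in RESULTS.items():
    print(key, round(mae, 1))
```

Reading the four error figures side by side shows whether the ensemble improves on the single KNN model under each encoder. On data this small the numbers carry no evidential weight; the sketch only shows the shape of the comparison.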
Exploring alternative encoding techniques for processing
categorical data in SDEE datasets is an important
research direction. Investigating heterogeneous
ensembles that combine different ML techniques
trained on differently encoded versions of the datasets
is also needed to determine whether encoders
can serve as a source of diversity in ensemble
approaches.
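One way to picture this future direction is the following minimal sketch. It assumes two illustrative learners (KNN and a one-split regression tree), two encoders (one-hot and ordinal), simple averaging as the combination rule, and the spread of member predictions as a rough diversity signal. None of these choices come from the study, and the toy projects and function names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical toy projects: (size in function points, language, effort in person-hours).
PROJECTS = [
    (100, "java", 520), (120, "java", 610), (300, "cobol", 2100),
    (280, "cobol", 1950), (150, "python", 600), (170, "python", 690),
    (90, "java", 480), (320, "cobol", 2300), (160, "python", 640),
    (110, "java", 560),
]
LANGS = sorted({lang for _, lang, _ in PROJECTS})

def one_hot(size, lang):
    return [size / 100.0] + [1.0 if lang == l else 0.0 for l in LANGS]

def ordinal(size, lang):
    return [size / 100.0, float(LANGS.index(lang))]

def fit_knn(rows, k=3):
    """Return a predictor that averages the k nearest training efforts."""
    def predict(x):
        nearest = sorted(
            rows, key=lambda r: sum((p - q) ** 2 for p, q in zip(r[0], x))
        )[:k]
        return mean([y for _, y in nearest])
    return predict

def fit_stump(rows):
    """One-split regression tree: choose the split with the lowest squared error.

    Assumes at least one feature takes more than one value in `rows`.
    """
    best = None
    for j in range(len(rows[0][0])):
        # Skipping the smallest value keeps both sides of every split non-empty.
        for t in sorted({x[j] for x, _ in rows})[1:]:
            left = [y for x, y in rows if x[j] < t]
            right = [y for x, y in rows if x[j] >= t]
            lm, rm = mean(left), mean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    _, j, t, lm, rm = best
    return lambda x: lm if x[j] < t else rm

# Each member pairs a learning technique with an encoder.
MEMBERS = [(fit_knn, one_hot), (fit_knn, ordinal),
           (fit_stump, one_hot), (fit_stump, ordinal)]

def ensemble_predict(train, size, lang):
    """Average the member predictions and report their spread."""
    preds = []
    for fit, encode in MEMBERS:
        model = fit([(encode(s, l), e) for s, l, e in train])
        preds.append(model(encode(size, lang)))
    return mean(preds), pstdev(preds), preds

train, held_out = PROJECTS[:-1], PROJECTS[-1]
estimate, spread, member_preds = ensemble_predict(train, held_out[0], held_out[1])
print("members:", member_preds)
print("ensemble estimate:", estimate, "spread:", spread)
```

If members that share a learner but differ in encoder produce different predictions, the encoder is contributing diversity in this setup. Whether that diversity improves accuracy is the open question the paragraph above raises.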
Encoding Techniques for Handling Categorical Data in Machine Learning-Based Software Development Effort Estimation