in ensembles. In Proceedings of the 23rd International
Conference on Enterprise Information Systems, pages
652–659. INSTICC, SciTePress.
Hagan, M. T., Demuth, H. B., and Beale, M. (1997). Neural
network design. PWS Publishing Co.
Hart, P. E., Stork, D. G., and Duda, R. O. (2000). Pattern classification. Wiley, Hoboken.
Hawkins, D. M. (2004). The problem of overfitting. Journal of Chemical Information and Computer Sciences, 44(1):1–12.
Haykin, S. (2004). Neural networks: A comprehensive foundation.
Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J., and Schölkopf, B. (1998). Support vector machines. IEEE Intelligent Systems and their Applications, 13(4):18–28.
Hoo, Z. H., Candlish, J., and Teare, D. (2017). What is an ROC curve? Emergency Medicine Journal, 34(6):357–359.
Hsu, H. and Lachenbruch, P. A. (2014). Paired t test. Wiley StatsRef: Statistics Reference Online.
Ishibuchi, H., Nakashima, T., and Nii, M. (2004). Classification and modeling with linguistic information granules: Advanced approaches to linguistic data mining. Springer Science & Business Media.
Japkowicz, N. (2000). The class imbalance problem: Significance and strategies. In Proc. of the Int'l Conf. on Artificial Intelligence, volume 56. Citeseer.
Japkowicz, N. and Stephen, S. (2002). The class imbalance problem: A systematic study. Intelligent Data Analysis, 6(5):429–449.
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'95, pages 1137–1143, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Kourou, K., Exarchos, T. P., Exarchos, K. P., Karamouzis, M. V., and Fotiadis, D. I. (2015). Machine learning applications in cancer prognosis and prediction. Computational and Structural Biotechnology Journal, 13:8–17.
Kuncheva, L. I. and Whitaker, C. J. (2003). Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning, 51(2):181–207.
Lemaître, G., Nogueira, F., and Aridas, C. K. (2017). Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning. The Journal of Machine Learning Research, 18(1):559–563.
Liashchynskyi, P. and Liashchynskyi, P. (2019). Grid search, random search, genetic algorithm: A big comparison for NAS. arXiv preprint arXiv:1912.06059.
Merz, C. J. (1999). Using correspondence analysis to combine classifiers. Machine Learning, 36(1-2):33–58.
Mosavi, A., Ozturk, P., and Chau, K.-w. (2018). Flood prediction using machine learning models: Literature review. Water, 10(11):1536.
Muliono, R., Lubis, J. H., and Khairina, N. (2020). Analysis k-nearest neighbor algorithm for improving prediction student graduation time. Sinkron: Jurnal dan Penelitian Teknik Informatika, 4(2):42–46.
Myles, A. J., Feudale, R. N., Liu, Y., Woody, N. A., and Brown, S. D. (2004). An introduction to decision tree modeling. Journal of Chemometrics: A Journal of the Chemometrics Society, 18(6):275–285.
Opitz, D. and Maclin, R. (1999). Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11:169–198.
Oreski, G. and Oreski, S. (2014). An experimental comparison of classification algorithm performances for highly imbalanced datasets. In Central European Conference on Information and Intelligent Systems, page 4. Faculty of Organization and Informatics Varazdin.
Park, B. and Bae, J. K. (2015). Using machine learning algorithms for housing price prediction: The case of Fairfax County, Virginia housing data. Expert Systems with Applications, 42(6):2928–2934.
Polat, K., Yosunkaya, Ş., and Güneş, S. (2008). Comparison of different classifier algorithms on the automated detection of obstructive sleep apnea syndrome. Journal of Medical Systems, 32(3):243–250.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1):81–106.
Quinlan, J. R. (1996). Bagging, boosting, and C4.5. In AAAI/IAAI, Vol. 1, pages 725–730.
Saqlain, M., Jargalsaikhan, B., and Lee, J. Y. (2019). A voting ensemble classifier for wafer map defect patterns identification in semiconductor manufacturing. IEEE Transactions on Semiconductor Manufacturing, 32(2):171–182.
Schapire, R. E. (2013). Explaining AdaBoost. In Empirical Inference, pages 37–52. Springer.
Steinwart, I. and Christmann, A. (2008). Support vector
machines. Springer Science & Business Media.
Thabtah, F., Hammoud, S., Kamalov, F., and Gonsalves, A. (2020). Data imbalance in classification: Experimental evaluation. Information Sciences, 513:429–441.
Visa, S., Ramsay, B., Ralescu, A. L., and Van Der Knaap,
E. (2011). Confusion matrix-based feature selection.
MAICS, 710:120–127.
Yap, B. W., Abd Rani, K., Abd Rahman, H. A., Fong, S., Khairudin, Z., and Abdullah, N. N. (2014). An application of oversampling, undersampling, bagging and boosting in handling imbalanced datasets. In Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013), pages 13–22. Springer.
Zhang, H. (2005). Exploring conditions for the optimality of naive Bayes. International Journal of Pattern Recognition and Artificial Intelligence, 19(02):183–198.