predictive performance and increase robustness.
Such a system may deliver superior performance,
as previously noted in (Lessmann et al., 2013)
and (Lessmann et al., 2015).
REFERENCES
Arik, S., Pfister, T. (2021). TabNet: Attentive interpretable
tabular learning. In AAAI 2021, Proceedings of the
35th AAAI conference on artificial intelligence, pages
6679 – 6687.
Breiman, L. (2001). Random forests. Machine Learning
45, pages 5 – 32.
Chen, L. (2021). Statistical Learning for Analysis of
Credit Risk Data. IOSR Journal of Mathematics 17,
pages 45 – 51.
Chen, T., Guestrin, C. (2016). XGBoost: A scalable tree
boosting system. In KDD 2016, Proceedings of the
22nd ACM International Conference on Knowledge
Discovery & Data Mining, pages 785 – 794.
Chen, C., Liaw, A., Breiman, L. (2004). Using random
forest to learn imbalanced data. University of
California Berkeley, report number 666, pages 1 – 12.
Dumitrescu, E., Hué, S., Hurlin, C., Tokpavi, S. (2022).
Machine learning for credit scoring: Improving
logistic regression with non-linear decision-tree
effects. European Journal of Operational Research
297, pages 1178 – 1192.
Gidlow, L. (2022). The Effect of Dataset Size on the
Performance of Classification Algorithms for Credit
Scoring. University of Cape Town, available at
http://hdl.handle.net/11427/37193.
Google Brain (2016). TensorFlow: A system for large-
scale machine learning. In OSDI 2016, Proceedings of
the 12th USENIX conference on Operating Systems
Design and Implementation, pages 265 – 283.
Gunnarsson, B. R., Vanden Broucke, S., Baesens, B.,
Óskarsdóttir, M., Lemahieu, W. (2021). Deep learning
for credit scoring: Do or don’t? European Journal of
Operational Research 295, pages 292 – 305.
Guyon, I., Weston, J., Barnhill, S., Vapnik, V. (2002).
Gene selection for cancer classification using support
vector machines. Machine Learning 46, pages 389 – 422.
Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J.,
Schölkopf, B. (1998). Support vector machines. IEEE
Intelligent Systems and their applications 13, pages 18
– 28.
Tomek, I. (1976). An Experiment with the Edited Nearest-
Neighbor Rule. IEEE Transactions on Systems, Man,
and Cybernetics 6, pages 448 – 452.
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma,
W., Ye, Q., Liu, T. Y. (2017). LightGBM: A highly
efficient gradient boosting decision tree. Advances in
Neural Information Processing Systems 30, pages 1 – 9.
Lessmann, S., Baesens, B., Seow, H. V., Thomas, L. C.
(2015). Benchmarking state-of-the-art classification
algorithms for credit scoring: An update of
research. European Journal of Operational
Research 247, pages 124 – 136.
Lessmann, S., Seow, H., Baesens, B., Thomas, L. C.
(2013). Benchmarking state-of-the-art classification
algorithms for credit scoring: A ten-year
update. Credit Research Centre, Conference Archive.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J.,
Chanan, G., Killeen, T., Lin, Z., Gimelshein, N.,
Antiga, L., Desmaison, A., Kopf, A., Yang, E.,
DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S.,
Steiner, B., Fang, L., Chintala, S. (2019). PyTorch: An
Imperative Style, High-Performance Deep Learning
Library. Advances in Neural Information Processing
Systems 32, pages 8024 – 8035.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V.,
Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P.,
Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
Cournapeau, D., Brucher, M., Perrot, M.,
Duchesnay, É. (2011). Scikit-learn: Machine Learning
in Python. Journal of Machine Learning Research 12,
pages 2825 – 2830.
Tang, B., He, H. (2015). ENN: Extended nearest neighbor
method for pattern recognition. IEEE Computational
Intelligence Magazine 10, pages 52 – 60.