creased in all tests. The best results were identical
across the tests without resampling, with ROS, and with
SMOTE: accuracy 99.12%, recall 99.12%, precision 99.13%,
F1-score 99.13%, and AUC 0.988.
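As a rough illustration of how the five reported measures can be computed, the sketch below uses plain Python on a small set of invented labels and scores (not the data of this study). It assumes support-weighted averaging for precision, recall, and F1; the text does not state which averaging was used. Under weighted averaging, recall always equals accuracy, which would be consistent with the matching 99.12% figures. The function names `weighted_metrics` and `roc_auc` are hypothetical helpers.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / count
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = count / n  # each class weighted by its share of the samples
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores, positive=1):
    """AUC as the chance a positive outranks a negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented for illustration only.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.2, 0.4, 0.1, 0.9, 0.8, 0.35, 0.7]
y_pred = [int(s >= 0.5) for s in scores]  # threshold of 0.5 is an assumption

metrics = weighted_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for a test split of this size but would need a rank-based formulation for large data.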
4 CONCLUSIONS
This study tested breast cancer prediction. Resampling
techniques (ROS, SMOTE, RUS, and SMOTE-Tomek) were
applied to address class imbalance in the data.
Hyperparameter tuning of light gradient boosting with
grid search improved the results. The best results were
accuracy 99.12%, recall 99.12%, precision 99.13%,
F1-score 99.13%, and AUC 0.988. Further research could
test other breast cancer datasets or other methods.
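The two ideas summarized above, resampling and grid search, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: only ROS and RUS are shown (SMOTE and SMOTE-Tomek synthesize or clean samples and need more machinery), the data are invented, and `toy_score` is a hypothetical stand-in for the cross-validated score of a light gradient boosting model. The grid uses two parameter names common in LightGBM (`learning_rate`, `num_leaves`), but the values are arbitrary and are not the grid searched here.

```python
import itertools
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """ROS: duplicate random rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, t in zip(X, y) if t == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def grid_search(param_grid, score_fn):
    """Score every combination in the grid and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy imbalanced data (7 vs. 3), invented for illustration only.
X = [[i] for i in range(10)]
y = [0] * 7 + [1] * 3
X_ros, y_ros = random_oversample(X, y)
X_rus, y_rus = random_undersample(X, y)

# Hypothetical grid; values are arbitrary.
param_grid = {"learning_rate": [0.01, 0.1, 0.3], "num_leaves": [15, 31, 63]}


def toy_score(params):
    # Stand-in for a cross-validated model score; it peaks at
    # learning_rate=0.1 and num_leaves=31 by construction.
    return (-abs(params["learning_rate"] - 0.1)
            - abs(params["num_leaves"] - 31) / 100)


best_params, best_score = grid_search(param_grid, toy_score)
print(Counter(y_ros), Counter(y_rus), best_params)
```

In a real experiment the resampling would be applied to the training split only, so that the test split keeps its original class distribution, and `score_fn` would train and validate the model for each parameter combination.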
ICAISD 2023 - International Conference on Advanced Information Scientific Development