10-Year Breast Cancer Survival Prediction Research based on Missing Value Imputation

Yufang Deng

2022

Abstract

The use of machine learning for medical data mining is one of the most popular research areas in healthcare. The medical field produces large amounts of information-rich data that are continuously stored in databases, and mining valuable information from these data with machine learning can provide a scientific reference for decisions about patient health. This paper uses breast cancer data from SEER (Surveillance, Epidemiology, and End Results), a large-scale, open database provided by the National Cancer Institute. The proposed work first analyzes the breast cancer data set, then applies data mining methods and evaluates the results, with the aim of obtaining disease patterns that doctors can use effectively. To predict the survival of breast cancer patients, this paper proposes a hybrid missing value imputation method, KNNI + kmeans-GMM, and uses four classifiers (XGBoost, Random Forest, Decision Tree, and K-Nearest Neighbor) to establish 10-year survival models. The experimental results show that missing value imputation can improve the accuracy of the breast cancer survival model. KNNI + kmeans-GMM is an effective imputation method: combined with the XGBoost classifier, it yields the survival model with the best accuracy (0.854) and AUC (0.835). In addition, the 10-year breast cancer survival model established on this data with the XGBoost algorithm achieves an accuracy of 0.847 and an AUC of 0.818.
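The abstract names KNNI (k-nearest-neighbor imputation) as one stage of the hybrid method, but it does not describe the procedure or the kmeans-GMM stage. The following is therefore only a generic sketch of plain KNN imputation in Python, not the paper's implementation. It assumes numeric features on comparable scales with missing entries marked as `None`; the helper names `knn_impute` and `_distance` are hypothetical. Each missing value is filled with the mean of that feature over the k nearest rows that observe it, where distance is computed only over features present in both rows.

```python
import math
from typing import List, Optional

# Hypothetical illustration of generic KNN imputation; not the paper's KNNI + kmeans-GMM method.
Row = List[Optional[float]]  # None marks a missing value


def _distance(a: Row, b: Row) -> float:
    """Euclidean distance over features observed in both rows, rescaled to the full feature count."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlap: treat as maximally distant
    sq = sum((x - y) ** 2 for x, y in shared)
    # Compensate for skipped coordinates so rows with fewer shared features are comparable
    return math.sqrt(sq * len(a) / len(shared))


def knn_impute(data: List[Row], k: int = 3) -> List[Row]:
    """Fill each missing entry with the mean of that feature over the k nearest donor rows."""
    imputed = []
    for i, row in enumerate(data):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: every other row in which feature j is observed
            donors = [
                (_distance(row, other), other[j])
                for idx, other in enumerate(data)
                if idx != i and other[j] is not None
            ]
            if not donors:
                raise ValueError(f"feature {j} has no observed values")
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            new_row[j] = sum(nearest) / len(nearest)
        imputed.append(new_row)
    return imputed


if __name__ == "__main__":
    records = [
        [1.0, 2.0, 3.0],
        [1.1, None, 3.1],
        [0.9, 2.2, 2.9],
        [5.0, 8.0, 9.0],
    ]
    print(knn_impute(records, k=2))
```

With `k=2`, the missing value in the second record is filled from its two closest neighbors (the first and third records), giving roughly 2.1, while the distant fourth record is ignored. In a survival-prediction pipeline like the one the abstract describes, the imputed table would then be passed to a classifier; the choice of k and any feature scaling are assumptions that would need tuning on real data.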



Paper Citation


in Harvard Style

Deng Y. (2022). 10-Year Breast Cancer Survival Prediction Research based on Missing Value Imputation. In Proceedings of the 1st International Conference on Health Big Data and Intelligent Healthcare - Volume 1: ICHIH, ISBN 978-989-758-596-8, pages 332-342. DOI: 10.5220/0011369000003438


in Bibtex Style

@conference{ichih22,
author={Yufang Deng},
title={10-Year Breast Cancer Survival Prediction Research based on Missing Value Imputation},
booktitle={Proceedings of the 1st International Conference on Health Big Data and Intelligent Healthcare - Volume 1: ICHIH},
year={2022},
pages={332-342},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011369000003438},
isbn={978-989-758-596-8},
}


in EndNote Style

TY - CONF

JO - Proceedings of the 1st International Conference on Health Big Data and Intelligent Healthcare - Volume 1: ICHIH
TI - 10-Year Breast Cancer Survival Prediction Research based on Missing Value Imputation
SN - 978-989-758-596-8
AU - Deng Y.
PY - 2022
SP - 332
EP - 342
DO - 10.5220/0011369000003438