combination, whose AUC of 0.835 is higher than that of
the other combinations.
This study uses as much of the available data as
possible for survival modeling and does not consider
whether individual features are related to the label.
If a feature is only weakly related to the label,
imputing its missing values adds noise to the data.
Therefore, in future work we will explore the
importance of features in more depth.
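
As a rough illustration of this point, the sketch below scores each feature by its absolute Pearson correlation with the label, using only the rows where the feature is observed, and separates features worth imputing from those that are weakly related to the label. It is not the method used in this study: the correlation measure, the 0.1 threshold, and the feature names (tumor_size, noise_flag) are assumptions chosen for the example, and a real analysis might prefer mutual information or a model-based importance score.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when there are fewer than two points or either
    sequence is constant, so the caller never divides by zero.
    """
    n = len(xs)
    if n < 2:
        return 0.0
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def relevance_by_feature(columns, labels):
    """Absolute feature-label correlation on observed rows only.

    `columns` maps a feature name to its list of values, with None
    marking a missing entry. Missing entries are skipped, not imputed.
    """
    scores = {}
    for name, values in columns.items():
        pairs = [(v, y) for v, y in zip(values, labels) if v is not None]
        xs = [p[0] for p in pairs]
        ys = [p[1] for p in pairs]
        scores[name] = abs(pearson(xs, ys))
    return scores


def split_features(columns, labels, threshold=0.1):
    """Split features into (impute, skip) lists by relevance score.

    The threshold is an illustrative assumption, not a recommended value.
    """
    scores = relevance_by_feature(columns, labels)
    impute = [name for name, s in scores.items() if s >= threshold]
    skip = [name for name, s in scores.items() if s < threshold]
    return impute, skip


# Toy data with hypothetical feature names; 1 = survived, 0 = did not.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "tumor_size": [1.0, 2.0, None, 5.0, 6.0, 7.0],
    "noise_flag": [0, 1, None, 0, 1, None],
}

keep, skip = split_features(columns, labels)
print("impute:", keep)
print("skip:", skip)
```

In this toy example the feature that tracks the label is kept for imputation, while the one that carries no signal on its observed rows is set aside, so no imputed noise is added for it.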