Authors:
Sebastian Appelbaum 1; Daniel Krüerke 2, 3; Stephan Baumgartner 2; Marianne Schenker 3 and Thomas Ostermann 1
Affiliations:
1 Department of Psychology and Psychotherapy, Faculty of Health, Witten/Herdecke University, Witten, Germany
2 Society for Cancer Research, Hiscia Institute, Arlesheim, Switzerland
3 Clinic Arlesheim, Research Department, Arlesheim, Switzerland
Keyword(s):
Clinical Registry, Cancer Staging, Missing Values, Prediction Models, Integrative Oncology.
Abstract:
Despite intensive research into prevention, treatment and follow-up, cancer is still fatal in many cases. An important parameter in this context is the stage of the cancer. The TNM/UICC classification, which dates back to the surgeon Pierre Denoix, is an important method for describing a cancer and a key prognostic factor for patient survival. Unfortunately, despite its importance, the TNM/UICC classification is often poorly documented in cancer registries. The aim of this work is to investigate whether UICC stages can be predicted from cancer registry data using statistical learning methods. Data from the Cancer Registry Clinic Arlesheim (CRCA) were used for this analysis. The registry contains a total of 5,305 records, of which 1,539 cases were eligible for data analysis. Classification and regression trees, random forests, gradient tree boosting and logistic regression were used as prediction methods. Mean misclassification error (mmce), area under the receiver operating characteristic curve (AUC) and Cohen’s kappa were applied as performance measures. Misclassification rates ranged from 28.0% to 30.4%, AUCs from 0.73 to 0.80, and Cohen’s kappa from 0.39 to 0.44, which indicates only moderate predictive performance. However, with only 1,539 records, the data set considered here is considerably smaller than those of larger cancer registries, so the results should be interpreted with caution.
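As an illustration of two of the performance measures named in the abstract, the following minimal sketch computes the mean misclassification error and Cohen's kappa for a set of predicted stages. The function names and the toy stage labels are hypothetical and are not taken from the CRCA data or from the authors' analysis code; AUC is omitted because it requires predicted class probabilities rather than labels.

```python
from collections import Counter


def mmce(y_true, y_pred):
    """Mean misclassification error: share of predictions that differ from the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(y_true)
    p_observed = 1.0 - mmce(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal frequencies, summed over classes.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy example with four UICC stages (not real registry data).
toy_true = ["I", "I", "II", "II", "III", "III", "IV", "IV"]
toy_pred = ["I", "II", "II", "II", "III", "IV", "IV", "IV"]

toy_mmce = mmce(toy_true, toy_pred)
toy_kappa = cohen_kappa(toy_true, toy_pred)
print(f"mmce = {toy_mmce:.3f}, kappa = {toy_kappa:.3f}")
```

In this toy example two of eight predictions are wrong, so the misclassification error is 0.25. Kappa is lower than the raw agreement of 0.75 because part of that agreement would be expected by chance alone.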