Authors:
Iulii Vasilev, Mikhail Petrovskiy and Igor Mashechkin
Affiliation:
Computer Science Department of Lomonosov Moscow State University, MSU, Vorobjovy Gory, Moscow, Russia
Keyword(s):
Machine Learning, Survival Analysis, Cox Proportional Hazards, Survival Decision Trees, Weighted Log-rank Split Criteria, Bagging and Boosting Ensembles.
Abstract:
Survival analysis is an important tool for predicting time-to-event in many applications, including but not limited to medicine, insurance and manufacturing. The state-of-the-art statistical approach is based on the Cox proportional hazards model. However, from a practical point of view, it has several important disadvantages, such as the strong assumptions that hazards are proportional over time and that the log hazard depends linearly on time-independent covariates. Another technical issue is its inability to handle missing data directly. To overcome these disadvantages, machine learning survival models based on recursive partitioning have recently been developed. In this paper, we propose a new survival decision tree model that uses weighted log-rank split criteria. Unlike the traditional log-rank criterion, the weighted criteria make it possible to give different priority to events at different times. The model handles missing data directly while searching for the best split point, its size is controlled by a p-value threshold with Bonferroni adjustment, and quantile-based discretization is used to reduce the number of candidate split points. We also investigate how to improve the accuracy of the model with a bagging ensemble of the proposed decision trees. We present an experimental comparison of the proposed methods against Cox proportional hazards regression and existing tree-based survival models and their ensembles. According to the experimental results, the proposed methods show better performance on several public benchmark medical datasets in terms of the concordance index and the integrated Brier score.
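The abstract names three ingredients of the split search: a weighted log-rank criterion, quantile-based candidate thresholds, and a Bonferroni-adjusted p-value threshold that limits tree growth. The Python sketch below shows one way these could fit together for a single feature. It is not the authors' implementation: the function names `weighted_logrank` and `best_split`, the parameters `n_quantiles`, `alpha` and `min_leaf`, and the strategy of trying missing values in both children are illustrative assumptions. The weights shown (constant for log-rank, number at risk for Gehan-Wilcoxon, its square root for Tarone-Ware) are standard weighted log-rank choices; whether the paper uses exactly these is also an assumption.

```python
import math

# Hypothetical sketch, not the paper's code. Each weighting maps the
# number of subjects at risk n to a weight w applied at that event time.
WEIGHTS = {
    "logrank": lambda n: 1.0,               # classic log-rank: all event times equal
    "wilcoxon": lambda n: float(n),         # Gehan-Wilcoxon: emphasises early events
    "tarone-ware": lambda n: math.sqrt(n),  # Tarone-Ware: intermediate emphasis
}


def weighted_logrank(times, events, in_left, weighting="logrank"):
    """Weighted log-rank chi-square statistic (1 degree of freedom) for two groups."""
    weight = WEIGHTS[weighting]
    event_times = sorted({t for t, e in zip(times, events) if e})
    numerator, variance = 0.0, 0.0
    for t in event_times:
        # (group flag, had event exactly at t) for every subject still at risk
        at_risk = [(g, ti == t and e) for ti, e, g in zip(times, events, in_left) if ti >= t]
        n = len(at_risk)                                   # total at risk
        n1 = sum(1 for g, _ in at_risk if g)               # at risk in the left group
        d = sum(1 for _, died in at_risk if died)          # events at time t
        d1 = sum(1 for g, died in at_risk if g and died)   # left-group events at t
        w = weight(n)
        numerator += w * (d1 - d * n1 / n)                 # weighted observed - expected
        if n > 1:
            variance += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return numerator ** 2 / variance if variance > 0 else 0.0


def best_split(x, times, events, n_quantiles=10, alpha=0.05,
               weighting="logrank", min_leaf=1):
    """Search quantile-based thresholds on one feature; None marks a missing value.

    Returns (threshold, missing_go_left, bonferroni_p), or None when no
    split is significant, in which case the node would become a leaf.
    """
    observed = sorted(v for v in x if v is not None)
    if len(observed) < 2:
        return None
    # Quantile-based discretization: only these values are tried as thresholds.
    thresholds = sorted({observed[k * len(observed) // n_quantiles]
                         for k in range(1, n_quantiles)})
    # Assumed strategy: try sending missing values to either child, if any exist.
    missing_options = (True, False) if any(v is None for v in x) else (True,)
    candidates = []
    for thr in thresholds:
        for missing_left in missing_options:
            in_left = [missing_left if v is None else v <= thr for v in x]
            n_left = sum(in_left)
            if n_left < min_leaf or len(x) - n_left < min_leaf:
                continue  # skip splits that leave a child too small
            stat = weighted_logrank(times, events, in_left, weighting)
            candidates.append((stat, thr, missing_left))
    if not candidates:
        return None
    stat, thr, missing_left = max(candidates)
    p_value = math.erfc(math.sqrt(stat / 2))            # chi-square (1 df) tail probability
    p_bonferroni = min(1.0, p_value * len(candidates))  # adjust for every split tested
    return (thr, missing_left, p_bonferroni) if p_bonferroni < alpha else None


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, None]
    times = [1, 2, 3, 4, 5, 20, 21, 22, 23, 24]
    events = [1] * 10
    print(best_split(x, times, events, weighting="tarone-ware"))
```

Because the Bonferroni factor grows with the number of candidate thresholds, quantile discretization also makes it easier for a genuine split to pass the `alpha` threshold. Weights that are large at early event times (as with the Wilcoxon option) make the criterion favour splits that separate early failures, which is one way to realise the time-dependent priority described above.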