Authors:
João Diogo Pinto (1); Alexandra M. Carvalho (2) and Susana Vinga (3)
Affiliations:
(1) Instituto de Telecomunicações, Portugal; (2) Instituto de Telecomunicações, Instituto Superior Técnico and Universidade de Lisboa, Portugal; (3) Instituto Superior Técnico and Universidade de Lisboa, Portugal
Keyword(s):
Survival Analysis, Outlier Detection, Robust Regression, Cox Proportional Hazards, Concordance C-index
Related Ontology Subjects/Areas/Topics:
Bioinformatics; Biomedical Engineering; Biostatistics and Stochastic Models; Data Mining and Machine Learning; Pattern Recognition, Clustering and Classification; Systems Biology
Abstract:
Outlier detection is an important task in many data-mining applications. In this paper, we present two parametric outlier detection methods for survival data. Both methods perform outlier detection in a multivariate setting, using Cox regression as the model and the concordance c-index as the measure of goodness of fit. The first method is a single-step procedure that presents a delete-1 statistic based on bootstrap hypothesis testing for the increase in the concordance c-index. The second method is a sequential procedure that maximizes the c-index of the model using a greedy one-step-ahead search. Finally, we use both methods to perform robust estimation for Cox regression, removing a fraction of the observations from the regression according to their measure of outlyingness. Preliminary results on three different data sets show improved estimation of the Cox regression coefficients and improved predictive ability of the model.
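
To make the second method concrete, the following is a minimal sketch of a greedy one-step-ahead search that removes, at each step, the observation whose deletion most increases the c-index. It is an illustration under stated assumptions, not the authors' implementation: the function names `concordance_index` and `greedy_outlier_search`, the `score` callback, and the toy data are all hypothetical, pairs with tied times are simply skipped, and the risk scores are held fixed for brevity, whereas the method described above would refit the Cox model on the remaining observations at every step.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell-style c-index (simplified: pairs with tied times are skipped).

    A pair is comparable when the subject with the shorter observed time
    had an event. It is concordant when that subject also has the higher
    predicted risk; ties in risk count as one half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore tied times
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject censored: pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def greedy_outlier_search(n: int,
                          score: Callable[[List[int]], float],
                          n_remove: int) -> List[int]:
    """Greedy one-step-ahead search.

    `score(kept)` is assumed to return the c-index of a model evaluated
    on the observations indexed by `kept`. At each step the observation
    whose removal gives the highest score is deleted.
    """
    kept = list(range(n))
    removed: List[int] = []
    for _ in range(n_remove):
        best = max(kept, key=lambda i: score([k for k in kept if k != i]))
        kept.remove(best)
        removed.append(best)
    return removed


# Toy data (hypothetical): the last subject survives longest yet has the
# highest predicted risk, so it disagrees with every other subject.
times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
events = [1, 1, 1, 1, 1, 1]
risks = [6.0, 5.0, 4.0, 3.0, 2.0, 10.0]


def fixed_risk_score(kept: List[int]) -> float:
    # Assumption: risk scores are fixed; a faithful version would refit
    # the Cox regression on `kept` before computing the c-index.
    return concordance_index([times[k] for k in kept],
                             [events[k] for k in kept],
                             [risks[k] for k in kept])


removed = greedy_outlier_search(len(times), fixed_risk_score, n_remove=1)
```

Passing the scoring step as a callback keeps the search independent of how the model is fitted, so a function that refits a Cox regression with a survival library could be substituted for `fixed_risk_score` without changing the search itself.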