Outlier Detection in Survival Analysis based on the Concordance C-index

João Diogo Pinto, Alexandra M. Carvalho (1,2) and Susana Vinga

(1) PIA, Instituto de Telecomunicações, Lisboa, Portugal

(2) DEEC, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal

(3) LAETA, IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal

Keywords:

Survival Analysis, Outlier Detection, Robust Regression, Cox Proportional Hazards, Concordance c-index.

Abstract:

Outlier detection is an important task in many data-mining applications. In this paper, we present two parametric outlier detection methods for survival data. Both methods perform outlier detection in a multivariate setting, using the Cox regression as the model and the concordance c-index as a measure of goodness of fit. The first method is a single-step procedure that presents a delete-1 statistic based on bootstrap hypothesis testing for the increase in the concordance c-index. The second method is a sequential procedure that maximizes the c-index of the model using a greedy one-step-ahead search. Finally, we use both methods to perform robust estimation for the Cox regression, removing from the regression a fraction of the data according to their measure of outlyingness. Our preliminary results on three different datasets show improvements in the estimation of the Cox regression coefficients and in the predictive ability of the model.

1 INTRODUCTION

Survival analysis is the ﬁeld that studies time-to-

event data and has become a relevant topic in clinical

and medical research. Usually there are three main

goals when performing survival analysis (David G.

Kleinbaum, Mitchel Klein, 2005): 1) to estimate sur-

vival/hazard functions from the data; 2) to compare

survival/hazard functions between groups of patients;

and 3) to assess the impact of explanatory variables on patients' survival time. Goals 1) and 2) are addressed with non-parametric methods such as the Kaplan-Meier and Nelson-Aalen estimators, which are used to estimate survival curves. Log-rank tests are commonly used to

compare survival curves. All these methods have

good robustness to the presence of outlying observa-

tions. When modeling the data in relation to explana-

tory variables, the most popular method is the Cox

proportional hazards model (Cox, 1972). The robustness of the Cox regression has been shown to be rather weak, with outlying observations severely affecting the Cox regression coefficients. Concerning robustness, one important concept is the breakdown point (Donoho and Huber, 1983; Hampel, 1971), which represents the fraction of corrupt observations needed to arbitrarily offset the estimated values. It has been pointed out that the Cox partial likelihood estimator has a breakdown point of $1/n$ (Kalbfleisch and Prentice, 2011). This means that, when fitting a Cox regression to $n$ data points, a single outlying observation is enough to cause the estimator to take values arbitrarily far from the true value (Rousseeuw and Leroy, 1987).

Goal 3) will be the focus of this study. In particular, our goal is to improve the Cox regression estimation by identifying and removing outlying observations. This way the regression becomes more robust, thus providing more accurate relationships between explanatory variables and survival times, along with improving the global predictive ability of the model.

2 OUTLIERS IN SURVIVAL DATA

To fix notation, a dataset will be denoted by $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$, with each $X_i$ being a p-dimensional vector of covariates and $Y_i$ the corresponding dependent variable value. In survival data, censoring is very common, i.e., the event of interest does not always occur for a given individual during the period of the study. To model censoring, it is common to add a binary variable which indicates whether the event occurred or not.

There are many definitions of an outlier in the literature, both mathematical and more informal, as surveyed more thoroughly in (Ben-Gal, 2005). For example, (Hawkins, 1980) defines an outlier as an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism, while (Johnson et al., 1992) defines an outlier as an observation in a dataset which appears to be inconsistent with the remainder of that set of data. These definitions provide two different ways of detecting outliers: the first considers only the values of $X_i$ and $Y_i$; the second assesses the relation between them by introducing the notion of the model's quality of fit. Of course, the second notion of outlyingness needs a model to define this relationship between $Y_i$ and $X_i$. The first perspective corresponds to a non-parametric approach to outlier detection; the second corresponds to a parametric, or model-based, perspective and is the focus of our proposal.

Pinto J., Carvalho A. and Vinga S. Outlier Detection in Survival Analysis based on the Concordance C-index. DOI: 10.5220/0005225300750082. In Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms (BIOINFORMATICS-2015), pages 75-82. ISBN: 978-989-758-070-3. © 2015 SCITEPRESS (Science and Technology Publications, Lda.)

2.1 Swamping and Masking

Data sets with multiple outliers or clusters of outliers

are subject to masking and swamping effects. Here we

enunciate the following deﬁnitions (Acuna and Ro-

driguez, 2004):

Masking Effect. One outlier masks another outlier

if the second outlier can be considered an outlier

only by itself but not in the presence of the ﬁrst

outlier.

Swamping Effect. One outlier observation swamps

a second observation if the latter can be consid-

ered as an outlier in presence of the ﬁrst but not

by itself.

As seen in (Fischler and Bolles, 1981), these ef-

fects are particularly harmful when developing se-

quential procedures for outlier detection, mainly be-

cause the subset of observations already deleted inﬂu-

ences which observations will be deleted in the sub-

sequent iterations.

2.2 Model-speciﬁc Outlier Detection:

Cox Proportional Hazards

In this paper the model chosen to represent the data

was the Cox proportional hazards due to its simplicity,

good results and great power of interpretability.

Several works have been developed to increase the robustness of the estimation of the Cox regression by performing outlier detection, for example through residual analysis, estimating the variation in the regression parameters upon the removal of a given observation (Therneau et al., 1990). The outliers can then be detected by selecting the observations that cause the largest variation in the parameters upon their removal. This approach is susceptible to masking and swamping and also requires tuning the threshold that separates outliers from non-outliers.

In (Farcomeni and Viviani, 2011) outlying obser-

vations are deﬁned as the individuals that have the

smallest contributions to the Cox partial likelihood.

In order to ﬁnd these observations they ﬁrst make a

robust ﬁtting of the Cox regression and then in the

absence of masking, they employ residual analysis as

in (Nardi and Schemper, 1999) to perform outlier de-

tection. The robust ﬁtting is done using an algorithm

that maximizes the maximum partial likelihood. This

maximization is made over all possible subsets of the

trimmed set of observations.

2.3 Concordance C-index

To assess the predictive ability of a survival model, we will use Harrell's concordance c-index (Harrell et al., 1982). It measures the ability of the model to assign a higher relative risk to an individual whose event occurs first. The relative risk is estimated from the output of the model for each individual; in a Cox model, for instance, the relative risk corresponds to the hazard ratio. The c-index is calculated using the following procedure:

1. Form all possible pairs of individuals.

2. Omit the pairs whose shorter survival time is censored and all pairs where both observations are censored. The remaining pairs are the permissible pairs, with $N_{\mathrm{permissible}}$ denoting their number.

3. To calculate Concordance, for each permissible pair with $T_i \neq T_j$: count 1 if the shorter survival time has the higher predicted risk, count 0.5 otherwise. For $T_i = T_j$ with both not censored: count 1 if the predicted risks are the same, 0.5 otherwise; if at least one is censored and it corresponds to a lower risk, count 1 (0.5 otherwise). Concordance is defined as the sum of the counts over all permissible pairs.

4. The c-index is given by

$$\text{c-index} = \mathrm{Concordance} / N_{\mathrm{permissible}}$$

The c-index is a rank measure, thus it only measures how well predicted values are concordant with rank-ordered response variables. For example, the c-index for two patients with predicted hazard ratios of 0.4 and 0.6 is the same as if the patients had hazard ratios of 0.1 and 0.9 (Harrell, 2001): it only measures whether the outcome is concordant with the response variables or not. Thus, unlike measures such as the sum of squared errors, one observation by itself has a limited contribution to the overall concordance. This robustness may allow for the maximization of the c-index without worrying that it is being maximized at the cost of the majority of the data, only to better fit one or a cluster of outlying observations, as can happen with the sum of squared errors (Fischler and Bolles, 1981).
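The pairwise counting behind the c-index can be sketched in a few lines of Python. This is a minimal illustration rather than a transcription of the procedure in this section: the function name `c_index` is hypothetical, it assumes the widely used scoring convention (1 for a concordant pair, 0.5 when the predicted risks are tied, 0 for a discordant pair), and it simply skips pairs with tied survival times.

```python
def c_index(times, events, risks):
    """Concordance index for right-censored data (illustrative sketch).

    times  -- observed survival or censoring times
    events -- 1 if the event was observed, 0 if censored
    risks  -- predicted relative risks (higher means earlier expected event)
    """
    concordance = 0.0
    n_permissible = 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[i] == times[j]:
                continue  # assumption: pairs with tied times are skipped
            # Identify which member of the pair has the shorter time.
            first, second = (i, j) if times[i] < times[j] else (j, i)
            if not events[first]:
                continue  # shorter time is censored: pair is not permissible
            n_permissible += 1
            if risks[first] > risks[second]:
                concordance += 1.0   # concordant pair
            elif risks[first] == risks[second]:
                concordance += 0.5   # tied predicted risks
    if n_permissible == 0:
        raise ValueError("no permissible pairs")
    return concordance / n_permissible
```

Because only the ordering of the risks enters the count, `c_index([2, 5], [1, 1], [0.6, 0.4])` and `c_index([2, 5], [1, 1], [0.9, 0.1])` return the same value, which mirrors the hazard-ratio example given in the text.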

3 METHODS FOR OUTLIER

DETECTION

We propose two novel methods for outlier detection

in survival data based on the concordance index, de-

scribed in sections 3.1 and 3.2. Section 3.3 describes

alternative proposals that will be further used for com-

parison purposes.

The proposed methods make use of an operational definition of outlier: an observation that, when absent from the data, will likely decrease the prediction error of the fitted model. In a survival setting, this prediction error is measured using the concordance c-index, which has the particularity of using the predictive model as a black box.

3.1 Bootstrap Hypothesis Testing (BHT)

Ideally we would know the underlying distribution of the observations $X_i$ and perform a hypothesis test about the difference in concordance between the distributions with and without a given observation. Thus the idea is to perform $n$ hypothesis tests about the concordance variation, one for each observation $i$, and to sort the resulting p-values.

The hypothesis tests follow the bootstrap approach (Efron, 1979). Each observation $(X_i, Y_i)$ is considered a discrete random variable having a distribution equal to the empirical distribution given by the original dataset. We consider $n$ different empirical distributions, each resulting from removing observation $i$ from the original data and adjusting the densities so that they sum to one. Denoting by $C$ the concordance c-index and by $C_{\mathrm{original}}$ the concordance in the original data, the distributions $\mathrm{Data}_i$ represent the adjusted empirical distributions having $P(X = X_i, Y = Y_i) = 0$. The hypothesis test for each observation is formulated as follows:

$$H_0: C_{\mathrm{Model},(X,Y)\sim \mathrm{Data}_i} \le C_{\mathrm{original}}$$

$$H_1: C_{\mathrm{Model},(X,Y)\sim \mathrm{Data}_i} > C_{\mathrm{original}}$$

Writing $C_i = C_{\mathrm{Model},(X,Y)\sim \mathrm{Data}_i}$ and $\delta C_i = C_i - C_{\mathrm{original}}$, it is more useful to reformulate the hypothesis tests as:

$$H_0: \delta C_i \le 0$$

$$H_1: \delta C_i > 0$$

Rejecting the null hypothesis at a significance level $\alpha$ corresponds to estimating a confidence interval for the values of $\delta C_i$ for each distribution $\mathrm{Data}_i$: if this interval does not contain values less than or equal to zero, we can reject the null hypothesis at the significance level $\alpha$. Alternatively, we can calculate the p-value of the test.

These confidence intervals are computed using Monte Carlo bootstrap as explained in (Harrell, 2001). For each observation $i$ the procedure is the following: 1) produce $B$ bootstrap samples by sampling with replacement $n - 1$ observations from the empirical distribution $\mathrm{Data}_i$; 2) compute the concordance $C_i$ for each bootstrap sample; 3) the p-value corresponds to the proportion of bootstrap samples having $C_i - C_{\mathrm{original}} \le 0$.

The number of bootstrap samples $B$ required has been shown to depend on the number of individuals and the number of covariates. In our tests the value of $B$ was iteratively increased until the p-values converged.
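The three steps above can be sketched as follows. This is only an illustration under stated assumptions: `bht_p_value` is a hypothetical name, and `concordance` is assumed to be a caller-supplied function that refits the model (for instance a Cox regression) on the sample it receives and returns the resulting c-index, so the model is treated as a black box.

```python
import random


def bht_p_value(data, i, concordance, n_boot=200, seed=0):
    """Bootstrap test for the effect of removing observation i.

    data        -- list of observations (any record type)
    i           -- index of the observation under test
    concordance -- hypothetical callable: sample -> c-index of the model
                   fitted on that sample
    n_boot      -- number of bootstrap samples (B in the text)
    Returns (p_value, expected_delta).
    """
    rng = random.Random(seed)
    c_original = concordance(data)
    # Empirical distribution Data_i: the original data without observation i.
    reduced = data[:i] + data[i + 1:]
    m = len(reduced)  # n - 1 observations per bootstrap sample
    deltas = []
    for _ in range(n_boot):
        sample = [reduced[rng.randrange(m)] for _ in range(m)]
        deltas.append(concordance(sample) - c_original)
    # Proportion of samples in which concordance did not increase.
    p_value = sum(1 for d in deltas if d <= 0) / n_boot
    expected_delta = sum(deltas) / n_boot
    return p_value, expected_delta
```

Calling the function once per observation and sorting by the returned p-value gives the ranking described in this section; the default `n_boot=200` is an arbitrary placeholder and would need to be raised until the p-values stabilise.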

Following the same reasoning provided in (Singh and Xie, 2003), given an outlying observation $\xi$, the probability that a bootstrap sample does not contain $\xi$ is approximately $(1 - 1/n)^n \approx e^{-1}$ ($\approx 37\%$) as $n \to \infty$. Thus, each observation will be absent from approximately 37% of the samples. A low p-value for the hypothesis test mentioned above means that removing the given observation $i$ improves the concordance c-index in a systematic way, not depending on the cooperation of any other observation. On the other hand, if one outlier is masked by another, the masking outlier will not be present in approximately 37% of the bootstrap samples, and thus we can expect a multimodal behavior for the expected $\delta C$. Thus an outlier subject to masking may not systematically improve concordance (it presents a high p-value for the hypothesis test), but if it presents multimodality and one of the modes is relatively high, it is a candidate for an outlier.
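The 37% figure can be checked numerically. The short sketch below (the helper name `absent_probability` is hypothetical) evaluates the probability that a fixed observation is missing from a bootstrap sample of size n drawn with replacement from n observations, and compares it with the limiting value.

```python
import math


def absent_probability(n):
    """Probability that a fixed observation is absent from a bootstrap
    sample of size n drawn with replacement from n observations."""
    return (1.0 - 1.0 / n) ** n


for n in (10, 100, 1000):
    print(n, round(absent_probability(n), 4))
print("limit e^-1:", round(math.exp(-1), 4))
```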

To sum up, Bootstrap Hypothesis Testing (BHT) on $\delta C$ works as follows: for each observation, a bootstrap hypothesis test is performed. The resulting statistics for each observation are a p-value and the expected value of $\delta C$. The p-value quantifies the evidence against the hypothesis that removing the observation causes no increase in the c-index. Experimentally, we verified that these two values are correlated: when the p-value is low, the expected $\delta C$ is usually very high, whereas the opposite relation has been shown to be weaker. Thus, in order to obtain a 1-dimensional metric of outlyingness, we consider the observations with the lowest p-values to be the most outlying ones.

3.2 One-Step Deletion (OSD)

This method is a sequential procedure for outlier removal. We start with all the data and, at each iteration of the algorithm, remove the observation whose exclusion causes the largest increase in concordance. This is equivalent to a one-step-ahead greedy search that maximizes the c-index of the model on the data. The resulting subset of removed observations is considered to contain the most outlying ones.
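A minimal sketch of this greedy search is given below. The names are hypothetical, and `concordance` is again assumed to be a caller-supplied function that refits the model on the subset it receives and returns its c-index.

```python
def one_step_deletion(data, concordance, n_remove):
    """Greedy one-step-ahead deletion (illustrative sketch).

    data        -- list of observations
    concordance -- hypothetical callable: subset -> c-index of the model
                   fitted on that subset
    n_remove    -- number of observations to flag as outliers
    Returns the indices of the removed observations, in removal order.
    """
    remaining = list(range(len(data)))
    removed = []
    for _ in range(n_remove):
        best_idx, best_c = None, float("-inf")
        for idx in remaining:
            # Concordance of the model refitted without candidate idx.
            subset = [data[j] for j in remaining if j != idx]
            c = concordance(subset)
            if c > best_c:
                best_idx, best_c = idx, c
        remaining.remove(best_idx)
        removed.append(best_idx)
    return removed
```

Each iteration refits the model once per remaining observation, so flagging k outliers among n observations costs on the order of k·n model fits. Ties are broken in favour of the lowest index, which is a choice made for this sketch only.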

3.3 Alternative Methods

Here we present alternative methods for outlier de-

tection in survival data that will be used to assess the

performance of the proposed methods.

3.3.1 Martingale Residuals (MART)

These residuals come from the counting process framework for censored survival data. First, a martingale process is defined as the difference between the observed and expected number of events (David W. Hosmer, Stanley Lemeshow, Susanne May, 2008). Let $N(t)$ be the number of events until $t$ and $H(t)$ the cumulative hazard function; for each individual $i$ we have the martingale residual process:

$$M_i(t) = N_i(t) - H_i(t). \qquad (1)$$

The martingale residual is defined as the value of the process $M_i(t)$ at the time of failure/censoring. Since $N(t)$ takes the value 1 if the event is observed and zero when censored (David Collett, 2003), the residuals are given by:

$$r_{M_i} = \delta_i - H_i(t), \qquad (2)$$

where $\delta_i$ is the censoring indicator for individual $i$. For the Cox model the residuals are given by:

$$r_{M_i} = \delta_i - \exp\{\beta X\} H_0(t). \qquad (3)$$

3.3.2 Deviance Residuals (DEV)

The deviance residuals are an attempt (David Collett, 2003) to adjust the martingale residuals to be more centered around zero, and are given by:

$$r_{D_i} = \mathrm{sgn}(r_{M_i}) \left[ -2 \{ r_{M_i} + \delta_i \log(\delta_i - r_{M_i}) \} \right]^{1/2}. \qquad (4)$$
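As a small numerical illustration of equations (2) and (4), the sketch below computes both residuals from the censoring indicator and the estimated cumulative hazard at the observed time. The function names are hypothetical, and the code assumes a strictly positive cumulative hazard whenever the event is observed.

```python
import math


def martingale_residual(delta, cum_hazard):
    """Equation (2): indicator (1 = event observed, 0 = censored)
    minus the cumulative hazard at the observed time."""
    return delta - cum_hazard


def deviance_residual(delta, cum_hazard):
    """Equation (4): transformation of the martingale residual that is
    more centered around zero."""
    r = martingale_residual(delta, cum_hazard)
    # The log term only contributes when the event was observed (delta = 1);
    # in that case delta - r equals the cumulative hazard.
    log_term = math.log(delta - r) if delta == 1 else 0.0
    inner = -2.0 * (r + log_term)
    # max() guards against tiny negative values caused by rounding.
    return math.copysign(math.sqrt(max(inner, 0.0)), r)
```

An individual who experiences the event while the model predicts a low cumulative hazard gets a large positive residual, and an individual who survives much longer than predicted gets a large negative one.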

3.3.3 Likelihood Displacement Statistic (LD)

Let $\hat{\beta}$ be the value of $\beta$ that maximizes the partial Cox likelihood and $\hat{\beta}_{(-i)}$ the estimate when observation $i$ is eliminated from the fitting. The likelihood displacement (Cook, 1977) statistic (LD) is given by:

$$LD_i = 2 \log L(\hat{\beta}) - 2 \log L(\hat{\beta}_{(-i)}). \qquad (5)$$

Under the null hypothesis $\hat{\beta}_{(-i)} = \hat{\beta}$, the LD statistic follows a chi-square distribution with one degree of freedom. Therefore we calculate the p-value of this test for all observations; the observations with the lowest p-values are considered the most outlying ones.
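The p-value of the LD statistic can be obtained without a statistics library, because the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)). The sketch below assumes that both log partial likelihoods have already been computed elsewhere; the function and argument names are hypothetical.

```python
import math


def ld_p_value(loglik_at_beta_hat, loglik_at_beta_minus_i):
    """p-value of the likelihood displacement statistic of equation (5).

    loglik_at_beta_hat     -- log partial likelihood at the full-data estimate
    loglik_at_beta_minus_i -- log partial likelihood at the estimate obtained
                              without observation i
    """
    ld = 2.0 * loglik_at_beta_hat - 2.0 * loglik_at_beta_minus_i
    ld = max(ld, 0.0)  # guard against small negative values from rounding
    # Upper tail of a chi-square distribution with one degree of freedom.
    return math.erfc(math.sqrt(ld / 2.0))
```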

4 DATASETS

4.1 Simulation Data (SIM)

Similarly to the simulation data in (Farcomeni and Viviani, 2011), we generate datasets having the Cox proportional hazards model as the underlying probabilistic model. Our goal is to recreate a realistic setting, with survival times and covariates as similar as possible to real datasets. In order to approximate these conditions, each simulated dataset has a pure model $\beta$ that expresses the general trend of the observations, and two other Cox models with different parameter values. Each dataset consists of 200 observations having covariates $X_i$. These follow a 3-D normal distribution with zero mean and covariance matrix $\Sigma$, which is equal to the identity matrix $I$ for the pure model and to $\sigma \cdot I$ for the outlier models.

For the survival times, the hazard of each individual follows one of three possible models: the pure model $\beta$ and two outlier models $\beta_1$ and $\beta_2$. Having $k < n$ outliers ($k$ even), the hazard function for each observation $i$ is given by:

$$h_i(t) = \begin{cases} h_0(t)\exp\{\beta X\}, & 1 \le i \le n-k \\ h_0(t)\exp\{\beta_1 X\}, & n-k < i \le n-k/2 \\ h_0(t)\exp\{\beta_2 X\}, & n-k/2 < i \le n \end{cases} \qquad (6)$$

The baseline hazard $h_0(t)$ is given by a Weibull function with both shape and scale parameters equal to unity, defined in the interval from 0 to 1. The value of $k$ is set so as to have 10% of outliers.

The cumulative hazard function $H_i(t)$ is then obtained as:

$$H_i(t) = \int_0^t h_i(\tau)\,d\tau. \qquad (7)$$

From each $H_i(t)$ we further calculate the corresponding survival curve $S_i(t) = e^{-H_i(t)}$. Having this distribution, we generate 200 survival times according to the distribution given by $S_i(t)$ and generate a censoring vector $c_1, \ldots, c_{200}$ following a Bernoulli distribution with probability $p$, corresponding to the proportion of censored observations, typically a value around 0.2:

$$T_i \sim 1 - S_i(t), \qquad (8)$$

$$c_i \sim \mathrm{Bernoulli}(p).$$
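The generation scheme can be illustrated with the sketch below, which relies on several simplifying assumptions that are not stated in the text. With Weibull shape and scale equal to one, the baseline hazard is constant, so survival times can be drawn by inverse-transform sampling from an exponential distribution. The sketch ignores the restriction of the baseline to the interval from 0 to 1, uses fixed outlier parameters, and interprets σ as the variance of the outlier covariates. All names and default values are hypothetical.

```python
import math
import random


def simulate_dataset(n=200, k=20, beta=(1.0, 1.0, 1.0),
                     beta1=(-1.0, -1.0, -1.0), beta2=(-1.0, 1.0, -1.0),
                     sigma=0.5, p_cens=0.2, seed=0):
    """Return a list of (covariates, time, event, is_outlier) tuples."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        if i < n - k:                # pure model
            b, sd = beta, 1.0
        elif i < n - k // 2:         # first outlier model
            b, sd = beta1, math.sqrt(sigma)
        else:                        # second outlier model
            b, sd = beta2, math.sqrt(sigma)
        x = [rng.gauss(0.0, sd) for _ in b]
        # Constant baseline hazard h0(t) = 1, so H_i(t) = t * exp(b . x).
        rate = math.exp(sum(bj * xj for bj, xj in zip(b, x)))
        u = 1.0 - rng.random()       # uniform on (0, 1]
        t = -math.log(u) / rate      # solves S_i(t) = u
        censored = rng.random() < p_cens
        rows.append((x, t, 0 if censored else 1, i >= n - k))
    return rows
```

Keeping the `is_outlier` flag alongside each observation makes it straightforward to compute the true positive rate of a detection method on the simulated data.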

4.2 Clinical Data

In order to test the procedures in a more realistic set-

ting, we have further applied the methods to real clin-

ical data, focusing on two studies:

BIOINFORMATICS2015-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms

WHAS. Dataset from the Worcester Heart At-

tack Study, with 100 individuals each with

5 covariates. This data concerns the sur-

vival times of patients having their ﬁrst

heart attack. Data publicly available at

https://www.umass.edu/statdata/statdata/data/.

BMT. Bone Marrow Transplant Data (Klein and

Moeschberger, 1997): contains data about 137

leukemia patients each with 10 covariates. The

data concerns the survival time after the bone mar-

row transplant. It is publicly available in the R (R

Development Core Team, 2006) package KMsurv.

5 RESULTS AND DISCUSSION

In this section we assess the performance of the two proposed outlier detection methods, BHT and OSD, and compare their results with MART, DEV and LD. We start by presenting the configuration of our simulation study for outlier detection. Then we apply all methods to two real datasets, performing outlier detection. We further use the detected outliers to perform a robust Cox regression by removing them from the data, and compare the coefficients and p-values of the resulting regressions.

5.1 SIM Dataset

The outlier detection methods are applied to simulated datasets generated using the methodology described in Section 4.1. In order to test the outlier detection methods under a variety of conditions for the outlying models and for the general model, we fix the general trend model $\beta = (1,1,1)$ and then define a set of configurations for the two sources of outlying observations. Each parameter for the outlier sources is given by a three-dimensional normal distribution with a diagonal covariance matrix; the values for the means and variances in each scenario are presented in Table 1.

Table 1: Tested scenarios for the outlier sources.

Scenario   Mean of β1            Mean of β2            Variance
1          (-0.5,-0.5,-0.5)      (0.5,-0.5,-0.5)       0.25
2          (-2,-2,-2)            (-2,2,-2)             0.50
3          (-1,-1,-1)            (-1,1,-1)             0.50
4          (0.5,0.5,0.5)         (0.5,-0.5,0.5)        0.25
5          (2,2,2)               (2,-2,2)              0.50
6          (1,-1,1)              (1,1,-1)              0.50
7          (0.8,0.8,-1.6)        (-1.6,0.8,0.8)        0.50
8          (0.25,0.25,-0.50)     (-0.50,0.25,0.25)     0.10
9          (2,2,2)               (-2,-2,-2)            0.50
10         (2,2,2)               (-1,-1,-1)            0.50

Although the outlying values for the parameters may seem close to the general trend model, it is worth noting that the Cox model defines the hazards as an exponential function of $\beta X$; thus the ratio between the hazard of an outlying observation and that of a general trend observation is given by $\exp\{\beta_j X - \beta X\}$, where $\beta_j$, $j = 1, 2$, denotes the corresponding outlier model. This set of scenarios was chosen to provide a variety of combinations with different norms and contrasting parameters.

Table 2 reports the accuracy in terms of the fraction of retrieved outliers, i.e., the true positive rate.

Table 2: Fraction of true positives averaged over 100 runs for each method in the 10 chosen scenarios.

Scenario   MART   DEV    LD     BHT    OSD
1          0.29   0.31   0.39   0.25   0.42
2          0.43   0.49   0.54   0.45   0.62
3          0.35   0.40   0.45   0.39   0.52
4          0.22   0.24   0.27   0.27   0.28
5          0.26   0.25   0.19   0.18   0.13
6          0.22   0.30   0.30   0.20   0.32
7          0.31   0.32   0.33   0.31   0.39
8          0.22   0.26   0.34   0.24   0.32
9          0.25   0.26   0.21   0.22   0.21
10         0.22   0.21   0.17   0.17   0.13

By inspecting Table 2 we see that the OSD algorithm has the best overall performance, outperforming the other methods in 6 out of the 10 scenarios. For scenarios 5 and 10, MART achieves the best performance.

5.2 WHAS Dataset

The outliers detected by the methods in the WHAS

dataset are presented in Table 3. The selection is

based on the ten lowest p-values.

Table 3: Top 10% outliers detected by the methods in the WHAS dataset.

Nb.   MART   DEV   LD   BHT   OSD
1     93     1     97   67    1
2     51     31    67   1     67
3     90     56    1    78    97
4     33     85    52   56    51
5     11     97    23   69    23
6     27     93    7    8     31
7     40     30    57   45    93
8     1      78    78   93    52
9     31     51    56   30    56
10    56     90    17   32    57

It is noteworthy that all the methods identified observation 56. The estimates for the regression coefficients when fitting the Cox model to all observations are given in Table 4.

Table 4: Cox model fitted to the WHAS dataset.

          β        p-value
los       -0.022   0.3972
age       0.039    0.0025
gender    0.157    0.6066
bmi       -0.071   0.0497

We observe that only two covariates are statistically significant, corresponding to the age at the first heart attack (age) and the body mass index (bmi).

After removing 10% of the observations indicated

in Table 3 for each of the methods, new models are

obtained (Table 5 and Table 6). The goal is to unveil

a trend model, unaffected by outlying observations.

Table 5: Cox estimates removing the top 10% outlier observations in the WHAS dataset for methods BHT and OSD.

          BHT                 OSD
          β        p-value    β        p-value
los       -0.166   0.006      -0.025   0.374
age       0.048    0.000      0.068    0.000
gender    0.003    0.992      0.042    0.899
bmi       -0.162   0.001      -0.137   0.002

Table 6: Cox estimates removing the top 10% outlier observations in the WHAS dataset for methods MART, DEV and LD.

          MART                DEV                 LD
          β        p-value    β        p-value    β        p-value
los       -0.016   0.498      -0.015   0.550      -0.016   0.506
age       0.045    0.001      0.032    0.012      0.069    0.000
gender    -0.082   0.800      0.155    0.653      -0.230   0.483
bmi       -0.082   0.029      -0.037   0.030      -0.146   0.001

The results show that with the proposed BHT method the length of stay (los) after the first heart attack appears as significant, which did not occur for the other methods. These results show that BHT can potentially unveil covariates that were not previously considered useful.

The fact that los emerged as a significant covariate in the Cox regression calls for a closer analysis of this measure. There are several studies that relate the length of hospital stay with patient readmission. The association between los and the quality of hospital care has also been studied: (Thomas et al., 1996) report, with data for 12 different conditions, that a longer los, risk-adjusted for other covariates, is associated with poorer hospital care. In our case we have a negative coefficient, meaning that the hazard function decreases with a longer length of stay; thus this might also be a potential indicator that the hospital has a good quality of care.

5.3 BMT Dataset

The outliers detected by the methods in the BMT

dataset are presented in Table 7. The selection is

based, again, on the 10% lowest p-values. For BHT,

a value of bootstrap samples B = 2000 has shown to

be sufﬁcient for the convergence.

Table 7: Top 10% outliers detected by the methods in the BMT dataset.

Nb.   MART   DEV   LD    BHT   OSD
1     65     129   129   129   129
2     103    35    132   103   132
3     99     108   89    99    30
4     97     65    90    65    130
5     13     132   26    30    26
6     42     87    30    132   28
7     63     84    28    13    65
8     40     103   130   130   13
9     92     30    17    16    103
10    14     99    105   136   14
11    43     97    136   15    72
12    39     28    116   26    89
13    49     109   72    97    50

The estimates for the regression coefficients when fitting the Cox model to all observations are given in Table 8.

Table 8: Cox model fitted to all BMT data.

             β         p-value
Age Diagn    -0.0017   0.9357
Donor Age    0.0316    0.1072
Sex          -0.2738   0.2651
Donor Sex    0.0409    0.8662
CMV          -0.1701   0.4922
Donor CMV    0.0038    0.9875
Wait Time    -0.0001   0.8701
FAB          0.7917    0.0012
Hospital     -0.5570   0.0004
MTX          1.0062    0.0026

After removing 10% of the observations indicated

in the Table 7 for each of the methods, new models

are obtained (Table 9 and Table 10).

Table 9: Cox estimates removing the top 10% outlier observations in the BMT dataset for methods BHT and OSD.

             BHT                 OSD
             β        p-value    β        p-value
Age Diagn    -0.017   0.418      0.027    0.222
Donor Age    0.033    0.097      0.016    0.432
Sex          -0.412   0.115      -0.556   0.029
Donor Sex    0.076    0.780      0.403    0.144
CMV          -0.541   0.047      -0.622   0.026
Donor CMV    -0.024   0.926      0.116    0.651
Wait Time    0.000    0.623      -0.001   0.472
FAB          1.260    0.000      1.157    0.000
Hospital     -0.991   0.000      -1.190   0.000
MTX          2.127    0.000      2.488    0.000

When using all the data, the statistically significant covariates are FAB, Hospital and MTX (Table 8). When the top 10% outlier observations were removed, the results were very similar between the proposed methods BHT and OSD, as both reduced the p-value of the variable CMV, to 0.047 and 0.026, respectively. This possibly reveals that the variable CMV is much more significant to the model than first expected using the complete dataset. The covariate CMV represents the cytomegalovirus immune status (positive or negative) and therefore might be a relevant feature to predict survival. It is noteworthy that the other methods did not retrieve this variable as significant.

Table 10: Cox estimates removing the top 10% outlier observations in the BMT dataset for methods MART, DEV and LD.

             MART                DEV                 LD
             β        p-value    β        p-value    β        p-value
Age Diagn    -0.009   0.640      0.029    0.181      0.006    0.777
Donor Age    0.027    0.149      0.024    0.243      0.050    0.027
Sex          -0.443   0.078      -0.624   0.021      -0.325   0.235
Donor Sex    0.053    0.833      0.257    0.345      0.361    0.195
CMV          -0.356   0.178      -0.460   0.094      -0.395   0.148
Donor CMV    -0.432   0.867      0.075    0.771      0.032    0.910
Wait Time    -0.000   0.866      0.000    0.321      -0.000   0.586
FAB          1.170    0.000      1.286    0.000      1.058    0.000
Hospital     -0.693   0.000      -0.794   0.000      -1.442   0.000
MTX          1.813    0.000      1.495    0.000      2.350    0.000

In all these experiments, the choice of the outlier

percentage threshold has obvious implications on the

obtained Cox regression coefﬁcients and a more de-

tailed analysis is warranted to analyze the tradeoff be-

tween keeping and removing observations.

5.4 Leave-one-Out Cross-validation of

the C-index

To assess the predictive ability of the model when facing new observations, we perform leave-one-out cross-validation of the c-index. The outliers also become part of the several test sets, but they are never present in the training sets used to estimate the models. Thus, this measure takes into account the prediction performance of the model on outlying observations. The results are very positive, with the concordance showing a systematic increase as candidate outliers are removed.
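One possible reading of this evaluation is sketched below. It is an assumption-laden illustration, not the procedure actually used for the reported results: `fit` and `predict` are hypothetical caller-supplied functions wrapping the model, the small `c_index` helper uses the common 1 / 0.5 / 0 scoring and skips tied survival times, and the held-out risk scores, each produced by a differently trained model, are pooled into a single c-index.

```python
def c_index(times, events, risks):
    """Compact concordance index (tied survival times are skipped)."""
    score, pairs = 0.0, 0
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            if times[i] == times[j]:
                continue
            a, b = (i, j) if times[i] < times[j] else (j, i)
            if not events[a]:
                continue  # shorter time censored: not permissible
            pairs += 1
            if risks[a] > risks[b]:
                score += 1.0
            elif risks[a] == risks[b]:
                score += 0.5
    return score / pairs


def loo_c_index(rows, fit, predict, excluded=()):
    """Leave-one-out c-index.

    rows     -- list of (covariates, time, event) tuples
    fit      -- hypothetical callable: training rows -> fitted model
    predict  -- hypothetical callable: (model, covariates) -> risk score
    excluded -- indices of candidate outliers, never used for training
                but still scored as test observations
    """
    excluded = set(excluded)
    held_out_risks = []
    for i, (x, _, _) in enumerate(rows):
        train = [row for j, row in enumerate(rows)
                 if j != i and j not in excluded]
        model = fit(train)
        held_out_risks.append(predict(model, x))
    times = [t for _, t, _ in rows]
    events = [e for _, _, e in rows]
    return c_index(times, events, held_out_risks)
```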

Table 11: Leave-one-out estimated c-indexes for the BHT method.

Dataset   All data   top-3    top-10   top-30
WHAS      0.6607     0.6813   0.6824   0.6900
BMT       0.6208     0.6314   0.6441   0.6668

Table 12: Leave-one-out estimated c-indexes for the OSD procedure.

Dataset   All data   top-3    top-10   top-30
WHAS      0.6607     0.6832   0.6853   0.6986
BMT       0.6208     0.6314   0.6441   0.6629

6 CONCLUSION

We proposed two methods for outlier detection in a survival setting. Both methods improve the cross-validated performance of the Cox regression. Overall, OSD has shown promising results in terms of p-value improvement of the regression coefficients. We think BHT can be improved into a 2-D index, possibly using multimodality measures (Singh and Xie, 2003) to identify the outliers that have a higher p-value (those that do not systematically improve concordance when removed from the data, but are still outlying observations).

Finally, we used both methods to perform robust estimation for the Cox regression, removing from the regression a fraction of the data according to their measure of outlyingness. Our preliminary results on three different datasets show improvements in the estimation of the Cox regression coefficients and in the predictive ability of the model.

ACKNOWLEDGEMENTS

This work was supported by national funds through Fundação para a Ciência e Tecnologia (FCT, Portugal) under contracts LAETA Pest-OE/EME/LA0022 and IT (PEst-OE/EEI/LA0008/2013), as well as project CancerSys (EXPL/EMS-SIS/1954/2013). SV acknowledges support by Programa Investigador FCT (IF/00653/2012) from FCT, co-funded by the European Social Fund (ESF) through the Operational Program Human Potential (POPH).

REFERENCES

Acuna, E. and Rodriguez, C. (2004). A meta analysis study

of outlier detection methods in classiﬁcation. Techni-

cal paper, Department of Mathematics, University of

Puerto Rico at Mayaguez.

Ben-Gal, I. (2005). Outlier detection. In Data Mining

and Knowledge Discovery Handbook, pages 131–146.

Springer.

Cook, R. D. (1977). Detection of inﬂuential observation in

linear regression. Technometrics, pages 15–18.

Cox, D. R. (1972). Regression Models and Life Tables.

Journal of the Royal Statistic Society, B(34):187–202.

David Collett (2003). Modelling survival data in medi-

cal research. Boca Raton, Fla. : Chapman &

Hall/CRC, c2003.

David G. Kleinbaum, Mitchel Klein (2005). Survival anal-

ysis: a self-learning text. New York, NY : Springer,

c2005.

OutlierDetectioninSurvivalAnalysisbasedontheConcordanceC-index

David W. Hosmer, Stanley Lemeshow, Susanne May

(2008). Applied survival analysis: regression mod-

eling of time-to-event data. Hoboken, N.J. : Wiley-

Interscience, c2008.

Donoho, D. L. and Huber, P. J. (1983). The notion of break-

down point. A Festschrift for Erich L. Lehmann, pages

157–184.

Efron, B. (1979). Bootstrap methods: another look at the

jackknife. The annals of Statistics, pages 1–26.

Farcomeni, A. and Viviani, S. (2011). Robust estimation for

the cox regression model based on trimming. Biomet-

rical Journal, 53(6):956–973.

Fischler, M. and Bolles, R. (1981). Random Sample Con-

sensus: A Paradigm for Model Fitting with Applica-

tions to Image Analysis and Automated Cartography.

Communications of the ACM.

Hampel, F. R. (1971). A general qualitative deﬁnition of

robustness. The Annals of Mathematical Statistics,

pages 1887–1896.

Harrell, F. E. (2001). Regression modeling strategies: with

applications to linear models, logistic regression, and

survival analysis. Springer.

Harrell, F. E., Califf, R. M., Pryor, D. B., Lee, K. L., and

Rosati, R. A. (1982). Evaluating the yield of medical

tests. Jama, 247(18):2543–2546.

Hawkins, D. M. (1980). Identiﬁcation of outliers, vol-

ume 11. Springer.

Johnson, R. A., Wichern, D. W., and Education, P. (1992).

Applied multivariate statistical analysis, volume 4.

Prentice hall Englewood Cliffs, NJ.

Kalbﬂeisch, J. D. and Prentice, R. L. (2011). The statistical

analysis of failure time data, volume 360. John Wiley

& Sons.

Klein, J. and Moeschberger, M. (1997). Survival analysis:

techniques for censored and truncated regression.

Nardi, A. and Schemper, M. (1999). New residuals for cox

regression and their application to outlier screening.

Biometrics, 55(2):523–529.

R Development Core Team (2006). R: A Language and

Environment for Statistical Computing. R Foundation

for Statistical Computing, Vienna, Austria. ISBN 3-

900051-07-0.

Reid, N. and Crépeau, H. (1985). Influence functions for proportional hazards regression. Biometrika, 72(1):1–

Rousseeuw, P. and Leroy, A. (1987). Robust regression

and outlier detection. Wiley Series in probability and

mathematical statistics. Wiley, New York [u.a.].

Singh, K. and Xie, M. (2003). Bootlier-Plot: Bootstrap Based Outlier Detection Plot. Sankhyā: The Indian Journal of Statistics (2003-2007), 65(3):532–559.

Therneau, T. M., Grambsch, P. M., and Fleming, T. R.

(1990). Martingale-based residuals for survival mod-

els. Biometrika, 77(1):147–160.

Thomas, J. W., Guire, K. E., and Horvat, G. G. (1996). Is

patient length of stay related to quality of care? Hospi-

tal & health services administration, 42(4):489–507.

BIOINFORMATICS2015-InternationalConferenceonBioinformaticsModels,MethodsandAlgorithms