without worrying whether it is being maximized at the cost of the majority of the data merely to better fit one outlying observation or a cluster of outlying observations, as can happen with the sum of squared errors (Fischler and Bolles, 1981).
3 METHODS FOR OUTLIER DETECTION
We propose two novel methods for outlier detection in survival data based on the concordance index, described in sections 3.1 and 3.2. Section 3.3 describes alternative proposals that will later be used for comparison purposes.
The proposed methods rely on an operational definition of an outlier: an observation whose removal from the data will likely decrease the prediction error of the fitted model. In a survival setting, this prediction error is measured with the concordance c-index (a higher c-index meaning a lower error), which has the particularity of treating the predictive model as a black box.
3.1 Bootstrap Hypothesis Testing (BHT)
Ideally, we would know the underlying distribution of the observations $(X_i, Y_i)$ and perform a hypothesis test on the difference in concordance between the two distributions, with and without a given observation. The idea is thus to perform n hypothesis tests on the variation in concordance, one for each observation i, and to sort the resulting p-values.
The hypothesis tests follow the bootstrap approach (Efron, 1979). The pair $(X, Y)$ is considered a discrete random variable whose distribution equals the empirical distribution given by the original dataset, which places probability 1/n on each observation $(X_i, Y_i)$. We consider n different empirical distributions, each obtained by removing observation i from the original data and renormalizing the probabilities so that they sum to one (so each remaining observation has probability 1/(n − 1)). Denoting by C the concordance c-index and by $C_{\text{original}}$ the concordance on the original data, the distributions $\text{Data}_i$ are the adjusted empirical distributions with $P(X = X_i, Y = Y_i) = 0$. The hypothesis test for each observation is formulated as follows:
$H_0: C_{\text{Model},(X,Y)\sim \text{Data}_i} \le C_{\text{original}}$
$H_1: C_{\text{Model},(X,Y)\sim \text{Data}_i} > C_{\text{original}}$
Writing $C_i$ for $C_{\text{Model},(X,Y)\sim \text{Data}_i}$ and $\delta C_i = C_i - C_{\text{original}}$, it is more convenient to reformulate the hypothesis tests as:
$H_0: \delta C_i \le 0$
$H_1: \delta C_i > 0$
Rejecting the null hypothesis at a significance level α amounts to estimating a confidence interval for $\delta C_i$ under each distribution $\text{Data}_i$: if this interval contains no values less than or equal to zero, the null hypothesis can be rejected at significance level α. Alternatively, the p-value of the test can be computed.
These confidence intervals are computed using the Monte Carlo bootstrap, as explained in (Harrell, 2001). For each observation i, the procedure is as follows: 1) draw B bootstrap samples by sampling n − 1 observations with replacement from the empirical distribution $\text{Data}_i$; 2) compute the concordance for each bootstrap sample; 3) take as p-value the proportion of bootstrap samples with $C_i - C_{\text{original}} \le 0$.
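The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the model is fitted once on the full data and its predictions `preds` (predicted survival times) are reused on every bootstrap sample, since the text does not say whether the model is refitted; the c-index is Harrell's; bootstrap samples without any comparable pair are discarded. The names `bht`, `B` and `seed` are hypothetical.

```python
import numpy as np


def c_index(times, events, preds):
    """Harrell's C; preds are predicted survival times (larger = longer)."""
    t, e, p = (np.asarray(a, dtype=float) for a in (times, events, preds))
    comparable = (t[:, None] < t[None, :]) & (e[:, None] == 1)
    concordant = comparable & (p[:, None] < p[None, :])
    tied = comparable & (p[:, None] == p[None, :])
    n_pairs = comparable.sum()
    return (concordant.sum() + 0.5 * tied.sum()) / n_pairs if n_pairs else np.nan


def bht(times, events, preds, B=1000, seed=0):
    """Bootstrap test of H0: delta C_i <= 0 for every observation i.

    Returns (p_values, mean_delta_c), one entry per observation.
    Assumes fixed predictions from a single model fit (hypothetical setup).
    """
    t, e, p = (np.asarray(a, dtype=float) for a in (times, events, preds))
    n = len(t)
    rng = np.random.default_rng(seed)
    c_original = c_index(t, e, p)
    p_values = np.empty(n)
    mean_delta_c = np.empty(n)
    for i in range(n):
        support = np.delete(np.arange(n), i)  # Data_i: observation i has probability 0
        deltas = np.empty(B)
        for b in range(B):
            # 1) draw n - 1 observations with replacement from Data_i
            idx = rng.choice(support, size=n - 1, replace=True)
            # 2) concordance of the bootstrap sample, relative to the original data
            deltas[b] = c_index(t[idx], e[idx], p[idx]) - c_original
        deltas = deltas[~np.isnan(deltas)]  # drop samples with no comparable pairs
        # 3) p-value: proportion of samples with C_i - C_original <= 0
        p_values[i] = np.mean(deltas <= 0)
        mean_delta_c[i] = deltas.mean()
    return p_values, mean_delta_c


# Ten observed events; the model badly underestimates the survival of the last one.
times = np.arange(1, 11)
events = np.ones(10)
preds = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0.5])
p_values, mean_delta_c = bht(times, events, preds, B=200)
print(np.argmin(p_values))  # → 9
```

In this toy example every bootstrap sample drawn without the last observation is perfectly concordant, so its p-value is 0, while the other observations get p-values around 0.65 because the misfit observation is usually still drawn. The pairwise c-index makes each sample cost O(n²), so this sketch is meant for small illustrations only.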
The number of bootstrap samples B required was found to depend on the number of individuals and the number of covariates. In our tests, B was increased iteratively until the p-values converged.
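The text does not specify how convergence of the p-values was judged. One simple possibility, sketched below purely as an assumption, is to double B until the largest absolute change in any p-value falls below a tolerance. `pvalue_fn` is a hypothetical callable that runs the bootstrap tests with a given B and returns the vector of p-values; `B0`, `tol` and `B_max` are illustrative defaults.

```python
import numpy as np


def choose_B(pvalue_fn, B0=100, tol=0.01, B_max=100_000):
    """Double the number of bootstrap samples until the p-values stabilise.

    pvalue_fn -- hypothetical callable: B -> array of p-values (one per observation)
    Returns (B, p_values) at the first B whose p-values differ from those at the
    previous B by less than `tol` everywhere, or at B_max if that never happens.
    """
    B = B0
    previous = np.asarray(pvalue_fn(B), dtype=float)
    while B < B_max:
        B = min(2 * B, B_max)
        current = np.asarray(pvalue_fn(B), dtype=float)
        if np.max(np.abs(current - previous)) < tol:
            return B, current
        previous = current
    return B, previous


# Toy stand-in whose p-values drift towards (0.5, 0.1) as B grows.
B, p = choose_B(lambda B: np.array([0.5 + 1.0 / B, 0.1 - 1.0 / B]))
print(B)  # → 200
```

Doubling keeps the number of reruns logarithmic in the final B; with a real bootstrap, Monte Carlo noise in the p-values shrinks roughly as $1/\sqrt{B}$, so `tol` effectively sets the precision one is willing to pay for.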
Following the same reasoning as in (Singh and Xie, 2003), given an outlying observation ξ, the probability that a bootstrap sample does not contain ξ is approximately $(1 - \frac{1}{n})^n \approx \frac{1}{e}$ (≈ 37%) as $n \to \infty$.
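The limit $(1 - 1/n)^n \to 1/e$ is easy to check numerically. The sketch below computes the exact probability that a fixed observation is never drawn when sampling n times with replacement from n observations, and compares it with 1/e and with a quick Monte Carlo estimate; the function names and sample sizes are illustrative.

```python
import math

import numpy as np


def prob_absent(n):
    """Exact probability that a fixed observation is never drawn in n draws
    with replacement from n observations."""
    return (1.0 - 1.0 / n) ** n


def simulate_absent(n, reps=20_000, seed=0):
    """Monte Carlo estimate of the same probability, tracking observation 0."""
    rng = np.random.default_rng(seed)
    draws = rng.integers(0, n, size=(reps, n))  # reps bootstrap samples of size n
    return np.mean(~(draws == 0).any(axis=1))


for n in (10, 100, 1000):
    print(n, round(prob_absent(n), 4), round(1 / math.e, 4))
```

The exact values increase with n and approach 1/e ≈ 0.3679 from below, so the 37% figure is already a good approximation for moderately sized datasets.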
Thus, each observation is absent from approximately 37% of the bootstrap samples. A low p-value for the hypothesis test above means that removing observation i improves the concordance c-index systematically, without depending on the cooperation of any other observation. On the other hand, if one outlier is masked by another, the masking outlier will be absent from approximately 37% of the bootstrap samples, so a multimodal bootstrap distribution of $\delta C_i$ can be expected. Thus, the removal of an outlier subject to masking may not systematically improve concordance (yielding a high p-value for the hypothesis test), but if its $\delta C_i$ distribution is multimodal and one of the modes is relatively high, it is still a candidate outlier.
To sum up, Bootstrap Hypothesis Testing (BHT) on $\delta C$ works as follows: for each observation, a bootstrap hypothesis test is performed. The resulting statistics for each observation are a p-value and the expected value of $\delta C$. The p-value measures the evidence for rejecting the hypothesis that removing the observation causes no increase in the c-index. Experimentally, we verified that these two values are correlated: when the p-value is low, the expected $\delta C$ is usually very high, whereas the converse relation was found to be weaker. Therefore, to obtain a one-dimensional measure of outlyingness, we consider the observations with the lowest p-values to be the most outlying.
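The final ranking step reduces to sorting by p-value. In the sketch below, ties in the p-value (common when several observations reach p = 0) are broken by the larger expected $\delta C$; this tie-breaking rule is an assumption suggested by the correlation noted above, not something stated in the text. `rank_outliers` is a hypothetical name.

```python
import numpy as np


def rank_outliers(p_values, mean_delta_c):
    """Return observation indices ordered from most to least outlying.

    Primary key: ascending p-value (as in BHT).
    Secondary key (assumption): descending expected delta C, only to break ties.
    """
    p = np.asarray(p_values, dtype=float)
    d = np.asarray(mean_delta_c, dtype=float)
    # np.lexsort sorts by the last key first, so p is the primary key.
    return np.lexsort((-d, p))


order = rank_outliers([0.5, 0.0, 0.0, 0.2], [0.01, 0.05, 0.2, 0.0])
print(order.tolist())  # → [2, 1, 3, 0]
```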
3.2 One-Step Deletion (OSD)
This method is a sequential procedure for outlier removal. We start with the full dataset and, at each iteration of the algorithm, the observation that, when ex-
Outlier Detection in Survival Analysis based on the Concordance C-index