# Outlier Detection in Survival Analysis based on the Concordance C-index

### João Diogo Pinto, Alexandra M. Carvalho, Susana Vinga

#### Abstract

Outlier detection is an important task in many data-mining applications. In this paper, we present two parametric outlier detection methods for survival data. Both methods perform outlier detection in a multivariate setting, using Cox regression as the model and the concordance c-index as the measure of goodness of fit. The first method is a single-step procedure that presents a delete-1 statistic based on bootstrap hypothesis testing for the increase in the concordance c-index. The second method is a sequential procedure that maximizes the c-index of the model using a greedy one-step-ahead search. Finally, we use both methods to perform robust estimation for the Cox regression, removing a fraction of the data from the regression according to their measure of outlyingness. Our preliminary results on three different data sets show improved estimation of the Cox regression coefficients and improved predictive ability of the model.
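
The two ideas named in the abstract, a delete-1 change in the c-index and a greedy one-step-ahead search, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the risk scores are fixed in advance, whereas the paper refits a Cox regression, and it omits the bootstrap hypothesis test entirely. The function names `c_index`, `delete_one_gains`, and `greedy_removal` are hypothetical, and tied event times are simply skipped.

```python
from itertools import combinations


def c_index(times, events, risk):
    """Harrell's concordance c-index for right-censored data (simplified).

    A pair is comparable when the subject with the shorter time had an event.
    It is concordant when that subject also has the higher risk score.
    Ties in risk count as 0.5; pairs with tied times are skipped.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject is censored, so the order is unknown
        comparable += 1
        if risk[first] > risk[second]:
            concordant += 1.0
        elif risk[first] == risk[second]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def delete_one_gains(times, events, risk):
    """Change in c-index when each observation is left out in turn.

    Assumption: risk scores stay fixed (no model refit after deletion).
    """
    base = c_index(times, events, risk)
    gains = []
    for k in range(len(times)):
        keep = [i for i in range(len(times)) if i != k]
        reduced = c_index(
            [times[i] for i in keep],
            [events[i] for i in keep],
            [risk[i] for i in keep],
        )
        gains.append(reduced - base)
    return gains


def greedy_removal(times, events, risk, n_remove):
    """Greedy one-step-ahead search: repeatedly drop the observation whose
    removal gives the largest c-index. Returns original indices in removal order.
    """
    remaining = list(range(len(times)))
    removed = []
    for _ in range(n_remove):
        gains = delete_one_gains(
            [times[i] for i in remaining],
            [events[i] for i in remaining],
            [risk[i] for i in remaining],
        )
        best = max(range(len(gains)), key=gains.__getitem__)
        removed.append(remaining.pop(best))
    return removed


# Toy data: the last subject survives longest despite the highest risk score.
times = [1, 2, 3, 4, 5]
events = [1, 1, 1, 1, 1]
risk = [5.0, 4.0, 3.0, 2.0, 6.0]
gains = delete_one_gains(times, events, risk)
```

On this toy data the last observation gives the largest delete-1 gain, so it is the first candidate that the greedy search removes. In a fuller version, the fixed `risk` list would be replaced by scores from a model refitted on each reduced data set.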

#### Paper Citation

#### in Harvard Style

Pinto J., Carvalho A. and Vinga S. (2015). **Outlier Detection in Survival Analysis based on the Concordance C-index**. In *Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2015)*, ISBN 978-989-758-070-3, pages 75-82. DOI: 10.5220/0005225300750082

#### in Bibtex Style

@conference{bioinformatics15,

author={João Diogo Pinto and Alexandra M. Carvalho and Susana Vinga},

title={Outlier Detection in Survival Analysis based on the Concordance C-index},

booktitle={Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2015)},

year={2015},

pages={75-82},

publisher={SciTePress},

organization={INSTICC},

doi={10.5220/0005225300750082},

isbn={978-989-758-070-3},

}

#### in EndNote Style

TY - CONF

JO - Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms - Volume 1: BIOINFORMATICS, (BIOSTEC 2015)

TI - Outlier Detection in Survival Analysis based on the Concordance C-index

SN - 978-989-758-070-3

AU - Pinto J.

AU - Carvalho A.

AU - Vinga S.

PY - 2015

SP - 75

EP - 82

DO - 10.5220/0005225300750082