FAILURE PREDICTION USING THE COX PROPORTIONAL
HAZARD MODEL
Pekka Abrahamsson, Ilenia Fronza and Jelena Vlasenko
Free University of Bolzano-Bozen, Piazza Domenicani, Domenikanerplatz 3, I-39100 Bolzano-Bozen, Italy
Keywords: Failure prediction, Cox PH model, Log files.
Abstract: Crashes of software systems may have disruptive, and sometimes tragic effects on users. Being able to
forecast such failures is extremely important, even when the failures are inevitable – at least recovery or
rescue actions can be taken. In this paper we present a technique to predict the failure of running software
systems. We propose to use log messages to predict failures running devices that read log files of running
application and warns about the likely failure of the system; the prediction is based on the Cox Proportional
Hazards (PH) model that has been applied successfully in various fields of research. We perform an initial
validation of the proposed approach on real-world data.
1 INTRODUCTION
Crashes of software systems may have disruptive,
and sometimes tragic, effects on users. Being able to
forecast such failures is extremely important; even
when the failures are inevitable, at least recovery or
rescue actions can be taken. In this paper we propose
a method to predict the failure of running software
systems. Often, when developing a software system,
developers write log messages to track its actual
execution path, to debug it, or to optimize its
execution. Our idea is to use such messages to
predict future failures. Realizing this idea will set the
path for the development of devices that read the
logs of running applications and signal the likely
crash of such systems.
Methods for predicting system failures based on
events (in our case, the log messages) have been
proposed in various engineering disciplines. These
methods can be classified into design-based methods
and data-driven rule-based
methods. In a design-based method, the expected
event sequence is obtained from the system design
and is compared with the observed event sequence
(Sampath et al., 1994; Srinivasan and Jafari, 1993;
Pandalai and Holloway, 2000). The major
disadvantage of these methods is that in many cases,
events occur randomly and thus there is no system
logic design information available. Data-driven
rule-based methods do not require system logic
design information. These methods consist of two phases:
1) identification of temporal patterns, i.e., sequences
of events that frequently occur (Mannila et al.,
1997), and 2) development of prediction rules based
on these patterns (Li et al., 2007).
In this work we propose to use the Cox PH
model. The Cox model has been applied mainly in
biomedicine, often for the study of cancer survival
(Bøvelstad et al., 2007; Hao et al., 2009; Yanaihara
et al., 2006; Yu et al., 2008). It has also been applied
successfully in various fields of research, such as
criminology (Benda, 2005; Schmidt and Witte,
1989), sociology (Agerbo, 2007; Sherkat and
Ellison, 2007), and marketing (Barros and Machado,
2010; Chen et al., 2009). There are limited uses of
the Cox PH model in cybernetics (Li et al., 2007),
and there is also an application to software data
(Wendel et al., 2008) in which failure time data
coming from bug reports were analysed using code
metrics as input.
The paper is structured as follows: In Section 2,
we present the Cox PH model; in Section 3, we
introduce our approach and we discuss a sample
application of it. In Section 4, we draw our conclusions.
2 THE COX PROPORTIONAL
HAZARD MODEL
The Cox PH model (Cox, 1972) gives an expression for
the hazard at time t for an individual i with a given
specification of p covariates x_i = (x_i1, ..., x_ip):

    h_i(t) = h_0(t) exp(β_1 x_i1 + β_2 x_i2 + ... + β_p x_ip)    (1)

The Cox model formula says that the hazard at
time t is the product of two quantities. The first of
them, h_0(t), is called the baseline hazard function
and is equal for all individuals; it may be considered
as a starting version of the hazard function, prior to
considering any of the x's. The Cox PH model focuses
on estimating the regression coefficients
β = (β_1, ..., β_p), leaving the baseline hazard
unspecified. In the p < n setting, the β's are
estimated by maximizing the log partial likelihood,
which is given by:

    l(β) = Σ_{i ∈ D} [ β′x_i − log Σ_{j ∈ R(t_i)} exp(β′x_j) ]    (2)

where β′x_i = β_1 x_i1 + ... + β_p x_ip, D is the set of
indices of the observed (uncensored) failures, and
R(t_i) is the risk set at time t_i, i.e. the set of
all individuals who are still under study just prior to
time t_i.
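As an illustration of equation (2), the following Python sketch (our own example; the paper does not prescribe an implementation) evaluates the log partial likelihood for a given coefficient vector. The covariate matrix X, the observed times t and the event indicators d are hypothetical inputs, and ties among event times are ignored.

    import numpy as np

    def log_partial_likelihood(beta, X, t, d):
        """Log partial likelihood of the Cox PH model, as in equation (2).
        beta: (p,) coefficients; X: (n, p) covariates; t: (n,) observed times;
        d: (n,) event indicators (1 = observed failure, 0 = censored)."""
        beta, X, t, d = map(np.asarray, (beta, X, t, d))
        eta = X @ beta                       # linear predictors beta'x_i
        ll = 0.0
        for i in np.where(d == 1)[0]:        # sum over the observed failures only (the set D)
            at_risk = t >= t[i]              # risk set R(t_i): individuals still under study at t_i
            ll += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
        return ll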
A parametric survival model is one in which
survival time (the outcome) is assumed to follow a
known distribution. The Cox PH model is not a fully
parametric model; rather it is a semi-parametric
model because even if the regression parameters β's
are known, the distribution of the outcome remains
unknown. The Cox PH model is a “robust” model,
since the results obtained from it closely
approximate the results of the correct parametric
model.
The key assumption of the Cox PH model is
proportional hazards; this assumption means that the
hazard ratio (defined as the hazard for one individual
over the hazard for a different individual) is constant
over time.
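To see how this follows from equation (1), consider two individuals i and j with covariate vectors x_i and x_j; their hazard ratio is

    HR(t) = h_i(t) / h_j(t) = [h_0(t) exp(β′x_i)] / [h_0(t) exp(β′x_j)] = exp(β′(x_i − x_j)),

which does not depend on t, because the baseline hazard cancels out.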
The Cox PH model is widely used because of its
characteristics: 1) even without specifying h_0(t), it is
possible to find the β's, 2) no particular form of
probability distribution is assumed for the survival
times, and 3) it uses more information – the survival
times – than the logistic model, which considers a
(0,1) outcome and ignores survival times and
censoring. Therefore it is preferred over the logistic
model when survival time is available and there is
censoring (Kalbfleisch and Prentice, 2002).
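As a minimal sketch of how such a model can be fitted in practice, assuming the open-source Python package lifelines (the paper does not name a specific tool, and the data below are hypothetical), censored observations are passed directly to the fitting routine and no baseline hazard needs to be specified:

    import pandas as pd
    from lifelines import CoxPHFitter

    # Hypothetical data: one row per individual, two covariates,
    # the observed time and an event indicator (1 = failure, 0 = censored).
    df = pd.DataFrame({
        "x1":    [2, 0, 1, 3, 1, 0],
        "x2":    [0, 1, 1, 0, 1, 0],
        "time":  [5.0, 6.0, 2.5, 4.0, 7.0, 3.0],
        "event": [1, 0, 1, 0, 0, 1],
    })

    cph = CoxPHFitter()
    cph.fit(df, duration_col="time", event_col="event")  # censoring is handled via the event column
    cph.print_summary()                                   # estimated beta, exp(beta) and p-values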
3 THE PROPOSED APPROACH
The approach proposed in this work is a technique to
predict the failure of running software systems
using log files. The idea is to develop devices that
read the logs of running applications and signal the
likely crash of such systems. In this section, we
describe the structure of the approach and of the
monitoring process, and we show the results of a
sample application.
3.1 Structure of the Approach
Figure 1 presents a schematic view of the proposed
approach.
Figure 1: Schema of the devices that read log files of a
running system and signal the likely failure.
While the system is running, log data are
collected to track the actual execution path (Coman
and Sillitti, 2007; Coman et al., 2009; Moser et al.,
2006; Scotto et al., 2004; Scotto et al., 2006; Sillitti
et al., 2003; Sillitti et al., 2004). In this work, we
look at the running system as a “black box”,
meaning that we do not have any other information
about the system except the log files.
The monitoring process takes log data as input
and, based on the analysis performed, gives the
supervisor a message indicating the likelihood of
failure of the running application.
The supervisor can act directly on the running
system to avoid the predicted failure, or send an alert
to the outside world. Possible actions could be to
abort the running system, to restart it, to dynamically
load components, or to inform the running system
itself if it is a suitably structured autonomic system
(Müller et al., 2009). In this way, wasted time may be
reduced (Sillitti and Succi, 2005).
3.2 Structure of the Monitoring
Process
The monitoring process is based on the Cox PH
model; we chose this model because in our type of
data:
- Survival time is available;
- Censoring is present.
An advantage of the Cox PH model is that no
assumption of a parametric distribution for the event
sequence data is needed, which could result in the
discovery of information that may be hidden by the
assumption of a specific distribution (Yu et al.,
2008); results comparable to the parametric model
are obtained even without this assumption
(Kalbfleisch and Prentice, 2002).
3.2.1 Dimensional Reduction
of the Problem and Data Preparation
First, the monitoring process performs an
automatic pre-processing phase (Zheng et al., 2009)
to obtain temporal event sequences from the raw logs
of the application. This phase works as follows:
1. data are parsed to extract operations together
with their associated time stamps and severities
for each event in the log file;
2. duplicate rows are deleted together with logs
that are missing information in one or several of
the fields Operation, Time stamp, Severity;
3. sequences of activities are extracted: a new
sequence starts either if there is a ‘Log in’
operation or if the day changes.
Failures are defined as sequences containing at
least one entry with severity “Error”; a sketch of this
pre-processing is given below.
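A possible realization of this pre-processing in Python is sketched below; the log schema (one row per entry with the columns 'operation', 'timestamp' and 'severity') and all names are our assumptions, since the actual log format is not published.

    import pandas as pd

    def build_sequences(log_df):
        """Steps 2 and 3 above; step 1 (parsing operations, time stamps and
        severities out of the raw file) is assumed to have produced log_df."""
        # Step 2: drop duplicate rows and rows missing Operation, Time stamp or Severity
        df = (log_df.drop_duplicates()
                    .dropna(subset=["operation", "timestamp", "severity"])
                    .sort_values("timestamp"))
        # Step 3: a new sequence starts at every 'Log in' operation or at a day change
        new_seq = (df["operation"].eq("Log in") |
                   df["timestamp"].dt.date.ne(df["timestamp"].dt.date.shift()))
        df = df.assign(sequence_id=new_seq.cumsum())
        # Failure label: a sequence containing at least one entry with severity 'Error'
        is_failure = df.groupby("sequence_id")["severity"].apply(lambda s: s.eq("Error").any())
        return df, is_failure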
Table 1 summarizes the definitions used in this
work.
Table 1: Definitions used in this work.

    Notion        Definition
    Environment   Application
    Sequence      A chronologically ordered set of log entries in the
                  maximum time frame of 1 day. Two sequences are
                  separated by a “Log in” operation
    Failure       A sequence containing at least one severity “Error”
Afterwards, the monitoring process prepares the
input for the Cox PH model (Table 2). Each
sequence i is described by (x_i, t_i, d_i), where: 1) x_i =
(x_i1, ..., x_ip) and x_ij is the multiplicity of operation j
in sequence i, 2) t_i is the lifetime of the sequence,
defined as the difference between the last time stamp
and the first time stamp of sequence i, and 3) d_i = 1
when the event is “observed” (failure sequences) and
d_i = 0 elsewhere (censored observations). Our type
of censoring is type II with a percentage of 100%
(Lee and Wang, 2003), meaning that d_i is never
equal to zero because of the end of the observation
period.
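Continuing the sketch above (same hypothetical schema and names), each sequence can be turned into the triple (x_i, t_i, d_i) as follows:

    import pandas as pd

    def cox_input(df, is_failure, operations):
        """Build one row (x_i, t_i, d_i) per sequence.
        df: output of build_sequences(); is_failure: boolean Series per sequence_id;
        operations: the p operation names used as covariates."""
        rows = []
        for seq_id, g in df.groupby("sequence_id"):
            row = {op: int((g["operation"] == op).sum()) for op in operations}  # multiplicities x_ij
            # t_i: last time stamp minus first time stamp (the duration of the last
            # operation is not counted, a bias discussed in the conclusions)
            row["lifetime"] = (g["timestamp"].max() - g["timestamp"].min()).total_seconds()
            row["event"] = int(is_failure.loc[seq_id])                          # d_i
            rows.append(row)
        return pd.DataFrame(rows)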
3.2.2 Training of the Model and Analysis of
the Results
Following the guidelines of (Hao et al., 2009;
Yanaihara et al., 2006; Yu et al., 2008), the training
includes the following steps:
1. The Schoenfeld test (Hosmer et al., 2008;
Kalbfleisch and Prentice, 2002; Kleinbaum
and Klein, 2005) is applied to select the
operations satisfying the PH assumption.
2. The Cox PH model is applied and operations
that are significantly associated with failures are
identified.
3. For each sequence, a risk score is evaluated as
the exponential of a linear combination of the
multiplicities of the operations, weighted by the
regression coefficients derived from the
aforementioned Cox PH model.
4. The following values are extracted: 1) m, the
third quartile of the risk scores of non-failure
sequences, and 2) M, the maximum risk score of
non-failure sequences.
5. The risk score RS is then evaluated for the
current sequence of the running application, as
in step 3. One of the following messages is
given as output to the supervisor about the
running application (see the sketch after this list):
i. “likely no failure” if RS ≤ m,
ii. “likely failure” if RS ≥ M, and
iii. “still unknown” if m < RS < M.
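These steps can be sketched in Python as follows. The sketch again assumes a lifelines-based implementation with the attribute names of recent lifelines versions, and it supposes that step 1 has already reduced the covariates to the operations passing a Schoenfeld-type test (e.g. via lifelines' check_assumptions); train_df, kept_ops and current_x are our hypothetical names.

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    def train_and_classify(train_df, kept_ops, current_x):
        """Steps 2-5: fit the model, derive m and M, classify the running sequence.
        train_df: one row per training sequence with the operation multiplicities,
        'lifetime' and 'event' columns (see Section 3.2.1); kept_ops: list of the
        retained operations; current_x: multiplicities of the running sequence (dict)."""
        # Step 2: fit the Cox PH model on the retained operations
        cph = CoxPHFitter()
        cph.fit(train_df[kept_ops + ["lifetime", "event"]],
                duration_col="lifetime", event_col="event")
        beta = cph.params_                                   # regression coefficients

        # Step 3: risk score = exp(beta'x), the exponential of the weighted multiplicities
        risk = np.exp(train_df[kept_ops] @ beta)

        # Step 4: thresholds from the non-failure training sequences
        non_failure_risk = risk[train_df["event"] == 0]
        m = non_failure_risk.quantile(0.75)                  # third quartile
        M = non_failure_risk.max()                           # maximum

        # Step 5: score the running sequence and emit the message
        rs = float(np.exp(pd.Series(current_x)[kept_ops] @ beta))
        if rs <= m:
            return "likely no failure"
        if rs >= M:
            return "likely failure"
        return "still unknown"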
3.3 Sample Application
To assess the suitability of our approach, we have
tested it with real-world data. We used log files
collected during approximately 3 months of work at
an important Italian company that prefers to remain
anonymous.
The dataset was prepared using the pre-
processing phase of the monitoring process
presented in Section 3.2.
Sequences were randomly assigned to the training
set (60%) or the test set (40%).
Table 2 contains the summary of the results of
pre-processing the training set.
Table 2: Training set pre-processing summary.

                                                n       %
    Cases available in the analysis
        Failures*                              28    12.8
        Censored                              157    72.0
        Total                                 185    84.9
    Cases dropped
        Censored before the earlier event
        in stratum                             33    15.1
    Total                                     218   100
    * Dependent variable: survival time
Six out of the eight initial operations satisfied
the proportional hazards assumption and
were therefore kept in the input dataset for the Cox
PH model. Table 3 contains the output of this model.
Table 3: Output of the Cox PH model on training set.

                    β       sig     exp(β)
    Operation 1    -0.40    0.013    0.961
    Operation 2     0.06    0.015    1.006
    Operation 3    -0.52    0.050    0.592
    -2 Log Likelihood: 200.112
Three-operation signature risk scores were
calculated for all the sequences in the test set. The
comparison between failures and non-failures shows
that higher risk scores have been assigned to failure
sequences (Figure 2). Altogether, we obtained
m = 0.59 and M = 1.85.
Figure 2: Risk scores in failure and non failure sequences.
In the test set, the comparison of the risk scores
with m and M gives the following results: in 40% of
the cases our approach is able to predict the failure
correctly, and only in 1% of the cases is a predicted
failure not an actual failure; this means that a message of
expected failure is quite reliable. By contrast,
the prediction of non-failures is not as reliable: 48%
of the failing sequences are predicted as non-failing.
Altogether, the results from this analysis appear
quite interesting.
4 CONCLUSIONS
In this work we propose to develop devices that read
logs of running applications and warn the supervisor
about the likely failure.
Results show that higher risk scores are assigned
to failure sequences in the test set; 40% of failures
are correctly identified.
These devices are intended to become an
incremental failure prediction tool, which is rebuilt
at the end of each sequence of operations and uses
data from previous iterations to refine itself at every
iteration.
Our goal now is to study our promising model in
more depth to determine whether we can generalize
our results. To this end, we plan to replicate the
analysis on more industrial datasets.
Another aspect that we will evaluate is the
possibility of predicting the occurrence of a failure
by analysing only an initial portion of a sequence, so
that failures could be estimated earlier, providing
additional time to take corrective actions.
We are also considering additional models to see
if we can achieve higher levels of precision:
- a Cox PH model with strata to analyse covariates
not satisfying the PH assumption;
- specific techniques to manage datasets with a
limited number of cases (Bøvelstad, 2010).
We are now investigating how we could
consider other “black-box” properties of applications
to predict failures; candidate properties include
memory usage, the number of open files, and processor
usage.
We will also deal with the bias introduced when
calculating survival time without considering the
duration of the last operation.
Finally, the proposed model could be particularly
useful when dealing with autonomic systems. Autonomic
systems could be instructed to receive signals of
likely failures and upon reception of such signals
could start a suitable recovery procedure (Müller et
al., 2009).
ACKNOWLEDGEMENTS
We thank the Autonomous Province of South Tyrol
for supporting us in this research.
REFERENCES
Agerbo, E. 2007. High income, employment, postgraduate
education, and marriage: a suicidal cocktail among
psychiatric patients. Archives of General Psychiatry,
64, 12, 2007, 1377-1384.
Barros, C. P. and Machado, L. P. 2010. The length of stay
in tourism. Annals of Tourism Research, 37, 3, 2010,
692-706.
Benda, B. 2005. Gender differences in life-course theory
of recidivism: A survival analysis. International
Journal of Offender Therapy and Comparative
Criminology, 49, 3, 2005, 325-342.
Bøvelstad, H. M., Nygård, S., Størvold, H. L., Aldrin, M.,
Borgan, Ø., Frigessi, A., and Lingjærde, O. C. 2007.
Predicting survival from microarray data – a
comparative study. Bioinformatics, 23, 16, 2080–
2087.
Bøvelstad, H. M. 2010. Survival prediction from high-
dimensional genomic data. Doctoral Thesis.
University of Oslo.
Chen, Y., Zhang, H., and Zhu, P. 2009. Study of Customer
Lifetime Value Model Based on Survival-Analysis
Methods. In Proceedings of the World Congress on
Computer Science and Information Engineering (Los
Angeles, USA, March 31 – April 02, 2009), 266-270.
Coman, I. and Sillitti, A. 2007. An Empirical Exploratory
Study on Inferring Developers' Activities from Low-
Level Data. In SEKE’07, International Conference on
Software Engineering and Knowledge Engineering.
Coman, I. D., Sillitti, A., and Succi, G. 2009. A case-study
on using an Automated In-process Software
Engineering Measurement and Analysis system in an
industrial environment. In ICSE’09, International
Conference on Software Engineering, pp. 89 – 99.
Cox, D. R. 1972. Regression models and life-tables.
Journal of the Royal Statistical Society Series B, 34,
1972, 187–220.
Hao, K., Luk, J. M., Lee, N. P. Y., Mao, M., Zhang, C.,
Ferguson, M. D., Lamb, J., Dai, H., Ng, I. O., Sham,
P. C., and Poon, R. T. P. 2009. Predicting prognosis in
hepatocellular carcinoma after curative surgery with
common clinicopathologic parameters. BMC Cancer,
9, 2009, 398-400.
Hosmer, D. W., Lemeshow, S., and May, S. 2008. Applied
survival analysis: Regression modeling of time to
event data. Wiley, 2nd edition.
Kalbfleisch, J. D. and Prentice, R. L. 2002. The statistical
analysis of failure time data. Wiley, 2nd edition.
Kleinbaum, D. G. and Klein, M. 2005. Survival analysis:
a self-learning test (Statistics for Biology and Health).
Springer, 2nd edition.
Lee E. T. and Wang, J. W. 2003. Statistical methods for
survival data analysis. Wiley, 3rd edition.
Li, Z., Zhou, S., Choubey, S., and Sievenpiper, C. 2007.
Failure event prediction using the Cox proportional
hazard model driven by frequent failure sequences.
IIE Transactions, 39, 3, 2007, 303-315.
Mannila, H., Toivonen, H., and Verkamo, A. I. 1997.
Discovery of frequent episodes in event sequences.
Data Mining and Knowledge Discovery, 1, 1997, 259
– 289.
Moser, R., Sillitti, A., Abrahamsson, P., and Succi, G. 2006.
Does refactoring improve reusability? In Proceedings
of the International Conference on Software Reuse,
287-297.
Müller, H. A., Kienle, H. M., and Stege, U. 2009. Autonomic
Computing: Now You See It, Now You Don’t—
Design and Evolution of Autonomic Software
Systems. In: De Lucia, A., Ferrucci, F. (eds.):
Software Engineering International Summer School
Lectures: University of Salerno, LNCS 5413,
Springer-Verlag, 32–54.
Pandalai, D. N. and Holloway, L. E. 2000. Template
languages for fault monitoring of timed discrete event
processes. IEEE Transactions on Automatic Control,
45, 5, 2000, 868 – 882.
Sampath, M., Sengupta, R., and Lafortune, S. 1994.
Diagnosability of discrete event systems. In
Proceedings of the 11th International Conference on
Analysis and Optimization of Systems: Discrete Event
Systems (Sophia Antipolis, June 15 – 17, 1994), 73 –
79.
Schmidt, P. and Witte, A. D. 1989. Predicting Criminal
Recidivism Using “Split Population” Survival Time
Models. Journal of Econometrics, 40, 1, 1989, 141-
159.
Scotto, M., Sillitti, A., Succi, G., and Vernazza, T. 2004. A
Relational Approach to Software Metrics. In
Proceedings of the Symposium on Applied Computing,
pp. 1536-1540, 2004.
Scotto, M., Sillitti, A., Succi, G., and Vernazza, T. 2006. A Non-
Invasive Approach to Product Metrics Collection.
Journal of Systems Architecture, 52, 11, pp. 668 –
675.
Sherkat, D. E. and Ellison, C. G. 2007. Structuring the
Religion-Environment Connection: Religious
Influences on Environmental Concern and Activism.
Journal for the Scientific Study of Religion, 46, 2007,
71-85.
Sillitti, A., Janes, A., Succi, G., and Vernazza, T. 2003.
Collecting, Integrating and Analyzing Software
Metrics and Personal Software Process Data. In
EUROMICRO, pp. 336 – 342.
Sillitti, A., Janes, A., Succi, G., and Vernazza, T. 2004. Measures
for Mobile Users: an Architecture. Journal of Systems
Architecture, 50, 7, pp. 393 – 405.
Sillitti, A. and Succi, G. 2005. Requirements Engineering for
Agile Methods. In Engineering and Managing
Software Requirements, Springer.
Srinivasan, V. S. and Jafari, M. A. 1993. Fault
detection/monitoring using time petri nets. IEEE
Transactions on Systems, Man, and Cybernetics, 23, 4,
1993, 1155 – 1162.
Wendel, M., Jensen, U., and Göhner, P. 2008. Mining
software code repositories and bug databases using
survival analysis models. In Proceedings of the 2nd
ACM-IEEE international symposium on Empirical
software engineering and measurement
(Kaiserslautern, Germany, October 09 - 10, 2008),
282-284.
Yanaihara, N., Caplen, N., Bowman, E., Seike, M.,
Kumamoto, K., Yi, M., Stephens, R. M., Okamoto, A.,
Yokota, J., Tanaka, T., Calin, G. A., Liu, C. G., Croce,
C. M., and Harris C. C. 2006. Unique microRNA
molecular profiles in lung cancer diagnosis and
prognosis. Cancer Cell, 9, 3, 2006, 189-198.
Yu, S. L., Chen, H. Y., Chang G. C., Chen, C. Y., Chen,
H. W., Singh, S., Cheng, C. L., Yu, C. J., Lee, Y. C.,
Chen, H. S., Su, T. J., Chiang, C. C., Li, H. N., Hong,
Q. S., Su, H. Y., Chen, C. C., Chen, W. J., Liu, C. C.,
Chan, W. K., Chen, W. J., Li, K. C., Chen, J. J. W.,
and Yang, P. C. 2008. MicroRNA Signature Predicts
Survival and Relapse in Lung Cancer. Cancer Cell,
13, 1, 2008, 48-57.
Zheng, Z., Lan, Z., Park, B. H., and Geist, A. 2009.
System log pre-processing to improve failure
prediction. In Proceedings of the 39th Annual
IEEE/IFIP International Conference on Dependable
Systems & Networks (Lisbon, Portugal, June 29 – July
2), 572-577.