the experiments carried out to validate the model, the
results and a comparison with other techniques for at-
tribute selection. Finally, Section 5 presents the con-
clusions and suggests possible future work.
2 MULTIOBJECTIVE
OPTIMISATION MODEL FOR
ATTRIBUTE SELECTION
The selection of variables concerns finding the small-
est subset of variables in a database to obtain the most
accurate classification possible (Pappa et al., 2002).
Described more formally, with X being the number
of variables in an initial set T, the algorithm finds a
subset P of Y variables from the set T, where Y ≤ X,
with the aim of removing the irrelevant or redundant
variables, and obtaining good accuracy in the classifi-
cation (Aguilera et al., 2007). Therefore, the problem
of attribute selection can be approached as a multiob-
jective optimisation problem (Deb, 2001), whose solution
comprises a set of solutions called non-dominated
solutions (or Pareto solutions). A solution x dominates
another solution y if (Deb, 2001) (a minimal code sketch
of this test is given after the list):
• Solution x is not worse than y for any of the
objectives considered;
• Solution x is strictly better than y for at least one
of the objectives.
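For concreteness, the dominance test can be expressed in a
few lines of code. The following is a minimal Python sketch,
written under the assumption that every objective is expressed
so that smaller values are better; the function name dominates
and the tuple representation of objective vectors are ours,
not the paper's.

    def dominates(fx, fy):
        # fx, fy: tuples of objective values for solutions x and y,
        # with every objective to be minimised.
        no_worse = all(a <= b for a, b in zip(fx, fy))
        strictly_better = any(a < b for a, b in zip(fx, fy))
        return no_worse and strictly_better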
For the variable selection problem considered here, two
optimisation criteria have been used: accuracy
and compactness. To formulate these criteria, the fol-
lowing quantitative measures have been defined (a short
sketch of both measures is given after the list).
Given a solution x = {x_i | x_i ∈ T}:
• Accuracy. Based on the classification ratio
CR(x) = Φ(x)/N, where Φ(x) is the number of data
correctly classified for a set of variables x by a
given classification algorithm, and N is the total
number of data.
• Compactness. Given by the cardinality card(x) of the
set x, that is, the number of variables used to
construct the model.
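As an illustration, the two measures can be computed as below.
This is a minimal Python sketch with assumed helper names
(classification_ratio, compactness); Φ(x) is taken to be supplied
by an external classification algorithm, as in the text.

    def classification_ratio(num_correct, num_total):
        # CR(x) = Phi(x) / N: fraction of data correctly classified
        # using only the variables in the subset x.
        return num_correct / num_total

    def compactness(x):
        # card(x): number of variables used to construct the model.
        return len(x)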
In this way, the optimisation model proposed with
the criteria defined is the following:

    Maximize CR(x)
    Minimize card(x)        (1)
The objectives were to increase the accuracy of
the model and to reduce the number of variables as far
as possible. In some cases, as will be shown, it was
worth sacrificing accuracy slightly when the number of
variables was reduced significantly, in order to simplify
the model. As can be seen, the objectives in the optimi-
sation model (1) are contradictory, since a lower number
of significant variables means a lower classification
rate and vice versa: the greater the number of
variables, the greater the classification rate. The so-
lution to model (1) is a set of m ≤ X non-dominated
solutions C = {x_k, k ∈ S}, S = {1, ..., X}, where each
solution x_k of C represents the best collection of k sig-
nificant variables. For example, for X = 5 (5 vari-
ables to be selected), a set of non-dominated solutions
C = {x_3, x_5} means that the Pareto front is composed
of non-dominated solutions of 3 and 5 variables, re-
spectively. The solutions with 1, 2 and 4 significant
variables are not on the Pareto front and are there-
fore dominated.
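The following Python sketch shows how such a non-dominated set
could be filtered out of a pool of evaluated candidate subsets.
The candidate pool, the (subset, CR, card) tuple layout and the
function name are illustrative assumptions, not part of the
method described in the paper.

    def non_dominated(candidates):
        # candidates: list of (subset, cr, card) tuples, where cr is
        # to be maximised and card is to be minimised.
        def dominates(a, b):
            no_worse = a[1] >= b[1] and a[2] <= b[2]
            strictly_better = a[1] > b[1] or a[2] < b[2]
            return no_worse and strictly_better
        return [c for c in candidates
                if not any(dominates(o, c) for o in candidates if o is not c)]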
3 MULTIOBJECTIVE
EVOLUTIONARY
COMPUTATION FOR
ATTRIBUTE SELECTION
Three elements can be distinguished in a variable se-
lection algorithm (Aguilera et al., 2007):
• A search algorithm, which explores the space of
the variables available.
• An evaluation function, which provides a measure
of the fitness of the variables chosen. According
to how this function is designed, the selection al-
gorithms can be classified as filter models or em-
bedded models. The former use measures that
take into account the separation of classes, based
on information distance metrics, dependency met-
rics, etc., while the latter use an estimate of the ac-
curacy attained by a classification algorithm on the
selected variables (a sketch of such an embedded
evaluation is given after this list).
• A fitness function that validates the subset of vari-
ables finally chosen.
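As an illustration of an embedded evaluation function, the sketch
below estimates the classification ratio of a candidate subset by
cross-validating a classifier trained only on the selected variables.
scikit-learn and the k-nearest-neighbours classifier are arbitrary
choices made here for the example; the paper does not prescribe a
particular library or classifier.

    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def embedded_fitness(data, labels, subset):
        # data: NumPy array of shape (N, X); labels: class labels;
        # subset: indices of the selected variables (genes set to 1).
        if len(subset) == 0:
            return 0.0                          # an empty subset cannot classify
        clf = KNeighborsClassifier()
        scores = cross_val_score(clf, data[:, list(subset)], labels, cv=5)
        return scores.mean()                    # estimate of CR(x)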
Evolutionary Computation has been used for both
filter and embedded models. The work described here
falls into the latter category, since the accuracy and
the simplicity of the classification obtained are among
its fundamental objectives. The NSGA-II algorithm (Deb
et al., 2002), the principal components of
which are briefly described below, is used to solve
the problem described in (1).
Representation of Solutions. A binary codification
of fixed length equal to the number of variables in the
problem is used. In this way, a gene of value 1 in the