
D - which uses only the four adverse outcomes.
Since the SOFA score is costly and time-consuming to obtain, special attention will be given in this study to the last two settings.
In the initial experiments, feature selection was considered more important than model selection. Due to time constraints, the number of hidden nodes was set to round(N/2), where N denotes the number of input nodes (N = 5, N = 21, N = 16 and N = 4 for the A, B, C and D setups, respectively), and round(x) gives the nearest integer to x.
The commonly used 2/3 and 1/3 partitions were adopted for the training and test sets (Flexer, 1996), while the maximum number of training epochs was set to E = 100. Each input configuration was tested on all organ systems, with the accuracy measures given as the mean of thirty runs (Table 5).
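The experimental protocol above can be sketched as follows; this is an illustrative Python reconstruction under stated assumptions (the rounding convention for round(N/2) and the shuffling step are assumptions, not taken from the original code):

```python
# Sketch of the experimental setup: H = round(N/2) hidden nodes,
# a 2/3 training / 1/3 test partition, and accuracy averaged over runs.
# Hypothetical helper names; rounding halves up is an assumption.
import random


def hidden_nodes(n_inputs):
    # round(N/2), rounding halves up: N = 21 -> H = 11, N = 16 -> H = 8
    return (n_inputs + 1) // 2


def split_2_3(examples, seed=0):
    # Commonly used 2/3 training and 1/3 test partition
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]


def mean_accuracy(run_accuracies):
    # Accuracy reported as the mean over (here, thirty) runs
    return sum(run_accuracies) / len(run_accuracies)
```

For the B setup (N = 21 inputs), this rule yields roughly ten or eleven hidden nodes, depending on how halves are rounded.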
The A selection achieves a high performance, with an Accuracy ranging from 86% to 97%, even surpassing the B configuration. This is not surprising, since it is a well-established fact that the SOFA is an adequate score for organ dysfunction. Therefore, the results suggest that there is a high correlation between SOFA_{d-1} and SOFA_d.
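This day-to-day association can be illustrated by computing the Pearson correlation between a SOFA series and its one-day lag; the daily scores below are made up for illustration and are not from the study's dataset:

```python
# Pearson correlation between SOFA on day d-1 and day d,
# on a hypothetical daily series (illustrative values only).
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


sofa = [4, 5, 5, 6, 7, 7, 8, 6, 5, 5]   # hypothetical daily SOFA scores
r = pearson(sofa[:-1], sofa[1:])        # SOFA_{d-1} versus SOFA_d
```

On smoothly evolving scores such as these, the lag-1 correlation is high, which is the kind of relationship the results above suggest.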
When the SOFA index is omitted (C and D), the Accuracy values decay only slightly. However, this measure (which is popular within the Data Mining community) is not sufficient in Medicine. Ideally, a test should report both high Sensitivity and Specificity values, which suggest a high level of confidence (Essex-Sorlie, 1995). In fact, there seems to be a trade-off between these two characteristics: when the SOFA values are not present (Table 5), the Sensitivity values drop sharply, while the Specificity values increase.
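The distinction between these measures can be made concrete with a small sketch (the biased 9:1 class ratio below is illustrative, not the study's actual distribution): on such data, a trivial classifier that always predicts the majority class scores a high Accuracy and perfect Specificity while its Sensitivity is zero.

```python
# Sensitivity and Specificity from a binary confusion matrix; Accuracy
# alone can look good on biased data while Sensitivity collapses.
def confusion(labels, preds):
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    tn = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 0)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)      # true positive rate

def specificity(tn, fp):
    return tn / (tn + fp)      # true negative rate


# A classifier that always predicts "no dysfunction" on a 9:1 biased set:
labels = [0] * 90 + [1] * 10
preds = [0] * 100
tp, tn, fp, fn = confusion(labels, preds)
# Accuracy = 90% and Specificity = 100%, yet Sensitivity = 0%
```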
3.2 Balanced Training
Why do the A/B selections lead to high Accuracy/Specificity values and low Sensitivity ones? The answer may lie in the biased nature of the organ dysfunction distributions; i.e., there is a much higher number of false (0) than true (1) conditions (Figure 3).
One way to address this handicap is to balance the training data, i.e., to use an equal number of true and false learning examples. Therefore, another set of experiments was devised (Table 6), using randomly sampled training sets containing 2/3 of the true examples plus an equal number of false examples. The test set was composed of the remaining 1/3 of the positive entries. In order to achieve a fair comparison with the previous results, the negative test examples were randomly selected from the remaining ones, with a distribution identical to the one found in the original dataset (as given by Figure 3).
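The balanced sampling scheme above can be sketched as follows; the function and variable names are hypothetical stand-ins for the original experimental code:

```python
# Balanced-training split: 2/3 of the true (1) examples for training,
# plus an equal number of false (0) examples; the remaining negatives
# stay available to rebuild a test set matching the original class mix.
import random


def balanced_split(positives, negatives, seed=0):
    rng = random.Random(seed)

    pos = positives[:]
    rng.shuffle(pos)
    cut = (2 * len(pos)) // 3
    train_pos, test_pos = pos[:cut], pos[cut:]

    neg = negatives[:]
    rng.shuffle(neg)
    train_neg = neg[:len(train_pos)]       # as many negatives as positives
    remaining_neg = neg[len(train_pos):]   # pool for the test set

    return train_pos + train_neg, test_pos, remaining_neg
```

Negative test examples would then be drawn from `remaining_neg` in the same proportion as in the original dataset.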
The obtained results show a clear improvement in the Sensitivity values, especially for the C configuration, stressing the importance of the case mix attributes. Yet, the overall results are still far from the ones given by the A selection.
3.3 Improving Learning
Until now, the main focus was on selecting the correct training data. Since the obtained results are still not satisfactory, the attention will now move towards better Neural Network modeling. This will be achieved by changing two parameters: the number of hidden nodes and the maximum number of training epochs. Due to computational power restrictions, these factors were kept fixed in the previous experiments. However, the adoption of balanced training leads to a considerable reduction in the number of training cases, thus reducing the required training time.
Several experimental trials were conducted, using different combinations of hidden nodes (H = 4, 8, 16 and 32) and maximum numbers of epochs (E = 100, 500 and 1000), with the configuration that gave the lowest training errors being selected (H = 16 and E = 1000). This setup led to better results for all organ systems and accuracy measures (Table 6).
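The model-selection step above amounts to a small grid search; the sketch below assumes a hypothetical `train_and_eval(H, E)` routine that trains a network and returns its training error, which is not part of the original code:

```python
# Grid search over hidden nodes H and maximum epochs E, keeping the
# combination with the lowest training error (as done above, where
# H = 16 and E = 1000 was selected).
def grid_search(train_and_eval,
                hidden=(4, 8, 16, 32),
                epochs=(100, 500, 1000)):
    best = None
    for h in hidden:
        for e in epochs:
            err = train_and_eval(h, e)   # training error for this setup
            if best is None or err < best[0]:
                best = (err, h, e)
    return best[1], best[2]              # chosen (H, E)
```

Selecting on training error (rather than a validation error) keeps all training cases available to the network, at the risk of favoring larger models.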
To evaluate the obtained results, a comparison with other Machine Learning classifiers was performed (Table 7), using two classical methods from the WEKA Machine Learning software (Witten and Frank, 2000): Naive Bayes, a statistical algorithm based on probability estimation; and JRIP, a learner based on "IF-THEN" rules.
Although presenting a better Accuracy, Naive Bayes tends to emphasize the Specificity values, giving poor Sensitivity results. A better behavior is given by the JRIP method, with similar Sensitivity and Specificity values. Nevertheless, the Neural Networks still exhibit the best overall performance.
4 CONCLUSIONS
The surge of novel bio-inspired tools, such as Neural Networks, has created exciting new possibilities for the field of Clinical Data Mining. In this work, these techniques were applied to organ failure diagnosis in ICU patients.
Preliminary experiments were conducted to test several feature selection configurations, with the best results obtained by using solely the SOFA value measured on the previous day. However, this score is much more time-consuming and costly to obtain than the physiologic adverse events. Therefore, another set of experiments was conducted, in order
MULTIPLE ORGAN FAILURE DIAGNOSIS USING ADVERSE EVENTS AND NEURAL NETWORKS