Interval Coded Scoring Index with Interaction Effects

A Sensitivity Study

Lieven Billiet (1,2), Sabine Van Huffel (1,2) and Vanya Van Belle (1,2)

(1) STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Department of Electrical Engineering, KU Leuven, Leuven, Belgium

(2) iMinds Medical Information Technologies, Leuven, Belgium

Keywords:

Sparse Optimization, Interpretability, Scoring Systems.

Abstract:

Scoring systems have long been used in medical practice, but they are often based on experience rather

than a structural approach. In the literature, the interval coded scoring index (ICS) has been introduced as an

alternative. It derives a scoring system from data using optimization techniques. This work discusses an

extension, ICS*, that takes variable interactions into account. Furthermore, a study is performed to give

insight into the new model’s sensitivity to noise, the size of the data set and the number of non-informative

variables. The study shows interactions can mostly be discovered robustly, even in the presence of noise and

spurious variables. A ﬁnal validation on two UCI data sets further indicates the quality of the approach.

1 INTRODUCTION

When working in the medical ﬁeld, one notices

that applying standard Machine Learning approaches

faces difﬁcult challenges. Generic techniques such as

Support Vector Machines (SVM) and Bayesian clas-

siﬁers have been used (Chowriappa et al., 2014), but

they most often offer a black-box solution of a prob-

lem. In order to accept the support of a system, a

medical expert should understand and trust its recom-

mendations. Therefore, interpretability is important.

Looking back at medical practice since the early days,

one can see that one kind of interpretable model has

frequently been used in the medical world itself: scor-

ing systems. Examples include APACHE-II, SIRS,

Glasgow (pancreatitis) (Mounzer et al., 2012), PSI

and CURB-65 (pneumonia) (Jeong et al., 2013). They

are powerful methods, often based on clinical experi-

ence or mathematical models, but their discriminative

power is limited due to their simplicity. Furthermore,

most systems developed so far are not the result of a

standardized or well-founded learning approach. Yet,

studies to validate or compare commonly used scores

have been conducted (Mounzer et al., 2012; Jeong

et al., 2013). There have also been attempts to con-

struct scoring systems with statistical methods (Yang

et al., 2011) or directly from data (Van Belle et al.,

2012), but the proposed models are restricted to the

main effects or involve tuning ad hoc parameters.

Generating a scoring system from data involves

finding a sparse model. This approach is well-known in fields such as compressed sensing, where $\ell_0$ or $\ell_1$ minimization is used to induce this property. Example methods include the LASSO and basis pursuit (Davenport et al., 2012). Similar approaches have

been used to generate scoring systems (Ustun et al.,

2013) recently, but focus on giving integer coefﬁ-

cients to previously deﬁned features in general. Yet,

other approaches, such as the Interval Coded Scoring

Index (ICS) (Van Belle et al., 2012), focus rather on

intervals, but are limited to main effects.

The remainder of this paper is structured as fol-

lows: the next section introduces the extension of ICS

that allows for interaction effects: ICS*. Section 3

discusses the sensitivity study carried out on synthetic

data. Finally, the framework is applied to two UCI

data sets, after which we conclude with a discussion

and a preview of future work.

2 THE ICS* ALGORITHM

The model can be best explained by expressing it in

an SVM framework as a binary classiﬁcation prob-

lem. The primal formulation of a (non-linear) SVM

is given by (Vapnik, 1995):

$$\min_{w,b,\varepsilon}\ \frac{1}{2} w^\top w + \gamma \sum_{i=1}^{N} \varepsilon_i \qquad (1)$$

subject to:

$$y_i \left( w^\top \phi(x_i) + b \right) \ge 1 - \varepsilon_i, \quad \forall i = 1..N$$

$$\varepsilon_i \ge 0, \quad \forall i = 1..N$$

In this formulation, $(x_i, y_i)$ are pairs of observations and labels, $w$ and $b$ the coefficients and bias of the model, respectively, and $\varepsilon_i$ slack variables used for the regularization controlled by $\gamma$.

Billiet, L., Huffel, S. and Belle, V.
Interval Coded Scoring Index with Interaction Effects - A Sensitivity Study.
DOI: 10.5220/0005646500330040
In Proceedings of the 5th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2016), pages 33-40
ISBN: 978-989-758-173-1

The original ICS approach (Van Belle et al., 2012) restricts the feature map $\phi(x)$. Instead of the original input vectors $x_i$ with variables $x_i^p$, it considers binary variables $z_{i,l}^p$ indicating whether the $x_i^p$ are within predefined intervals $[\tau_{l-1}^p, \tau_l^p)$, $l = 1 : k_p + 1$, with $\tau_0^p = -\infty$ and $\tau_{k_p+1}^p = \infty$. Furthermore, the total variation of the coefficient vector is minimized instead of its norm. As a result, a sparse scoring system is automatically obtained. To further improve sparsity, the coefficients are iteratively reweighted. To allow the inclusion of interactions, ICS* further expands the binary feature space as follows.

Mapping to a Binary Feature Space. Assume an observation $x_i \in \mathbb{R}^d$. The proposed feature map is

$$\phi : \mathbb{R}^d \to \mathbb{R}^{N_b} : z_i = \phi(x_i) = \left[ \phi_{gr_1}(x_i) \ldots \phi_{gr_n}(x_i) \right]$$

in which each group $gr \subset \{1, ..., d\}$ can be any subset of the original variables and $N_b$ denotes the total number of binary features. Hence, $z_i$ is the concatenation of feature maps for every variable and the groups of variables among which interactions should be considered. The feature submaps $\phi_{gr}$ expand the space spanned by the variables involved to a multidimensional binary space. The submap for a group $gr$ involving variables $\{p_1, p_2..p_q\}$ contains the following binary features:

$$f^{p_1..p_q}_{i,l_1..l_q} = I(\tau^{p_1}_{l_1-1} \le x_i^{p_1} < \tau^{p_1}_{l_1}) \ \&\ \ldots \ \&\ I(\tau^{p_q}_{l_q-1} \le x_i^{p_q} < \tau^{p_q}_{l_q}) \qquad (2)$$

with $l_1 \in \{1,..,k_{p_1}+1\}$, ..., $l_q \in \{1,..,k_{p_q}+1\}$.

$f^{p_1..p_q}_{i}$ is a multidimensional array indexed by $l_1,..,l_q$. $I$ is a binary indicator using thresholds $\tau$ to split the range of each variable. In effect, the space spanned by the original variables is divided into bins based on the thresholds $\tau$. These are initially inferred from the distribution of the data, but thanks to the minimization of the variation in $w$, bins will be merged if possible during the ICS* procedure. Finally, the multidimensional array can be vectorized to yield the feature vector $\phi_{gr}(x_i)$. These feature vectors are then concatenated to yield the full feature vector $z_i$.
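The mapping of Equation (2) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `interval_index`, `submap` and `feature_map`, the 0-based variable indices and the dictionary layout of the thresholds are assumptions made for illustration only.

```python
from bisect import bisect_right
from itertools import product


def interval_index(x, thresholds):
    """0-based index of the interval [tau_{l-1}, tau_l) that contains x."""
    # bisect_right counts the (sorted, finite) thresholds that are <= x
    return bisect_right(thresholds, x)


def submap(x, group, thresholds):
    """One-hot vector over the joint bins of the variables in `group`."""
    sizes = [len(thresholds[p]) + 1 for p in group]  # k_p + 1 intervals per variable
    hit = tuple(interval_index(x[p], thresholds[p]) for p in group)
    # product() enumerates the bins in row-major order (the vectorized array f)
    return [int(bin_ == hit) for bin_ in product(*(range(s) for s in sizes))]


def feature_map(x, groups, thresholds):
    """Concatenate the submaps of all groups into the binary vector z."""
    z = []
    for group in groups:
        z.extend(submap(x, group, thresholds))
    return z


# Two variables, thresholds {0, 1} and {1}; two main effects and their interaction.
thresholds = {0: [0.0, 1.0], 1: [1.0]}
groups = [(0,), (1,), (0, 1)]
z = feature_map([0.0, 2.0], groups, thresholds)
# z == [0, 1, 0,   0, 1,   0, 0, 0, 1, 0, 0]
```

Each submap contains exactly one 1, so the number of ones in `z` equals the number of groups, while the length of `z` grows with the product of the interval counts of the variables in each group.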

The resulting optimization problem can be expressed in matrix formulation as:

$$\min_{w,b,\varepsilon}\ \|Dw\|_1 + \gamma\, \varepsilon^\top \mathbf{1}, \quad D \in \mathbb{R}^{N_{df} \times N_b},\ w \in \mathbb{R}^{N_b} \qquad (3)$$

s.t.:

$$Y(Zw + b) \ge 1 - \varepsilon, \quad Y \in \mathbb{R}^{N \times N},\ Z \in \mathbb{R}^{N \times N_b}$$

$$\varepsilon \ge 0, \quad \varepsilon \in \mathbb{R}^{N}$$

$w$ is a vector containing the coefficients that will contribute to the score when the corresponding binary feature in $z_i$ equals 1. $D$ is a matrix defining coefficient differences, $Z$ is the data matrix made up of rows $z_i$ in the binary feature space and $Y$ is a diagonal matrix of class labels. $N$, $N_b$ and $N_{df}$ are the number of data observations, binary features and coefficient differences, respectively.

The matrix $D$ is necessary to minimize the total variation of the coefficient vector $w$. Multiplication of $D$ with $w$ yields differences between the coefficients of adjacent bins in the multidimensional representation $f$ defined in Equation (2). For example, for the bin $f^{p_1..p_q}_{i,l_1..l_q}$, the matrix $D$ defines $q$ coefficient differences:

$$w^{p_1..p_q}_{l_1..l_q} - w^{p_1..p_q}_{l_1-1..l_q},\ \ldots,\ w^{p_1..p_q}_{l_1..l_q} - w^{p_1..p_q}_{l_1..l_q-1}$$

To make sure that the first coefficient of each group equals zero, an additional row with only a single 1 corresponding to the first binary feature of that group is included in $D$.
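As an illustration of how such a difference matrix could be assembled for a single effect, the following Python sketch builds the rows of D for bins arranged in an array of a given shape. The helper names `difference_rows` and `total_variation` are hypothetical, and the row-major ordering of the bins is an assumption; the paper does not specify the ordering used.

```python
from itertools import product


def difference_rows(sizes):
    """Rows of D for one effect whose bins form an array with shape `sizes`."""
    bins = list(product(*(range(s) for s in sizes)))  # row-major bin order
    pos = {b: j for j, b in enumerate(bins)}          # bin -> index in w
    # Anchor row: a single 1 on the first bin of the effect
    rows = [[1.0 if j == 0 else 0.0 for j in range(len(bins))]]
    for axis in range(len(sizes)):
        for b in bins:
            if b[axis] == 0:
                continue  # no lower neighbour along this axis
            prev = b[:axis] + (b[axis] - 1,) + b[axis + 1:]
            row = [0.0] * len(bins)
            row[pos[b]], row[pos[prev]] = 1.0, -1.0
            rows.append(row)
    return rows


def total_variation(rows, w):
    """The l1 norm of D w for the rows built above."""
    return sum(abs(sum(r_j * w_j for r_j, w_j in zip(r, w))) for r in rows)


# A 2 x 3 interaction effect: 1 anchor row + 3 + 4 neighbour differences
rows = difference_rows((2, 3))
```

Adjacent bins with equal coefficients contribute nothing to `total_variation`, which is why minimizing it tends to merge bins.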

Despite sparsity, one can still end up with small steps $w^{p_1..p_q}_{l_1..l_q} - w^{p_1..p_q}_{l_1-1..l_q}$. From the point of view of interpretability, fewer and larger steps are preferred. For this reason, one tries to strike a balance between accuracy, induced by small steps corresponding to local behavior, and interpretability, which benefits from fewer steps. This trade-off can be achieved by iterative reweighting of the model.

Scoring and Prediction of Probability. To convert the model to a scoring system, the coefficients $w$ are normalized and rounded to obtain integer point values $s$. The score can then be obtained by summing across the binary feature space: $\text{score} = s^\top z$. Finally, a mapping from scores to probabilities is obtained through application of logistic regression with the scores as only predictor.
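The conversion can be sketched in a few lines of Python. The normalization rule is not specified in the text, so the sketch assumes that the largest absolute coefficient is scaled to a chosen number of points (`max_points`, a hypothetical parameter). The logistic regression is fitted with plain gradient ascent purely for illustration; any standard routine would do. All function names are illustrative.

```python
import math


def to_points(w, max_points=5):
    """Scale so the largest |w_j| becomes `max_points`, then round to integers."""
    scale = max(abs(v) for v in w) or 1.0  # avoid division by zero
    return [round(max_points * v / scale) for v in w]


def score(s, z):
    """score = s^T z: sum of the points of the active binary features."""
    return sum(s_j * z_j for s_j, z_j in zip(s, z))


def risk(score_value, a, b):
    """Logistic risk profile with slope a and intercept b."""
    return 1.0 / (1.0 + math.exp(-(a * score_value + b)))


def fit_risk_profile(scores, labels, steps=5000, lr=0.05):
    """Gradient ascent on the logistic log-likelihood; labels are 0 or 1."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = sum((y - risk(x, a, b)) * x for x, y in zip(scores, labels))
        grad_b = sum(y - risk(x, a, b) for x, y in zip(scores, labels))
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


points = to_points([0.0, -0.31, 0.62, 1.24], max_points=4)
# points == [0, -1, 2, 4]
```

Evaluating `risk` at every attainable integer score yields a lookup table comparable to the Risk Profile panels shown in the figures.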

Some Remarks on Solving the ICS* Formulation.

Although the formulation in Equation (3) contains

an absolute value, it can be reformulated as a linear

programming problem and solved by dedicated

solvers. Yet, one should be aware that the size of the

system grows with the number of variables in the

data set, the number of thresholds τ of each variable

and, particularly, the number of required interactions.

Solving it is possible because of its inherent sparsity

both in the data and in the constraints. This property

not only allows storing the system, but it can also be

exploited by dedicated solvers.
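One standard way to carry out this reformulation, given here as a sketch rather than as the authors' exact implementation, introduces an auxiliary vector $t \in \mathbb{R}^{N_{df}}$ (a symbol not used elsewhere in this paper) that bounds the absolute values of the coefficient differences:

```latex
\begin{aligned}
\min_{w,\,b,\,\varepsilon,\,t}\quad & \mathbf{1}^\top t + \gamma\, \mathbf{1}^\top \varepsilon \\
\text{s.t.}\quad & -t \le D w \le t, \\
& Y (Z w + b\,\mathbf{1}) \ge \mathbf{1} - \varepsilon, \\
& \varepsilon \ge 0 .
\end{aligned}
```

At the optimum each $t_j$ equals $|(Dw)_j|$, so the objective coincides with that of Equation (3), while the objective and all constraints are linear in $(w, b, \varepsilon, t)$.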

ICPRAM 2016 - International Conference on Pattern Recognition Applications and Methods

3 SENSITIVITY STUDY

The sensitivity study performed in this paper is carried out on synthetic data. This makes it possible to know the correct solution and to insert specific effects. The model can be expressed as

p = S(7x

+ 4x

+ 3x

− 3) (4)

in which $S$ is the standard logistic (sigmoid) function and $p$ the risk or probability of the data point $x$ belonging to the target class. The model includes two main effects, one quadratic and one linear, on $x_3$ and $x_4$, and one interaction involving $x_1$ and $x_2$. Data generation is done by

randomly generating a pool of independent normally

distributed data. Apart from the four required, addi-

tional non-informative variables can be added. The

basic data set that will be used for the study consists

of 250 observations for each class, involving seven

variables (four required and three additional). This

set will be used in the remainder, unless mentioned

otherwise.
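A data set of this kind could be generated along the following lines. The sketch rests on two assumptions that should be read as such. First, it takes one possible reading of Equation (4), namely a risk of S(7 x1 x2 + 4 x3^2 + 3 x4 - 3); the assignment of the coefficients 7, 4 and 3 to the individual terms, and which of x3 and x4 is the quadratic one, are not confirmed by the text. Second, it assumes that labels are sampled from the risk. The function names are hypothetical.

```python
import math
import random


def true_risk(x):
    """Assumed reading of Equation (4); the term assignment is a guess."""
    x1, x2, x3, x4 = x[:4]
    return 1.0 / (1.0 + math.exp(-(7 * x1 * x2 + 4 * x3 ** 2 + 3 * x4 - 3)))


def make_data(n_per_class=250, n_extra=3, seed=0):
    """Draw standard normal points until both classes hold n_per_class samples."""
    rng = random.Random(seed)
    data = {0: [], 1: []}
    while min(len(v) for v in data.values()) < n_per_class:
        # four informative variables plus n_extra non-informative ones
        x = [rng.gauss(0.0, 1.0) for _ in range(4 + n_extra)]
        y = int(rng.random() < true_risk(x))  # label sampled from the risk (assumption)
        if len(data[y]) < n_per_class:
            data[y].append(x)
    X = data[0] + data[1]
    y = [0] * n_per_class + [1] * n_per_class
    return X, y


X, y = make_data()  # 500 observations, 7 variables
```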

The results of applying ICS* on the basic data set

are presented in Figure 1. Two thirds of the data were used for training, one third for testing. The three top parts of the figure represent the detected effects. The $\tau$ values are shown at the borders. The top effect involves $x_1$ and $x_2$. Notice the influence of the multiplication. The quadratic and linear effects were correctly

detected as well, whereas the three spurious variables

were correctly rejected. The bottom part of the Figure

is the Risk Proﬁle. It maps the ﬁnal score obtained by

summing over all effects to the probability of belong-

ing to the target class. With this model, ICS* is able

to classify the test data with an accuracy of 86.5%, or

an Area Under the ROC Curve (AUC) of 0.94.

The sensitivity study consists of ﬁve parts. The

ﬁrst part is a simple resampling by cross-validation

(CV) of the model data, with and without interac-

tions to investigate the stability of the feature selec-

tion. Secondly, the inﬂuence of additive white noise

will be investigated. Furthermore, the inﬂuence of the

number of non-informative variables and the training

set size are discussed. These last two in particular

have an effect on the execution time, the last topic of

the study. Unless stated otherwise, the model will be

trained on two thirds of the data whereas one third will

be used for testing.

Resampling. Resampling is performed to assess the basic stability of the model. If slightly different data is used, to what extent do the detected effects change? The resampling is performed in the structured cross-validation framework (10 folds) in which each part of the data is used for testing in one fold, whereas it is used for training in all other folds. The same 10-fold cross-validation was carried out twice: once with the additional restriction that no interactions should be investigated (basic ICS), and once including the option for interactions (ICS*).

Figure 1: The results of the application of ICS* to the basic synthetic data set. (Top panels: detected effects with their points and thresholds; bottom panel: Risk Profile, risk versus score.)

The discovered effects for the ten folds and corresponding test AUCs are shown in Table 1. Both ICS and ICS* mostly succeed at keeping the relevant effects included. Of course, the interaction effect cannot be discovered by ICS since these effects are not considered in this method. Secondly, ICS has less tendency to include non-informative effects. This was expected, since the number of effects to be considered and the average number of coefficients per effect are much larger for ICS*. For both settings, some overfitting occurs (fold 8 for ICS, folds 5 and 6 for ICS*). Unexpected selected effects in ICS* often include main effects of variables 1 and 2. Taking into account the interaction between these variables, these effects can actually be informative. In other words, the main effects could be incorporated into the interaction effect. The same holds for some other effects, e.g. an interaction between variable 1 and variable 4 is considered relevant a few times. This still yields a good model. This is the result of the non-uniqueness of the model structure and will be further discussed in Section 5. Table 1 shows ICS* obtains a better classifier than ICS. Even when overfitting occurs, the variable coefficients are such that the model still yields good performance. Of course, this comes at the cost of a more complex model.

Table 1: Resampling results for ICS|ICS*. For each fold, the detected effects (X: detected, -: not detected) and test AUC (%) are indicated.

Fold  Effect [1,2]  Effect 3  Effect 4  #Other Effects  Test AUC (%)
1     -|X           X|X       X|X       0|1             78|93
2     -|X           X|X       X|X       0|1             72|90
3     -|X           X|X       X|X       0|0             60|90
4     -|-           X|X       X|X       0|0             72|72
5     -|X           X|X       X|X       2|21            65|93
6     -|X           X|X       X|X       0|13            77|82
7     -|X           X|X       X|X       0|2             72|88
8     -|X           X|X       X|X       5|3             71|83
9     -|X           X|X       X|X       0|3             62|83
10    -|X           X|X       X|X       0|3             65|90

In conclusion, one could say that resampling can

be used to improve the robustness of model selection. For

the ﬁnal model, trained on all the (training) data, only

the effects that occurred in more than 7 out of 10 folds

will be included. Applying this principle for the data

presented in Table 1 leads to inclusion of all correct

effects. No spurious detected effects are included in

ICS and one main effect for variable 2 is included for

ICS*. The analyses that follow will only be reported

for ICS*. The resampling scheme presented here will

be used in the remainder of the experiments.
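The selection rule described above (keep an effect only if it occurs in more than 7 out of 10 folds) amounts to a simple frequency count. A minimal sketch follows, in which effects are represented as tuples of variable indices; this representation and the function name `stable_effects` are illustrative choices, not part of the original method description.

```python
from collections import Counter


def stable_effects(effects_per_fold, min_folds=8):
    """Keep the effects detected in at least `min_folds` folds."""
    # set() guards against counting an effect twice within one fold
    counts = Counter(e for fold in effects_per_fold for e in set(fold))
    return sorted(e for e, c in counts.items() if c >= min_folds)


# Nine folds find the interaction [1,2] and both main effects; one fold
# misses the interaction and picks up a main effect of variable 2 instead.
folds = [[(1, 2), (3,), (4,)]] * 9 + [[(3,), (4,), (2,)]]
selected = stable_effects(folds)
# selected == [(1, 2), (3,), (4,)]
```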

Influence of Noise. The amount of additive white noise can be characterized by the Signal-to-Noise Ratio (SNR), defined as $\mathrm{SNR} = \sigma^2_{\text{signal}} / \sigma^2_{\text{noise}}$, the ratio of the variances of the signal and the noise. The influence of noise can be shown by comparing the models found for various SNRs. In this study, SNR $\in \{\infty, 50, 25, 10, 5, 4, 3, 2, 1.5\}$ will be considered. A Signal-to-Noise Ratio of $\infty$ corresponds to the noiseless case. Noise is added to $x_i$ after setting $y_i$ using the model described in Equation (4).
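Given the definition of the SNR as a ratio of variances, the noise standard deviation for a target SNR follows directly from the variance of the clean signal. The sketch below applies this to one variable at a time; whether the study scaled the noise per variable or over the whole data matrix is not stated, so the per-variable treatment is an assumption, as is the function name.

```python
import math
import random
import statistics


def add_white_noise(values, snr, rng):
    """Add zero-mean Gaussian noise with variance var(values) / snr."""
    if math.isinf(snr):
        return list(values)  # SNR = infinity is the noiseless case
    sigma = math.sqrt(statistics.pvariance(values) / snr)
    return [v + rng.gauss(0.0, sigma) for v in values]


rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(5000)]
noisy = add_white_noise(clean, snr=4.0, rng=rng)
```

The labels are left untouched, in line with the statement that noise is added after the labels have been set.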

The sensitivity to noise is illustrated in Table 2.

For high SNR, all relevant effects are detected,

whereas for lower SNR, the interaction effect is lost.

This is logical, since both variables are affected by

the noise. Although the interaction effect is masked

by the noise, ICS* does not model the noise itself.

The additional spurious detections for high SNR may

seem surprising, but they involve variables 1 and 2,

the variables also involved in the interaction. As such,

they do contribute to the solution. The last column of

the Table highlights the drop in performance when the

noise level increases.

Table 2: Influence of noise on the detected effects and test AUC for ICS*.

SNR Eff [1,2] Eff 3 Eff 4 #Other Eff AUC
Inf X X X 2 0.91
50 X X X 1 0.86
25 X X X 2 0.86
10 X X 0.80
5 X X X 0.79
4 X X X 0.76
3 X X 0.63
2 X 0.63
1.5 X X 0.58

Influence of Non-Informative Variables. ICS* is able to exclude non-informative variables from the model. However, a variable can only be excluded if all of its bins in the extended binary feature space have zero coefficients. To quantify the influence of having a higher number of variables, ICS* was applied for an additional number of non-informative variables going from one to ten. The experiments yielded a correct rejection of all non-informative variables whilst keeping the test AUC around 0.9.

Inﬂuence of Training Set Size. One would expect

an improvement in the ability of ICS* to infer a model

from the data when the training set size grows. To

study this, an independent test set of 150 observations

of each class is considered. The training set is en-

larged gradually. Set sizes of 100, 200, 500, 1000,

1500, 2000, 3000 and 5000 with equal contribution

of the two classes will be considered.

The influence of the training set size is presented in Table 3. Even when only 100 data points are available, the three effects can be discovered. The one additional effect for a set size of 500 is related to variable 1, which is indeed involved in the model. When one looks in more depth at the generated models for each case, one observes that when the set size grows, the number of binary features for some effects increases, particularly for the interaction. This signifies that although the correct effects are already discovered with less data, the scoring system becomes more refined when more data is added. This is due to the choice of $\tau$ (when more data are available, more and smaller intervals are considered). The impact on performance is shown in the last column of the Table: the AUC improves with a growing data set, though even the coarsest model already has an AUC of 0.85. With the data set used for this study, one notices AUC saturation for a set size larger than 500 data points. No information is gained by having a larger data set. Note that these results depend on the complexity of the underlying model and the predefined thresholds $\tau$.

Table 3: Influence of the training set size on the detected effects and the test AUC for ICS*.

Size  Eff [1,2]  Eff 3  Eff 4  #Other Eff  AUC
100   X          X      X                  0.85
200   X          X      X                  0.87
500   X          X      X      1           0.95
1k    X          X      X                  0.94
1.5k  X          X      X                  0.94
2k    X          X      X                  0.94
3k    X          X      X                  0.94
5k    X          X      X                  0.94

Figure 2: Execution time as a function of the number of non-informative variables (left) and training set size (right). (Vertical axis: execution time in seconds; curves: non-weighted and reweighted.)

Execution Time. The set size and the number of

non-informative variables both inﬂuence not only the

performance of the model in terms of accuracy, but

also the problem size. Depending on the method used

to solve Equation (3), it can have an impact on exe-

cution time. To quantify this, 100 executions of the

optimization problem (3) were performed with the

training sizes and additional variables as described in

the previous subsections. Furthermore, the evolution

for the non-weighted and the reweighted case is com-

pared.

Figure 2 shows an exponential increase in the exe-

cution time for the weighted and unweighted case for

an increasing number of spurious variables, as com-

pared to only a nearly linear increase for the size of

the data set. Hence, the impact of the number of

spurious variables is dominant over the impact of the

training set size. This is due to the combinatorial ex-

pansion of the feature space implied by the mapping

deﬁned in (2), whereas the linear increase is related to

the number of constraints. The issue will be covered

in more depth in Section 5. As mentioned, for appli-

cation purposes this is not a crucial drawback, as long

as the problem still ﬁts in memory.
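To give a feeling for how the mapping of Equation (2) inflates the feature space, the following sketch counts the binary features under two simplifying assumptions that are not taken from the paper: every variable has the same number of thresholds k, and every group of up to `max_order` variables is considered. The function name is hypothetical.

```python
from math import comb


def n_binary_features(d, k, max_order=2):
    """Number of binary features for d variables with k thresholds each.

    A group of q variables contributes (k + 1) ** q joint bins, and there
    are comb(d, q) such groups (assuming all of them are considered).
    """
    return sum(comb(d, q) * (k + 1) ** q for q in range(1, max_order + 1))


main_only = n_binary_features(7, 4, max_order=1)   # 7 * 5 = 35
with_pairs = n_binary_features(7, 4, max_order=2)  # 35 + 21 * 25 = 560
```

Under these assumptions, allowing pairwise interactions for seven variables already multiplies the number of coefficients by sixteen, whereas adding observations only adds rows to the constraint matrix.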

4 APPLICATIONS

Two data sets from the UCI repository (Lichman,

2013) will be used to validate ICS*. The ﬁrst one is

the Mushroom Data Set, the second one is the Verte-

bral Column Data Set. For the Mushroom set, 90% of

the data was used for training and 10% for testing, di-

vided by random sampling. For the Vertebral Column

data set, two third of the data was used for training,

also randomly sampled. In both cases, important ef-

fects were selected by 10-fold CV on the training set,

after which the ﬁnal model was trained on the entire

training set.

Mushroom Data Set. The mushroom set includes

descriptions of 23 species of the Agaricus and Lep-

iota family (Duch et al., 1997). The aim is to clas-

sify them as either edible or poisonous based on 22

nominal attributes. 8124 samples were provided with

a class distribution of 51.8% edible and 48.2% poi-

sonous.

It should be noted that the cross-validation was

unanimous in the choice of effects to be selected for

the ﬁnal model. The obtained test AUC is 0.993. To

validate the quality of the model even further, it can be

compared to the optimal solution being offered with

the data set (Duch et al., 1997). Perfect separation

can be obtained using a set of four subsequent rules,

given in Table 4. The solution obtained with ICS*

corresponds exactly to the ﬁrst two rules, which are

responsible for 99.4% accurate classiﬁcation on the

set as a whole. The reason ICS* does not ﬁnd all four

rules is a limit imposed on its training AUC to avoid

trivial overﬁtting. This could be avoided by interac-

tively selecting this threshold based on the ROC char-

acteristics instead of using an automatic procedure.
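For reference, the first two rules of Table 4 can be written as a two-line classifier. The sketch assumes that a mushroom is labelled poisonous as soon as one of the rules fires, and it uses spelled-out attribute values for readability; the dictionary layout and function name are hypothetical.

```python
SAFE_ODORS = {"almond", "anise", "none"}


def predict_poisonous(mushroom):
    """Rules 1 and 2 of Table 4, combined with a logical OR (assumption)."""
    rule1 = mushroom["odor"] not in SAFE_ODORS
    rule2 = mushroom["spore-print-color"] == "green"
    return rule1 or rule2


sample = {"odor": "foul", "spore-print-color": "white"}
flagged = predict_poisonous(sample)  # True: rule 1 fires
```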

Table 4: The optimal solution rules for the mushroom data set.

Rule
1. odor = NOT(almond OR anise OR none)
2. spore-print-color = green
3. odor = none AND stalk-surface-below-ring = scaly AND stalk-color-above-ring = brown
4. habitat = leaves AND cap-color = white

Vertebral Column Data Set. This data set consists of 310 observations with 6 real-valued biomechanical attributes (da Rocha Neto et al., 2011). Class labels distinguish 100 ‘Normal’ from 210 ‘Abnormal’ patients (disk hernia or spondylolisthesis).

Figure 3: Model for the UCI Vertebral Column data set. (Effect panels: grade of slipping, pelvic radius, sacral slope; bottom panel: Risk Profile, risk versus score.)

ICS* succeeds in deriving a simple model with

high performance. Three out of six variables are se-

lected as main effects and no interactions are discov-

ered. Two of the discovered effects were selected in

all ten folds of the resampling. The third one was cho-

sen in eight out of ten folds. The ﬁnal model is visual-

ized in Figure 3. Using ICS*, one obtains a test AUC

of 0.89 and a test accuracy of 81.7%. Earlier work

on this data set showed that performance can be in-

creased using rejection of data (da Rocha Neto et al.,

2011), up to a maximal accuracy of higher than 95%.

However, when not taking data rejection into account,

their result is only slightly higher than ours. They also

report on the classical SVM obtaining an accuracy of

85%. The results are difﬁcult to compare since they

perform multiple evaluations on the dataset using re-

sampling and average the result, whereas this paper

uses only a single train-test split. A fairer comparison can be performed by applying established techniques directly to the specific train-test split being used here. For that reason, Least Squares Support Vector Machines (LS-SVM) with embedded hyperparameter selection were trained and evaluated (Suykens

et al., 2002). LS-SVM with a linear kernel obtained a

test AUC of 0.88, whereas the use of an RBF kernel

resulted in a test AUC of 0.90. ICS* obtains a similar

performance as the LS-SVM approaches, whilst at the

same time offering a simple and interpretable model.

5 DISCUSSION

The sensitivity study ﬁrst showed that resampling can

be applied as a method to increase the robustness of

ICS and ICS*. Both detect the correct effects, includ-

ing the interaction in the case of ICS*, but sometimes,

other effects are included as well. The threshold for

robustness is set arbitrarily for the moment. More

elaborate techniques than thresholding should be used

to give a statistical justiﬁcation for the inclusion of an

effect. Moreover, there is an additional factor which

increases the complexity of the robustness problem.

Several times during the presentation of the results, it

was mentioned that a spuriously detected effect could

be tolerated since it could be included in an intended

effect, e.g. the interaction. This is due to the nature

of the model. Due to the additive formulation, Equa-

tion (3) is not strictly convex, leading to a non-unique

optimal solution. It might seem unsatisfactory from a programming point of view, but it leaves space for discussion with medical practice, where, in the end, the

interpretation will take place. However, if one would

aim at uniqueness e.g. for repeated runs and com-

parisons of the resulting scoring systems, additional

steps should be taken. One possible approach works

by transformation of the problem. According to the literature (Sra, 2006), a problem such as (3) can be rewritten such that the unique optimal solution will be the one among the solutions of the original problem with the smallest $\ell_2$-norm.

This transformation might also prove useful to alleviate the problems with execution time. An increase in set size yields an equal increase in the number of data constraints in Equation (3). On the other hand, adding extra variables yields a combinatorial increase in the dimensionality of the feature space. Currently, the linear programming problem is solved using a standard primal-dual approach. This explains the exponential and linear dependencies shown in Figure 2. Yet, if a dual algorithm could be applied, the dimension of the feature space would become irrelevant. The use of the $\ell_1$ norm and the matrix $D$ defining the differences leads to a more difficult entirely dual formulation of the problem. The transformation proposed in (Sra, 2006) yields a standard quadratic program, which is easier to consider in the dual space.

ICS* proved robust to noise. For low SNR, the

interaction effect was lost since it was obscured by

the noise. However, the noise itself did not inﬂuence

the model in the sense that no spurious effects were

introduced to try to include it.

The assessment of sensitivity with regard to set

size and number of spurious variables was positive.

Only in one case, an effect was missed. The cor-

rect detection of the effects for smaller data sets and

the gradual improvement of the model until saturation

when more data is available opens perspectives for

large-size problems. As sometimes applied in other domains, fixed-size approximations could be considered (Suykens et al., 2002).


Another aspect to be discussed is the selection of

the thresholds τ deﬁning the binary feature space. For

this paper, an automated approach was used, selecting

initial thresholds based on the quantiles of the data

distribution. In later stages, adjacent intervals thus

deﬁned are merged if their coefﬁcients are equal.

6 CONCLUSION

In this paper, ICS* was introduced as an extension

of ICS. It makes it possible to infer relevant effects, including

interactions, from given data and to construct a scoring

system by solving a minimization problem. After in-

troduction of the changes applied to ICS, ICS* was

subjected to a sensitivity study on synthetic data. The

study showed that resampling can be used to improve

the robustness of the method. Furthermore, it also

indicated robustness to noise, training set size and

the number of additional non-informative variables.

However, both set size and number of variables were

shown to have a large impact on execution time. Fi-

nally, ICS* was applied to two UCI data sets with

good results.

Future work will investigate the formulation of

a more advanced approach to the initial estimation

of the τ thresholds. A better estimation of the ﬁnal

thresholds from the beginning reduces the complexity

of the problem to be solved, since it relates directly to

the dimensionality of the expanded feature space.

Another goal is the formulation of the quadratic

transformation of ICS*. This would ensure the

uniqueness of the solution for a given data set. Fur-

thermore, row-action methods could be applied to

achieve a reduction of the execution time. More gen-

erally, approaches other than the LP, e.g. sparse inte-

ger solutions, could have interesting characteristics.

Finally, the problem to be solved is essentially

a combination of variable selection (sparsity on the

level of the original variables) and minimization of

the number of steps within each effect (sparsity on

the level of coefﬁcient differences). Such a combined

criterion can be tackled by methods such as the sparse-group

LASSO (Simon et al., 2013) for fast convergence to

the optimal solution.

ACKNOWLEDGEMENTS

This research was supported by: Bijzonder Onder-

zoeksfonds KU Leuven (BOF), Center of Excellence

(CoE): PFV/10/002 (OPTEC); KULeuven IDO fund-

ing: #3E140722 Sensor-based Platform for the Accu-

rate and Remote monitoring of Kine(ma)tics Linked

to E-health (SPARKLE); Belgian Federal Science

Policy Ofﬁce: IUAP #P7/19/ (DYSCO, ‘Dynami-

cal systems, control and optimization’, 2012-2017).

VVB is a postdoctoral fellow of the Research Foun-

dation - Flanders (FWO).

REFERENCES

Chowriappa, P., Dua, S., and Todorov, Y. (2014). Introduc-

tion to machine learning in healthcare informatics. In

Machine Learning in Healthcare Informatics, pages

1–23. Springer.

da Rocha Neto, A. R., Sousa, R., de A. Barreto, G., and Cardoso, J. S. (2011). Diagnostic of pathology on the vertebral column with embedded reject option. In Vitrià, J., Sanches, J., and Hernández, M., editors, Pattern Recognition and Image Analysis, volume 6669 of Lecture Notes in Computer Science, pages 588–595. Springer Berlin Heidelberg.

Davenport, M., Duarte, M., Eldar, Y., Kutyniok, G., et al.

(2012). Compressed sensing: theory and applications.

Cambridge University Press Cambridge.

Duch, W., Adamczak, R., Grabczewski, K., Ishikawa, M.,

and Ueda, H. (1997). Extraction of crisp logical rules

using constrained backpropagation networks. In Proc.

of the European Symposium on Artiﬁcial Neural Net-

works (ESANN).

Jeong, B.-H., Koh, W.-J., Yoo, H., Um, S.-W., Suh, G. Y.,

Chung, M. P., Kim, H., Kwon, O. J., and Jeon,

K. (2013). Performances of prognostic scoring sys-

tems in patients with healthcare-associated pneumo-

nia. Clinical Infectious Diseases, 56(5):625–632.

Lichman, M. (2013). UCI machine learning repository.

http://archive.ics.uci.edu/ml. last accessed 20/5/2015.

Mounzer, R., Langmead, C. J., Wu, B. U., Evans, A. C.,

Bishehsari, F., Muddana, V., Singh, V. K., Slivka, A.,

Whitcomb, D. C., Yadav, D., Banks, P. A., and Pa-

pachristou, G. I. (2012). Comparison of existing clin-

ical scoring systems to predict persistent organ failure

in patients with acute pancreatitis. Gastroenterology,

142(7):1476 – 1482.

Simon, N., Friedman, J., Hastie, T., and Tibshirani, R.

(2013). A sparse-group lasso. Journal of Computa-

tional and Graphical Statistics.

Sra, S. (2006). Efﬁcient large scale linear programming

support vector machines. In ECML 2006, pages

767–774, Berlin, Germany. Max-Planck-Gesellschaft,

Springer.

Suykens, J. A., Van Gestel, T., De Brabanter, J., De Moor, B., and Vandewalle, J. (2002). Least squares support vector machines, volume 4. World Scientific.

Ustun, B., Tracà, S., and Rudin, C. (2013). Supersparse linear integer models for predictive scoring systems. In Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI-13), pages 128–130.

Van Belle, V., Van Calster, B., Timmerman, D., Bourne, T., Bottomley, C., Valentin, L., Neven, P., Van Huffel, S., Suykens, J. A. K., and Boyd, S. (2012). A mathematical model for interpretable clinical decision support with applications in gynecology. PLoS ONE, 7(3):e34312.

Vapnik, V. (1995). The Nature of Statistical Learning The-

ory. Springer-Verlag New York, Inc.

Yang, H.-I., Yuen, M.-F., Chan, H. L.-Y., Han, K.-H., Chen,

P.-J., Kim, D.-Y., Ahn, S.-H., Chen, C.-J., Wong, V.

W.-S., and Seto, W.-K. (2011). Risk estimation for

hepatocellular carcinoma in chronic hepatitis B (REACH-B): development and validation of a predictive score.

The Lancet Oncology, 12(6):568 – 574.
