vector of morphometric characteristics.
In our problem of credit scoring, the bank has to predict the ability of borrowers to pay back a loan, on the basis of descriptive variables. For this second example, the subpopulations result from differences elsewhere: customers and non-customers. These differences could influence (in addition to the covariates) the target variable. It is obvious that information related to customers is more reliable than that related to non-customers. For example, the debt ratio and expenditures may be underestimated among the non-customers when they request the loan.
Another example in credit scoring is when the subpopulations result from changes over time. In this case, a first discriminant rule predicting the behaviour classes of borrowers is built. Such a rule is derived from the observation of borrowers over a time interval $[T, T+1]$ (as from a population $\Omega$). In addition, when these individuals are observed again over a new interval $[T+\tau, T+\tau+1]$ of the same length (as from a population $\Omega^*$), another allocation rule is often necessary.
Obviously, changes in the economic and social environment could induce significant changes in the population of borrowers and could affect the credit risk.
As pointed out in (Tuffery, 2007), the implementation of an allocation rule devoted to the prediction of risk classes requires stability in the studied population and in the distribution of the available covariates. In the problem we study, the two subpopulations are not exchangeable, i.e., there is an experienced rule defined on a first subpopulation and only a small learning sample from a different second one.
Here, by allocation rule we mean a decision function $\Psi_\theta = (\psi_{\theta 1}, \ldots, \psi_{\theta g})$ ($\mathbb{R}^d \to \mathbb{R}^g$) such that $x \in \mathbb{R}^d$ is allocated to the class with label $k_0 = \operatorname{argmax}_{k=1,\ldots,g} \psi_{\theta k}(x)$, where $\theta$ is the associated parameter.
Usually, $\psi_{\theta k}(x)$ is a posterior probability of belonging to class $k$ or, more generally, a corresponding score (the Anderson score, for example).
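To make the definition concrete, the following minimal Python sketch allocates a point $x$ to the class with the highest score. It is illustrative only: the two linear (Anderson-type) score functions and their parameters are hypothetical placeholders, not those used in this paper.

import numpy as np

def allocate(x, psi, theta):
    # Allocate x to the class with label k0 = argmax_k psi_k(x; theta).
    scores = np.array([psi_k(x, theta) for psi_k in psi])
    return int(np.argmax(scores))

# Hypothetical example: g = 2 linear scores psi_k(x) = w_k . x + b_k.
theta = {"w": np.array([[1.0, -0.5], [-1.0, 0.5]]), "b": np.array([0.2, -0.2])}
psi = [lambda x, t, k=k: t["w"][k] @ x + t["b"][k] for k in range(2)]
print(allocate(np.array([0.3, 1.2]), psi, theta))  # prints the allocated label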
Hence, given a decision function or classifier $\Psi_\theta$, one could consider that the experienced discriminant rule on $\Omega$ is known as soon as we have the estimate $\hat{\theta}$. Then, the only remaining problem is to estimate the parameter $\theta^*$ corresponding to the discriminant rule on $\Omega^*$.
Usually, two classical approaches are used to obtain an estimate of $\theta^*$: the first consists in taking the same estimate as on $\Omega$, i.e., $\hat{\theta}^* = \hat{\theta}$, and the second in determining $\hat{\theta}^*$ using only the learning sample $S^* \subset \Omega^*$.
If we denote by $\nu$ the number of components of $\theta^*$, one could present the first approach as leading to the estimate $\hat{\theta}^* = g_1(\hat{\theta})$, where $g_1 = \mathrm{Id}_\nu$ ($\mathbb{R}^\nu \to \mathbb{R}^\nu$), and the second as leading to the estimate $\hat{\theta}^* = g_2(S^*)$, with $g_2$ ($\mathbb{R}^{\mathrm{Card}(S^*) \times \nu} \to \mathbb{R}^\nu$).
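For concreteness, the two approaches can be sketched in Python as follows. This is a hedged sketch: the scikit-learn logistic fit merely stands in for a generic estimator $g_2$ applied to $S^*$, and nothing here is specific to the method of this paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

def g1(theta_hat):
    # First approach: the identity map Id_nu, i.e. reuse the estimate from Omega.
    return theta_hat

def g2(X_star, y_star):
    # Second approach: estimate theta* from the small learning sample S* alone.
    clf = LogisticRegression().fit(X_star, y_star)
    return np.concatenate([clf.intercept_, clf.coef_.ravel()])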
The first approach does not take into account the difference between the two subpopulations. The second needs a learning sample of sufficient size, whereas here we deal with a small one. This raises the problem of the accuracy of the estimate $\hat{\theta}^* = g_2(S^*)$.
Thus, the problem here is to take into account the characteristics of the available sample, as rightly recommended by David Hand (Hand, 2005). He noted that the advantage of an advanced modelling method over a simple one (linear, for example) often lies in a better modelling of the study sample.
To circumvent the problem of such specific data, we exploit the idea that information related to one of the two subpopulations contains some information related to the other. Thus, we search for an acceptable relationship between the two available distributions (i.e., the distribution of the covariates on $\Omega$ and the one on $\Omega^*$).
The relationship between the distributions of the covariates on $\Omega$ and $\Omega^*$ induces a parametric relationship $\theta^* = \Phi_\gamma(\theta)$ between the parameters.
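For illustration only, one hypothetical low-dimensional link family (not necessarily the one adopted in this paper) is the componentwise affine map
$$\theta^* = \Phi_\gamma(\theta) = \gamma_1\,\theta + \gamma_0, \qquad \gamma = (\gamma_0, \gamma_1) \in \mathbb{R}^2,$$
so that only two components have to be estimated from $S^*$ instead of all $\nu$ components of $\theta^*$.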
The estimation method used to derive $\theta^*$ is a plug-in one, i.e., given the link function $\Phi_\gamma$ and considering $\theta = \hat{\theta}$, we use the learning sample $S^*$ to estimate $\gamma$. The estimate now depends on $S^*$ and $\theta$, i.e.,
$$\hat{\theta}^* = \Phi_{\hat{\gamma}(S^*)}(\hat{\theta}) = g(\hat{\theta}, S^*). \qquad (1)$$
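A minimal Python sketch of the plug-in scheme of Eq. (1), under the illustrative affine link introduced above (the functions phi and neg_log_lik are hypothetical stand-ins, assuming a logistic model with parameter vector (intercept, slopes) and labels y in {0, 1}):

import numpy as np
from scipy.optimize import minimize

def phi(gamma, theta_hat):
    # Illustrative affine link: theta* = gamma[1] * theta + gamma[0].
    return gamma[1] * theta_hat + gamma[0]

def neg_log_lik(gamma, theta_hat, X, y):
    # Logistic negative log-likelihood of S* under theta* = Phi_gamma(theta_hat).
    beta = phi(gamma, theta_hat)          # beta = (intercept, slopes)
    margin = (2 * y - 1) * (X @ beta[1:] + beta[0])
    return np.sum(np.log1p(np.exp(-margin)))

def plug_in_estimate(theta_hat, X_star, y_star):
    # Only the two components of gamma are estimated on the small sample S*.
    res = minimize(neg_log_lik, x0=np.array([0.0, 1.0]),
                   args=(theta_hat, X_star, y_star))
    return phi(res.x, theta_hat)          # Eq. (1): Phi_{gamma_hat(S*)}(theta_hat)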
The problem of the smallness of the sample $S^*$ arises again when estimating $\gamma$. However, the number of components of $\gamma$ should be much lower than that of $\theta^*$; hence this plug-in approach could be well suited to a small sample.
In the case of the Gaussian mixture model, this plug-in approach appears very promising. In (Biernacki et al., 2002) we introduced a somewhat similar plug-in method to build a generalized discriminant rule devoted to prediction on a Gaussian subpopulation (i.e., the restriction of the covariates vector is Gaussian in each class), learning on another one.
In this work, we extend this idea to the logistic discriminant model, i.e., for each of the two subpopulations the response variable depends on covariates