vector of morphometric characteristics.
In our problem of credit scoring, the bank has to predict the ability of borrowers to pay back a loan, on the basis of descriptive variables. For this second example, the subpopulations result from differences elsewhere: customers and non-customers. These differences could influence (in addition to the covariates) the target variable. It is obvious that information related to customers is more reliable than that related to non-customers. For example, the debt ratio and expenditures may be underestimated among the non-customers when they request the loan.
Another example in credit scoring is when the subpopulations result from changes over time. In this case, a first discriminant rule predicting the behaviour classes of borrowers is built. Such a rule is derived from the observation of borrowers over a time interval $[T, T+1]$ (as from a population $\Omega$). In addition, when these individuals are observed again over a new interval $[T+\tau, T+\tau+1]$ of the same length (as from a population $\Omega^*$), another allocation rule is often necessary.
Obviously, changes in the economic and social environment could induce significant changes in the population of borrowers and could affect the credit risk.
As pointed out in (Tuffery, 2007), the implementation of an allocation rule devoted to the prediction of risk classes requires stability in the studied population and in the distribution of the available covariates. In the problem we study, the two subpopulations are not exchangeable, i.e., there is an experienced rule defined on a first subpopulation and only a small learning sample from a different second one.
Here, by allocation rule we mean a decision function $\Psi_\theta = (\psi_{\theta 1}, \ldots, \psi_{\theta g})$ ($\mathbb{R}^d \to \mathbb{R}^g$) such that $x \in \mathbb{R}^d$ is allocated to the class with label $k_0 = \operatorname{argmax}_{k=1,\ldots,g} \psi_{\theta k}(x)$, where $\theta$ is the associated parameter.
Usually, $\psi_{\theta k}(x)$ is a posterior probability of belonging to class $k$ or, more generally, a corresponding score (the Anderson score, for example).
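To make the definition concrete, the following minimal Python sketch allocates a point $x$ to the class with the highest score. It is illustrative only: the two linear (Anderson-type) score functions and their parameters are hypothetical placeholders, not those used in this paper.

import numpy as np

def allocate(x, psi, theta):
    # Allocate x to the class with label k0 = argmax_k psi_k(x; theta).
    scores = np.array([psi_k(x, theta) for psi_k in psi])
    return int(np.argmax(scores))

# Hypothetical example: g = 2 linear scores psi_k(x) = w_k . x + b_k.
theta = {"w": np.array([[1.0, -0.5], [-1.0, 0.5]]), "b": np.array([0.2, -0.2])}
psi = [lambda x, t, k=k: t["w"][k] @ x + t["b"][k] for k in range(2)]
print(allocate(np.array([0.3, 1.2]), psi, theta))  # prints the allocated label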
Hence, given a decision function or classifier $\Psi_\theta$, one could consider that the experienced discriminant rule on $\Omega$ is known as soon as we have the estimate $\hat{\theta}$. Then, the only remaining problem is to estimate the parameter $\theta^*$ corresponding to the discriminant rule on $\Omega^*$.
Usually, two classical approaches are used to obtain an estimate of $\theta^*$: the first consists in taking the same estimate as on $\Omega$, i.e., $\hat{\theta}^* = \hat{\theta}$, and the second in determining $\hat{\theta}^*$ using only the learning sample $S^* \subset \Omega^*$.
If we denote by $\nu$ the number of components of $\theta^*$, one could present the first approach as leading to the estimate $\hat{\theta}^* = g_1(\hat{\theta})$, where $g_1 = \mathrm{Id}_\nu$ ($\mathbb{R}^\nu \to \mathbb{R}^\nu$), and the second as leading to the estimate $\hat{\theta}^* = g_2(S^*)$, with $g_2$ ($\mathbb{R}^{\mathrm{Card}(S^*) \times \nu} \to \mathbb{R}^\nu$).
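For concreteness, the two approaches can be sketched in Python as follows. This is a hedged sketch: the scikit-learn logistic fit merely stands in for a generic estimator $g_2$ applied to $S^*$, and nothing here is specific to the method of this paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

def g1(theta_hat):
    # First approach: the identity map Id_nu, i.e. reuse the estimate from Omega.
    return theta_hat

def g2(X_star, y_star):
    # Second approach: estimate theta* from the small learning sample S* alone.
    clf = LogisticRegression().fit(X_star, y_star)
    return np.concatenate([clf.intercept_, clf.coef_.ravel()])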
The first approach does not take into account the difference between the two subpopulations. The second needs a learning sample of sufficient size, whereas here we deal with a small one. This raises the problem of the accuracy of the estimate $\hat{\theta}^* = g_2(S^*)$.
Thus, the problem here is to take into account the characteristics of the available sample, as rightly recommended by David Hand (Hand, 2005). He noted that the advantage of an advanced modelling method over a simple one (linear, for example) often lies in a better modelling of the study sample.
To circumvent the problem of such specific data, we exploit the idea that information related to one of the two subpopulations contains some information related to the other. Thus, we search for an acceptable relationship between the two available distributions (i.e., the distribution of the covariates on $\Omega$ and the one on $\Omega^*$).
The relationship between the distributions of the covariates on $\Omega$ and $\Omega^*$ induces a parametric relationship $\theta^* = \Phi_\gamma(\theta)$ between the parameters.
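For illustration only, one hypothetical low-dimensional link family (not necessarily the one adopted in this paper) is the componentwise affine map
$$\theta^* = \Phi_\gamma(\theta) = \gamma_1\,\theta + \gamma_0, \qquad \gamma = (\gamma_0, \gamma_1) \in \mathbb{R}^2,$$
so that only two components have to be estimated from $S^*$ instead of all $\nu$ components of $\theta^*$.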
The estimation method used to derive $\theta^*$ is a plug-in one, i.e., given the link function $\Phi_\gamma$ and considering $\theta = \hat{\theta}$, we use the learning sample $S^*$ to estimate $\gamma$. The estimate now depends on $S^*$ and $\theta$, i.e.,
$$\hat{\theta}^* = \Phi_{\hat{\gamma}(S^*)}(\hat{\theta}) = g(\hat{\theta}, S^*). \qquad (1)$$
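A minimal Python sketch of the plug-in scheme of Eq. (1), under the illustrative affine link introduced above (the functions phi and neg_log_lik are hypothetical stand-ins, assuming a logistic model with parameter vector (intercept, slopes) and labels y in {0, 1}):

import numpy as np
from scipy.optimize import minimize

def phi(gamma, theta_hat):
    # Illustrative affine link: theta* = gamma[1] * theta + gamma[0].
    return gamma[1] * theta_hat + gamma[0]

def neg_log_lik(gamma, theta_hat, X, y):
    # Logistic negative log-likelihood of S* under theta* = Phi_gamma(theta_hat).
    beta = phi(gamma, theta_hat)          # beta = (intercept, slopes)
    margin = (2 * y - 1) * (X @ beta[1:] + beta[0])
    return np.sum(np.log1p(np.exp(-margin)))

def plug_in_estimate(theta_hat, X_star, y_star):
    # Only the two components of gamma are estimated on the small sample S*.
    res = minimize(neg_log_lik, x0=np.array([0.0, 1.0]),
                   args=(theta_hat, X_star, y_star))
    return phi(res.x, theta_hat)          # Eq. (1): Phi_{gamma_hat(S*)}(theta_hat)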
The problem of the smallness of the sample $S^*$ arises again when estimating $\gamma$. However, the number of components of $\gamma$ should be much lower than that of $\theta^*$; hence this plug-in approach could be well suited to a small sample.
In the case of the Gaussian mixture model, this plug-in approach appears very promising. In (Biernacki et al., 2002) we introduced a somewhat similar plug-in method to build a generalized discriminant rule devoted to prediction on a Gaussian subpopulation (i.e., the restriction of the covariates vector is Gaussian in each class), learning on another one.
In this work, we extend this idea to the logistic discriminant model, i.e., for each of the two subpopulations the response variable depends on covariates