that the r_i-th cluster describes the density of the class
of the kth data,
=
. The proposed
approach estimates the second term by non-parametric
estimation of the probability densities of
each class in each cluster (r_i), described in the next
section.
3 PROBABILITY ESTIMATION
The estimation of local probability densities for each
class in each cluster is based on the original
Probabilistic Neural Network (PNN). The PNN (Bishop,
1995) is a network formulation of probability
density estimation. It consists of several sub-networks,
each of which is a Parzen window PDF
estimator for one of the classes. The input nodes are
the set of measurements. The second layer consists
of Gaussian functions centred on the training
data points. The third layer averages the outputs of
the second layer for each class. The fourth layer
performs a vote, selecting the largest value, and the
associated class label is then determined. As a
classifier, the PNN combines the Bayes
strategy for decision-making with a non-parametric
estimator of the probability density
function (PDF).
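The four layers described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes an isotropic Gaussian kernel with a single smoothing width, equal class priors, and hypothetical function and parameter names (`pnn_class_densities`, `pnn_predict`, `sigma`).

```python
import numpy as np

def pnn_class_densities(X_train, y_train, x, sigma=1.0):
    """Parzen-window density estimate at x for each class (layers 2 and 3)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)
    d = X_train.shape[1]
    # Normalising constant of an isotropic d-dimensional Gaussian (assumed kernel).
    norm = (2.0 * np.pi * sigma ** 2) ** (d / 2.0)
    densities = {}
    for label in np.unique(y_train):
        centres = X_train[y_train == label]           # layer 2: one kernel per training point
        sq_dist = np.sum((centres - x) ** 2, axis=1)
        kernels = np.exp(-sq_dist / (2.0 * sigma ** 2)) / norm
        densities[label.item()] = float(kernels.mean())  # layer 3: per-class average
    return densities

def pnn_predict(X_train, y_train, x, sigma=1.0):
    """Layer 4: return the label whose estimated density is largest."""
    densities = pnn_class_densities(X_train, y_train, x, sigma)
    return max(densities, key=densities.get)

# Illustrative toy data, not taken from the paper.
X_toy = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y_toy = [0, 0, 1, 1]
label_near_origin = pnn_predict(X_toy, y_toy, [0.2, 0.4])
```

The width `sigma` controls how smooth the estimated densities are; its value here is arbitrary.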
4 DEFAULT PREDICTION
The sample data set comes from a state-owned
commercial bank. The original dataset contains 126
instances; 3 of these are omitted because
they are incomplete, as is common in
other studies. The class distribution is 51% default
and 49% non-default. The 123 samples represent
Small and Medium Enterprises from a single state of
Brazil. Of these enterprises, 60 could repay the loan
and the remaining 63 could not.
Our model is an accounting-based model. In this
kind of model, accounting balance sheets are used
and the input indexes cover the enterprise's
ability and willingness to repay the
loan; in this work only the ability was analysed.
The ability to repay the loan is measured by
several indexes that reflect the financial situation of
the enterprise, such as profitability, operating
efficiency, repayment capability and cash flow
situation. Four accounting financial
ratios were chosen (these are the most commonly used
indexes). These are as follows:
X1 = Earnings before taxes / Average total assets
X2 = Total liabilities / Ownership interest
X3 = Operational cash flow / Total liabilities
X4 = Working capital / Total assets.
Each index is the average over the three
periods before the prediction period.
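As an illustration of how the four indexes could be computed, the sketch below evaluates X1 to X4 for each period and averages them over three periods. The dictionary keys and function names are hypothetical, and averaging the per-period ratios (rather than taking ratios of averaged balance-sheet items) is an assumption; the text does not say which was used.

```python
def financial_ratios(period):
    """Return (X1, X2, X3, X4) for one accounting period (keys are hypothetical)."""
    return (
        period["earnings_before_taxes"] / period["average_total_assets"],  # X1
        period["total_liabilities"] / period["ownership_interest"],        # X2
        period["operational_cash_flow"] / period["total_liabilities"],     # X3
        period["working_capital"] / period["total_assets"],                # X4
    )

def averaged_indexes(periods):
    """Average each ratio over the given periods (assumed: mean of per-period ratios)."""
    ratios = [financial_ratios(p) for p in periods]
    n = len(ratios)
    return tuple(sum(r[i] for r in ratios) / n for i in range(4))

# Made-up figures for a single enterprise, for illustration only.
example_period = {
    "earnings_before_taxes": 10.0,
    "average_total_assets": 100.0,
    "total_liabilities": 50.0,
    "ownership_interest": 25.0,
    "operational_cash_flow": 5.0,
    "working_capital": 20.0,
    "total_assets": 100.0,
}
x1, x2, x3, x4 = averaged_indexes([example_period] * 3)
```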
We limited the number of clusters to 5 in order
to maintain good interpretability. The best results were
obtained using three clusters. Therefore, our rule
base has three rules.
Some membership functions for variable
X1, obtained by fuzzy clustering, are illustrated in
Figure 1. The clusters have different covariance
matrices, which are constrained to be diagonal so that
the memberships can be projected onto the original
variables. The algorithm did not optimize clustering based on
interclass separability.
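The reason a diagonal covariance matrix allows projection can be shown with a short sketch. It assumes an unnormalised Gaussian membership function, which is a common choice but is not confirmed by the text; all names are hypothetical. With a diagonal covariance, the multivariate membership factorises into a product of one-dimensional memberships, one per original variable.

```python
import numpy as np

def univariate_membership(x_j, centre_j, var_j):
    """Membership projected on one variable (assumed unnormalised Gaussian)."""
    return np.exp(-0.5 * (x_j - centre_j) ** 2 / var_j)

def cluster_membership(x, centre, variances):
    """Membership in a cluster as the product of per-variable memberships."""
    x = np.asarray(x, dtype=float)
    centre = np.asarray(centre, dtype=float)
    variances = np.asarray(variances, dtype=float)
    return float(np.prod(univariate_membership(x, centre, variances)))

def full_membership(x, centre, covariance):
    """Same membership computed from the full covariance matrix, for comparison."""
    diff = np.asarray(x, dtype=float) - np.asarray(centre, dtype=float)
    return float(np.exp(-0.5 * diff @ np.linalg.inv(covariance) @ diff))

# Illustrative values only.
x = [1.0, 2.0]
centre = [0.5, 1.0]
variances = [0.25, 4.0]
same = np.isclose(cluster_membership(x, centre, variances),
                  full_membership(x, centre, np.diag(variances)))
```

With a non-diagonal covariance the factorisation no longer holds, so the rule antecedents could not be written as a conjunction of one-dimensional fuzzy sets.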
The estimated densities projected onto the original
input variable X1 are illustrated in Figure 2. This
figure shows the X1 densities of
the two classes in one cluster.
Figure 1: Membership functions related to variable X1 in
the three clusters (rules).
The performance of the obtained classifiers was
measured by leave-one-out cross-validation. As the
name suggests, leave-one-out cross-validation
(LOOCV) involves using a single observation from
the original sample as the validation data, and the
remaining observations as the training data. This is
repeated such that each observation in the sample is
used once as the validation data.
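The LOOCV procedure can be sketched as below. The nearest-neighbour rule is only a hypothetical stand-in so that the example runs on its own; it is not one of the classifiers compared in this work, and the function names are assumptions.

```python
import numpy as np

def loocv_error(X, y, fit_predict):
    """Mean misclassification rate under leave-one-out cross-validation.

    fit_predict(X_train, y_train, x) must return the predicted label for x.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    errors = 0
    for i in range(n):
        train = np.arange(n) != i                  # hold out observation i
        predicted = fit_predict(X[train], y[train], X[i])
        errors += int(predicted != y[i])
    return errors / n

def nearest_neighbour(X_train, y_train, x):
    """Placeholder classifier: label of the closest training point."""
    return y_train[np.argmin(np.sum((X_train - x) ** 2, axis=1))]

# Illustrative toy data, not the bank dataset.
X_toy = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
y_toy = [0, 0, 0, 1, 1, 1]
error_rate = loocv_error(X_toy, y_toy, nearest_neighbour)
```

Each observation is held out exactly once, so the classifier is fitted as many times as there are observations.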
The results are summarized in Table 1. We
report the mean values of the error. For the proposed
approach, Model 1, the number of rules is equal to
the number of clusters. The grid partition approach,
Model 3, has 20 rules, and the other approach that
uses clustering and probabilities, Model 2 (Abonyi,
2003), has three rules. The proposed approach is
much more compact (in terms of the number of
rules) than the grid approach and more accurate.
Although the proposed approach requires more
memory (training data must be available in
FUZZY CLASSIFIER BASED ON SUPERVISED CLUSTERING WITH NONPARAMETRIC ESTIMATION OF
LOCAL PROBABILISTIC DENSITIES IN DEFAULT PREDICTION OF SMALL ENTERPRISES