
 
that the r_i-th cluster describes the density of the class 
of the k-th data point, 
=
. The proposed 
approach estimates the second term by non-parametric 
estimation of the probability densities of 
each class in each cluster (r_i), described in the next 
section. 
3 PROBABILITY ESTIMATION 
The estimation of local probability densities for each 
class in each cluster is based on the original 
Probabilistic Neural Network (PNN). The PNN (Bishop, 
1995) is a network formulation of probability 
density estimation. It consists of several 
sub-networks, each of which is a Parzen window PDF 
estimator for one of the classes. The input nodes are 
the set of measurements. The second layer consists 
of Gaussian functions centred on the given training 
data points. The third layer averages the outputs of 
the second layer for each class. The fourth layer 
performs a vote, selecting the largest value, which 
determines the associated class label. The PNN is 
thus a classifier that combines the Bayes strategy 
for decision-making with a non-parametric estimator 
of the probability density function (PDF).  
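The four layers described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: it assumes NumPy, an isotropic Gaussian kernel with a single hand-picked width `sigma`, and equal class priors in the final vote. The function names and the toy data are hypothetical.

```python
import numpy as np

def parzen_class_densities(x, train_X, train_y, sigma=0.5):
    """Return {class label: Parzen-window density estimate at x}.

    sigma is an assumed, hand-picked kernel width (not from the paper).
    """
    x = np.asarray(x, dtype=float)
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    d = train_X.shape[1]
    norm = (2.0 * np.pi * sigma ** 2) ** (d / 2.0)
    # Second layer: one Gaussian centred on every training point.
    sq_dist = np.sum((train_X - x) ** 2, axis=1)
    kernels = np.exp(-sq_dist / (2.0 * sigma ** 2)) / norm
    # Third layer: average the kernel outputs separately for each class.
    return {
        label.item(): float(kernels[train_y == label].mean())
        for label in np.unique(train_y)
    }

def pnn_predict(x, train_X, train_y, sigma=0.5):
    """Fourth layer: pick the class with the largest estimated density."""
    densities = parzen_class_densities(x, train_X, train_y, sigma)
    return max(densities, key=densities.get)

# Toy, made-up data: two well-separated classes in two dimensions.
toy_X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
toy_y = [0, 0, 1, 1]
label = pnn_predict([0.05, 0.0], toy_X, toy_y)
```

Choosing the largest class density corresponds to the Bayes decision only when the class priors are equal; with unequal priors each density would be weighted by its prior before the vote.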
4 DEFAULT PREDICTION 
The sample data set comes from a state-owned 
commercial bank. The original dataset contains 126 
instances; 3 of these are omitted because they are 
incomplete, as is common in other studies. The class 
distribution is 51% default and 49% non-default. The 
123 samples represent Small and Medium Enterprises 
from a single state of Brazil. Among these 
enterprises, 60 could repay the loan and the 
remaining 63 could not.  
Our model is an accounting-based model. In this 
kind of model, accounting balance sheets are used, 
and the input indexes cover the enterprise's 
capability to repay the loan and its willingness to 
repay; in this work the capability was analysed.  
The capability to repay is measured by several 
indexes that reflect the financial situation of the 
enterprise, such as profitability, operating 
efficiency, repayment capability and cash flow 
situation. Four accounting financial ratios were 
chosen (these are the most commonly used indexes). 
These are as follows: 
X1 = Earnings before taxes / Average total assets 
X2 = Total liabilities / Ownership interest 
X3 = Operational cash flow / Total liabilities 
X4 = Working capital / Total assets.  
Each index represented the average of three 
periods before the prediction period.  
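As an illustration of how the four inputs could be assembled, the sketch below computes X1 to X4 for each period and then averages them over the three periods. The dictionary keys are hypothetical names for balance-sheet items, and averaging the per-period ratios (rather than forming ratios of averaged items) is an assumption about how the three-period average is taken.

```python
from statistics import mean

def period_ratios(p):
    """Compute (X1, X2, X3, X4) for one accounting period.

    p is a dict of balance-sheet items; the key names are hypothetical.
    """
    return (
        p["earnings_before_taxes"] / p["average_total_assets"],  # X1
        p["total_liabilities"] / p["ownership_interest"],        # X2
        p["operational_cash_flow"] / p["total_liabilities"],     # X3
        p["working_capital"] / p["total_assets"],                # X4
    )

def feature_vector(periods):
    """Average each ratio over the periods preceding the prediction period."""
    per_period = [period_ratios(p) for p in periods]
    # zip(*...) groups the values of X1, X2, X3 and X4 across periods.
    return tuple(mean(values) for values in zip(*per_period))
```

Calling `feature_vector` with a list of three period dictionaries returns one four-component input vector per enterprise.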
We limited the number of clusters to at most 5 in 
order to maintain good interpretability. The best 
results were obtained using three clusters; 
therefore, our rule base has three rules. 
Some of the membership functions obtained by fuzzy 
clustering, those related to variable X1, are 
illustrated in Figure 1. The clusters have different 
covariance matrices, which are diagonal so that the 
memberships can be projected onto the original 
variables. The algorithm did not optimize clustering 
based on interclass separability.   
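The role of the diagonal covariance matrices can be made concrete with a small sketch. With a diagonal covariance, a Gaussian-shaped membership factorises into a product of one-dimensional terms, one per input variable, and each term is the projection onto that variable. The code assumes NumPy and an unnormalised Gaussian membership; the exact membership formula of the clustering algorithm is not given in this section, so this form and the function names are assumptions.

```python
import numpy as np

def joint_membership(x, centre, variances):
    """Unnormalised Gaussian membership with a diagonal covariance.

    variances holds the diagonal entries of the covariance matrix.
    """
    x, centre, variances = (
        np.asarray(a, dtype=float) for a in (x, centre, variances)
    )
    return float(np.exp(-0.5 * np.sum((x - centre) ** 2 / variances)))

def projected_memberships(x, centre, variances):
    """One-dimensional memberships, one per original variable."""
    x, centre, variances = (
        np.asarray(a, dtype=float) for a in (x, centre, variances)
    )
    return np.exp(-0.5 * (x - centre) ** 2 / variances)
```

The product of the values returned by `projected_memberships` equals `joint_membership` for the same arguments, which is what allows each rule to be read variable by variable.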
The estimated densities projected on the original 
input variable X1 are illustrated in Figure 2, which 
shows the densities of the two classes for variable 
X1 in one cluster. 
 
Figure 1: Membership functions related to variable X1 in 
the three clusters (rules). 
The performance of the obtained classifiers was 
measured by leave-one-out cross-validation (LOOCV). 
As the name suggests, LOOCV uses a single observation 
from the original sample as the validation data and 
the remaining observations as the training data. This 
is repeated so that each observation in the sample is 
used once as the validation data.  
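The procedure just described can be sketched as follows. The sketch assumes NumPy and a classifier passed in as a function `predict(x, train_X, train_y)`. The nearest-neighbour rule is only a hypothetical stand-in to make the example runnable; it is not one of the models compared in this work.

```python
import numpy as np

def loocv_error(X, y, predict):
    """Leave-one-out error rate of a classifier.

    predict(x, train_X, train_y) must return a label for the sample x.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(y)
    errors = 0
    for i in range(n):
        keep = np.arange(n) != i  # all observations except the i-th
        if predict(X[i], X[keep], y[keep]) != y[i]:
            errors += 1
    return errors / n

def nearest_neighbour(x, train_X, train_y):
    """Stand-in classifier: label of the closest training point."""
    return train_y[np.argmin(np.sum((train_X - x) ** 2, axis=1))]
```

Because every observation is held out exactly once, a sample of 123 instances requires 123 training runs, each on the remaining 122 instances.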
The results are summarized in Table 1. We 
report the mean values of the error. For the proposed 
approach, Model 1, the number of rules is equal to 
the number of clusters. The grid partition approach, 
Model 3, has 20 rules, and the other approach that 
uses clustering and probabilities, Model 2 (Abonyi, 
2003), has three rules. The proposed approach is 
much more compact (in terms of the number of 
rules) than the grid approach and more accurate. 
Although the proposed approach requires more 
memory (training data must be available in
FUZZY CLASSIFIER BASED ON SUPERVISED CLUSTERING WITH NONPARAMETRIC ESTIMATION OF
LOCAL PROBABILISTIC DENSITIES IN DEFAULT PREDICTION OF SMALL ENTERPRISES