ning. The resulting set of rules was then applied to
the testing set with the aid of the ‘Batch Classifier’ tool.
The best result we obtained (with the set containing
10 rules) was: 63.42% of cases classified correctly,
6.34% incorrectly and 30.24% unclassified.
The logistic regression model developed on the training
set and applied to the test set classified 86.47% of
cases correctly. This result coincides with other
investigations, which report that logistic regression is
among the most efficient methods in the field (West, 2000;
Xiao et al., 2006).
Let $p_{12}$ denote the percentage of actually denied
applications misclassified into the granted group, and
$p_{21}$ the percentage of actually granted applications
misclassified into the denied group. In all three models
$p_{12}$ is much greater than $p_{21}$. Rosetta's total of
6.34% misclassified cases was divided as follows:
$p_{12}$ = 5.02% and $p_{21}$ = 1.32%. For the affinity set
model: $p_{12}$ = 11.05% and $p_{21}$ = 3.00%. Similarly,
logistic regression shows $p_{12}$ = 13.16% and
$p_{21}$ = 0.37%.
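As a quick arithmetic check, the two error components reported for each model should add up to that model's total misclassification rate. The symbol $e$ for the total rate is introduced here for illustration only and does not appear in the paper.

```latex
\begin{align*}
e_{\text{Rosetta}}  &= p_{12} + p_{21} = 5.02\% + 1.32\% = 6.34\%,\\
e_{\text{affinity}} &= p_{12} + p_{21} = 11.05\% + 3.00\% = 14.05\%,\\
e_{\text{logistic}} &= p_{12} + p_{21} = 13.16\% + 0.37\% = 13.53\% = 100\% - 86.47\%.
\end{align*}
```

The Rosetta and logistic regression sums agree with the totals quoted in the text. The affinity sum would correspond to a correct-classification rate of 85.95%, assuming every test case was classified.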
In credit scoring applications, it is generally believed
that the cost of granting credit to a bad candidate is
significantly greater than the cost of denying credit to
a good candidate. As the Rosetta model has the lowest
$p_{12}$, its results might score better if the pure
classification rate were replaced by some form of cost
analysis. On the other hand, logistic regression has the
highest value of $p_{12}$, which could lower its score.
The problem is that, in contrast to the full
classification achieved by the affinity and logistic
regression models, Rosetta left a large share (30.24%)
of cases unclassified. This makes exact calculations and
well-founded conclusions difficult.
4 CONCLUSIONS
Our results show that the affinity measure defined in
Eq. (1), followed by the extraction procedure described
in Sec. 2.2, can yield promising results in data mining.
The method is rather simple in comparison with other
approaches. It is also undemanding: it requires no
preliminary assumptions and few computing resources. We
obtained a much higher classification rate than the rough
sets and genetic algorithm implemented in the Rosetta
software. The performance of our concept was very close
to the level achieved by the logistic regression model,
which has been reported as one of the best in the field
of credit scoring.
To confirm the promising efficiency of the proposed
method, further studies should be carried out. First of
all, there is a need for:
• comparison with neural network models and other
highly efficient modern methods of data mining,
• checking the results with other credit databases.
In this paper the outcomes predicted by the model were
compared with actual credit decisions. It would also be
interesting to check the model predictions against
actual credit performance.
Our model was tested on a credit application database
but can be applied equally well to medical, marketing,
managerial and other databases. The model is readily
adaptable, and it allows experiments with various
technical modifications.
REFERENCES
Chen, Y. and Larbani, M. (2006). Developing the affinity
set and its applications. In Proceeding of the Distin-
guished Scholar Workshop by National Science Coun-
cil, Jul. 14-18, 2006, Taiwan. National Science Coun-
cil, Taiwan.
Larbani, M. and Chen, Y. (2006). Affinity set and its ap-
plications. In Proceeding of the International Work-
shop on Multiple Criteria Decision Making, Apr. 14-
18, 2007, Poland. Publisher of The Karol Adamiecki
University of Economics in Katowice.
Lee, T.-S. and Chen, I.-F. (2005). A two-stage hybrid
credit scoring model using artificial neural networks
and multivariate adaptive regression splines. Expert
Systems with Applications, 28:743–752.
Lee, T.-S., Chiu, C.-C., Chou, Y.-C., and Lu, C.-J. (2006).
Mining the customer credit using classification and
regression tree and multivariate adaptive regression
splines. Computational Statistics & Data Analysis,
50:1113–1130.
Lee, T.-S., Chiu, C.-C., Lu, C.-J., and Chen, I.-F. (2002).
Credit scoring using the hybrid neural discrimi-
nant technique. Expert Systems with Applications,
23:245–254.
Øhrn, A., Komorowski, J., Skowron, A., and Synak, P.
(1994). The design and implementation of a knowl-
edge discovery toolkit based on rough sets: The
rosetta system. In Polkowski, L. and Skowron, A., ed-
itors, Rough Sets in Knowledge Discovery 1: Method-
ology and Applications, volume 18 of Studies in
Fuzziness and Soft Computing, chapter 19, page 376.
Physica-Verlag, Heidelberg, Germany.
West, D. (2000). Neural network credit scoring models.
Computers & Operations Research, 27:1131–1152.
Xiao, W., Zhao, X., and Fei, Q. (2006). A compara-
tive study of data mining methods in consumer loans
credit scoring management. J. Syst. Sci. Syst. Eng.,
15(4):419–435.
ICEIS 2008 - International Conference on Enterprise Information Systems