attributes for each period.
3 EXPERIMENT RESULTS
3.1 Data used in the Experiment
The dataset used in the experiment consists of entries from 785 US Transportation, Communications, Electric, Gas, and Sanitary Services companies, together with their 1999-2008 yearly financial records (balance sheet and income statement) taken from the EDGAR financial database.
Each instance has 51 financial attributes (indices used in financial analysis). The "Risky" and "Non-Risky" classes were formed using Zmijewski's scoring technique, which is widely used in banking.
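Since this labelling step defines the two classes, a minimal sketch is given below. It assumes the commonly cited Zmijewski (1984) probit coefficients and a 0.5 probability cut-off; the function name and its inputs are hypothetical, as the paper does not spell out the exact variant used.

from scipy.stats import norm

def zmijewski_label(net_income, total_assets, total_liabilities,
                    current_assets, current_liabilities, threshold=0.5):
    # Zmijewski (1984) probit score; the coefficients below are the commonly
    # cited estimates and the 0.5 cut-off is an assumption of this sketch.
    x = (-4.336
         - 4.513 * (net_income / total_assets)               # profitability (ROA)
         + 5.679 * (total_liabilities / total_assets)        # leverage
         + 0.004 * (current_assets / current_liabilities))   # liquidity
    prob_distress = norm.cdf(x)                              # probit link
    return "Risky" if prob_distress > threshold else "Non-Risky"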
Table 1: Main characteristics of datasets used in experiments.

         Entries labeled as             Total     No. of selected   Bankrupt 1    Bankrupt >1
Year     Risky (R)   Not risky (NR)     entries   attributes        year after    year after
1999       376           166              542          11                -             -
2000       423           192              615           8                0             0
2001       383           226              609          13                2             1
2002       376           239              615          11                1             0
2003       417           220              637           9                0             0
2004       460           194              654           9                1             1
2005       478           173              651           8                1             4
2006       375           118              493           8                0             1
2007       367           112              479          11                0             6
2008        38            12               50           8                -             -
Total     3693          1652             5345           -                5            13
Note that the ratios of the original Zmijewski model were not used as attributes, in order to avoid linear dependence between variables. The main characteristics of the datasets formed for the experiment are presented in Table 1. The table also shows the number of financial ratios considered relevant by the feature selection procedure; this number is larger than the number of features used in the original evaluator.
3.2 Computational Results
The correlation-based feature subset selection algorithm (Hall, 2001), combined with Tabu search over attribute subsets, was applied for feature selection.
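To make this selection step concrete, the sketch below combines the CFS merit function with a basic single-flip Tabu search. Using |Pearson correlation| in place of Hall's symmetric-uncertainty measure, as well as the random starting subset, the tabu tenure and the iteration count, are simplifying assumptions of this sketch, not values taken from the paper.

import numpy as np

def cfs_merit(X, y, subset):
    # CFS merit (Hall, 2001): k * r_cf / sqrt(k + k*(k-1) * r_ff), where r_cf is
    # the mean feature-class correlation and r_ff the mean feature-feature
    # correlation; |Pearson r| stands in here for symmetric uncertainty.
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    r_ff = 0.0 if k == 1 else np.mean(
        [abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
         for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def tabu_cfs(X, y, n_iter=200, tabu_len=7, seed=0):
    # Tabu search over attribute subsets, maximising the CFS merit.
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    current = list(rng.choice(n, size=min(5, n), replace=False))  # random start
    best, best_merit = current[:], cfs_merit(X, y, current)
    tabu = []
    for _ in range(n_iter):
        moves = []
        for j in range(n):  # neighbourhood: add or remove a single attribute
            neigh = [f for f in current if f != j] if j in current else current + [j]
            merit = cfs_merit(X, y, neigh)
            # tabu moves are only allowed if they improve on the best (aspiration)
            if j not in tabu or merit > best_merit:
                moves.append((merit, j, neigh))
        if not moves:
            continue
        merit, j, current = max(moves, key=lambda m: m[0])
        tabu = (tabu + [j])[-tabu_len:]
        if merit > best_merit:
            best, best_merit = current[:], merit
    return sorted(best), best_merit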
The search space for PSO was set to C ∈ [0; 50] and bias ∈ [0; 1], and the number of run iterations was set to 10. PSO was configured to run with 20 particles and an inertia rate of 0.8. The velocity coefficient p2 was set to 3 and p3 to 0.2.
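The sketch below illustrates this tuning loop under stated assumptions: a plain global-best PSO with the parameter values given above, p2 and p3 interpreted as the cognitive and social acceleration coefficients, scikit-learn's LinearSVC (a LIBLINEAR wrapper) standing in for the linear classifier with the bias mapped to intercept_scaling, and cross-validated accuracy as the fitness. The function names and the 5-fold split are illustrative choices; the paper itself alternates between several LIBLINEAR solvers (see Table 2).

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def fitness(params, X, y):
    # Cross-validated accuracy for a (C, bias) pair; mapping the paper's
    # bias term to intercept_scaling is an assumption of this sketch.
    C, bias = params
    clf = LinearSVC(C=max(C, 1e-3), intercept_scaling=max(bias, 1e-3),
                    dual=True, max_iter=5000)
    return cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()

def pso_tune(X, y, n_particles=20, n_iter=10, w=0.8, p2=3.0, p3=0.2, seed=0):
    # Global-best PSO over C in [0, 50] and bias in [0, 1] (values from the text).
    rng = np.random.default_rng(seed)
    lo, hi = np.array([0.0, 0.0]), np.array([50.0, 1.0])
    pos = rng.uniform(lo, hi, size=(n_particles, 2))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p, X, y) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((n_particles, 2)), rng.random((n_particles, 2))
        vel = w * vel + p2 * r1 * (pbest - pos) + p3 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        fit = np.array([fitness(p, X, y) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return gbest, pbest_fit.max()   # best (C, bias) and its CV accuracy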
Table 2 presents the results obtained by the PSO-LinSVM classifier: the classifier parameters obtained by PSO and the classification accuracy, together with the True Positive and F-Measure rates for each class. It is clear that classification accuracy did not show a stable increase as the classifier was provided with more data each year. When testing with first-year data, accuracy dropped to about 80% in 2004, returned to 83.8% the following year, remained relatively stable, and later fell to 82%; similar trends can be identified in the testing results obtained with Year 2 and Year 3 data. It is important to note that instances marked as "risky" were identified better.
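For reference, the per-class metrics reported in Table 2 follow the standard definitions (with TP, FP and FN counted per class):

\[
\mathrm{TP\ rate} = \frac{TP}{TP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\text{F-Measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{TP\ rate}}{\mathrm{Precision} + \mathrm{TP\ rate}}.
\]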
Table 2: Experimental classification results.
Training period            2000       2001       2002       2003       2004       2005       2006       2007
Linear classifier          L1-SVM     L2-SVM     L2-SVM     L2-RLR     L2-SVM     L2-SVM     L2-SVM     L2-SVM
                           (dual)     (dual)     (dual)                (primal)   (dual)     (dual)     (primal)
C                          15.3157    47.8343    24.7346    29.0490    22.3727    38.0860    6.5322     48.0734
Bias                       1.000      0.196      0.749      0.797      0.873      0.838      0.436      0.508

Year 1
  Accuracy                 77.941     78.409     80.220     83.689     80.640     83.806     82.887     82.000
  TP rate     R            0.969      0.952      0.981      0.987      0.952      0.957      0.970      0.974
              NR           0.461      0.521      0.464      0.482      0.412      0.462      0.385      0.333
  F-Measure   R            0.846      0.843      0.867      0.895      0.878      0.900      0.896      0.892
              NR           0.609      0.653      0.618      0.637      0.535      0.579      0.520      0.471

Year 2
  Accuracy                 80.032     77.080     84.146     83.232     83.806     84.742     82.000     -
  TP rate     R            0.979      0.947      0.985      0.990      0.957      0.959      0.974      -
              NR           0.521      0.436      0.503      0.407      0.462      0.496      0.333      -
  F-Measure   R            0.857      0.844      0.897      0.896      0.900      0.905      0.892      -
              NR           0.670      0.568      0.653      0.567      0.579      0.611      0.471      -

Year 3
  Accuracy                 77.237     80.488     83.384     86.032     84.124     84.000     -          -
  TP rate     R            0.966      0.952      0.987      0.987      0.967      0.974      -          -
              NR           0.405      0.456      0.418      0.462      0.444      0.417      -          -
  F-Measure   R            0.848      0.873      0.897      0.915      0.902      0.902      -          -
              NR           0.551      0.582      0.576      0.615      0.575      0.556      -          -

Average testing accuracy   78.403     78.660     82.583     84.318     82.857     84.183     82.444     82.000