BANKRUPTCY PREDICTION BASED ON INDEPENDENT

COMPONENT ANALYSIS

Ning Chen

GECAD, Instituto Superior de Engenharia do Porto, Instituto Politecnico do Porto, Portugal

Armando Vieira

Physics Department, Instituto Superior de Engenharia do Porto, Instituto Politecnico do Porto, Portugal

Keywords:

Bankruptcy prediction, Learning vector quantization, Independent component analysis.

Abstract:

Bankruptcy prediction is of great importance in ﬁnancial statement analysis to minimize the risk of decision

strategies. It attempts to separate distress companies from healthy ones according to some ﬁnancial indicators.

Since the real data usually contains irrelevant, redundant and correlated variables, it is necessary to reduce the

dimensionality before performing the prediction. In this paper, a hybrid bankruptcy prediction algorithm is

proposed based on independent component analysis and learning vector quantization. Experiments show the

algorithm is effective for high dimensional bankruptcy data and therefore improve the capability of prediction.

1 INTRODUCTION

Bankruptcy prediction is an important issue for

human decision-making in many ﬁnancial do-

mains (P. Ravi Kumara, 2007). It can be regarded

as a classiﬁcation task which attempts to separate

distress companies from healthy ones in terms of

some ﬁnancial and accounting indicators, such as

proﬁtability, solidity and liquidity. Compared to

traditional statistical methods, e.g., linear discrim-

inant analysis (LDA) and multivariate discriminant

analysis (MDA), artiﬁcial neuron networks (ANNs)

achieve desirable performance and therefore receive

more and more attentions to solve bankruptcy pre-

diction tasks. The appealing aspect of ANN is char-

acterized by nonlinear modelisation, computational

simplicity and noise insensitivity. It was reported

that among intelligent techniques ANN is the most

widely used family in supervised or unsupervised

manner (P. Ravi Kumara, 2007), e.g., self-organizing

map (SOM) (E. Merkevicius, 2004), learning vec-

tor quantization (LVQ) (K. Kiviluoto, 1997), multi

layer perceptron (MLP) (Armando Vieira, 2003). The

ability of LVQ for bankruptcy prediction has been

demonstrated (K. Kiviluoto, 1997). LVQ is also used

to correct the output of MLP in a hidden layer learn-

ing vector quantization algorithm in order to improve

the quality of prediction (J.C. Neves, 2006).

Due to the complexity of ﬁnancial statements, a

few indicators are insufﬁcient for bankruptcy predic-

tion, while a large number of indicators lead to the

curse of dimensionality, i.e., the amount of training

data needed increases exponentially with the number

of variables in order to cover the decision space. The

real data usually contains irrelevant, redundant and

correlated variables, not only decreasing the precision

of classiﬁcation but also consuming a mass of com-

putational time and space. Feature reduction picks

a subset of relevant features to target (selection) or

generates new features from the basic ones (construc-

tion) in order to achieve a more concise and accurate

model (M. Dash, 1997) . It is particularly important

when the number of training data is limited. In the

literature, a number of feature selection methods are

available. In (Tsai, 2008), ﬁve well-known statistical

methods, i.e., t-test, correlation matrix, factor anal-

ysis, principle component analysis and stepwise re-

gression are compared based on a MLP classiﬁer for

bankruptcy prediction. Independent component anal-

ysis (ICA) (A. Hyvarinen, 2001) originating from sig-

nal processing is one of the promising methods for

dimensionality reduction and has been used with suc-

cess in many applications (E. Oja, 2000; Bingham,

2001), e.g., ﬁnancial time series analysis, text mining,

150

Chen N. and Vieira A. (2009).

BANKRUPTCY PREDICTION BASED ON INDEPENDENT COMPONENT ANALYSIS.

In Proceedings of the International Conference on Agents and Artiﬁcial Intelligence, pages 150-155

DOI: 10.5220/0001536301500155

 SciTePress

Figure 1: Algorithm description.

astronomical telescope image processing and wireless

communication.

In this paper, a batch learning vector quantiza-

tion algorithm is used as a classiﬁcation approach for

bankruptcy prediction according to a number of ﬁ-

nancial indicators. The LVQ algorithm starts from a

trained SOM and learns in batch manner. In data pre-

processing phase, the independent component analy-

sis is applied as a feature construction tool to trans-

form data in high dimensional input space to low di-

mensional ICA space composed of independent com-

ponents. The proposed approach is tested on a French

ﬁnancial database by cross validation principle. Ex-

perimental results on both balanced and unbalanced

data show that the hybrid algorithm is superior or

comparable to plain LVQ and some well-known clas-

siﬁcation algorithms.

The paper is organized as follows. In the next

section, the methodology of LVQ and ICA for

bankruptcy prediction is illustrated. Then the exper-

iments and results are presented. Lastly, the conclu-

sions and future issues are given.

2 METHODOLOGY

The proposed algorithm is motivated by the capability

of LVQ as a classiﬁcation tool and the ability of ICA

as a dimensionality reduction tool. The algorithm is

processed in two phases as described in Figure 1. In

the ﬁrst phase, the input vector in original data space

is fed into an ICA preprocessing module given the

number of dimensions remained. The transformed

data in ICA space has less dimensions than original

data while preserving the intrinsic distribution. In the

second phase, the data is input in batch to the LVQ

map for classiﬁer training in a supervised manner.

2.1 ICA

Independent component analysis is a signal process-

ing method to separate a multivariate signal into

independent components under the assumption that

the components are statically independent in non-

Gaussian distribution. Formally, an observed paral-

lel signal x

(i = 1,...,n) is a linear mixture of inde-

pendent source signals or factors s

( j = 1,...,m) with

mixing coefﬁcients a

i, j

∑

j=1

i, j

In vector format,

x = As

where x = [x

,...,x

]

′

, A = (a

i, j

), and s = [s

,..,s

]

′

The basic idea of ICA is to estimate unknown A

and s given the observations x. Alternatively, the in-

dependent components can be calculated:

s = Wx

where W is the inverse matrix of A. The transformed

data in ICA space is of less dimensions and there-

fore easily predicted. In this paper, a fast ICA algo-

rithm (E. Oja, 2000) is used for efﬁcient ICA estima-

tion in a ﬁxed-point iteration scheme.

2.2 LVQ

LVQ is an artiﬁcial neural network for supervised

learning usage. It consists of two layers of neurons:

the neurons in input layer receive data from variables

and the neurons in output layer arranged in a regu-

lar grid of one or two dimensions are associated with

input neurons by weight vectors (reference vectors,

or prototypes). The prototypes deﬁne the represen-

tative vector of corresponding class regions. During

the training, the prototypes are updated according to

the projection between input data and neurons in or-

der to adjust the class boundary. To get a reasonable

initialization, LVQ usually starts from a trained map

of SOM (Kohonen, 1997). In this paper, a batch LVQ

algorithm is applied for data classiﬁcation.

Firstly, a SOM is initialized with a number of neu-

rons and trained in an unsupervised way. Then the

neurons are assigned by the majority label principle.

In one batch round, an instance x is selected as in-

put once a time and the Euclidean distance is calcu-

lated between x and each neuron m

, then the one of

smallest distance is selected as the best matching unit

(BMU). As a result, x is projected to its BMU c.

c = argmin

d(x,m

)

After all input are processed, the data is divided

into a number of Voronoi sets: V

= {x

| d(x

) ≤

d(x

),1 ≤ k ≤ n,1 ≤ j ≤ m}. Each Voronoi set is

composed of the instances whose BMU is the corre-

sponding neuron. The prototype is then updated sum-

ming up the impact of all elements in Voronoi sets

BANKRUPTCY PREDICTION BASED ON INDEPENDENT COMPONENT ANALYSIS

151

with the consideration of class matching. Let h

the indicative function whose value is 1 if m

is the

winner neuron of x

and with the same class label to

, and -1 if m

is the winner neuron of x

and with

different class label to x

, and 0 otherwise. The update

rules are described as:

(t + 1) =

∑

i=1

∑

i=1

where







1 if m

= BMU(x

),class(m

) = class(x

)

−1 if m

= BMU(x

),class(m

) 6= class(x

)

0 otherwise

If the denominator is 0 for some m

, no updating

is done. This process is repeated for sufﬁcient itera-

tions until the prototypes are regarded as steady. The

learned map is used for future classiﬁcation in which

the data is compared with the neuron and assigned by

the class of its BMU.

3 EXPERIMENTAL METHOD

The data used in the experiments comes from the Di-

anan database of French companies, containing the

ﬁnancial statements during 1998 and 2000. Most

companies are of small or middle size with at least

35 employees. In the total 2056 companies, 583

are labeled by ‘distress’ (declare as bankruptcy or

submit a restructuring plan) and 1473 are labeled as

‘healthy’. Utilizing the inconsistency analysis, 17 in-

dicators are remained with strong correlation to the

class, added by three annual variations of important

ratios (J.C. Neves, 2006). Described in Table 1, the

ﬁnal data consists of 20 ﬁnancial ratios followed by

the class label. To study the impact of data balance to

performance, the training data is randomly selected

from the original dataset, with different proportions

of healthy companies compared to bankruptcy com-

panies as 50/50 (DS1), 64/38 (DS2) and 72/28 (DS3)

respectively.

In bankruptcy prediction domain, the traditional

accuracy is insufﬁcient for evaluation due to the dif-

ferent cost of misclassiﬁcation. Therefore, seven

measures are used for classiﬁcation evaluation. Type I

error (missing alarm) is the percentage of misclassiﬁ-

cation classifying a bankruptcy company as a healthy

one, while type II error (false alarm) is the percent-

age of misclassiﬁcation classifying a healthy com-

pany as a bankruptcy one. Overall error is the per-

centage of companies classiﬁed incorrectly. Besides,

Table 1: Financial indicators of French companies.

variables description

1 number of employee

2 ﬁnancial equilibrium ratio

3 equity to stable funds

4 ﬁnancial autonomy

5 current ratio

6 collection period

7 interest to sales

8 debt ratio

9 ﬁnancial debt to cash earnings

10 working capital in sales days

11 value added per employee

12 value added to assets

13 EBITDA margin

14 margin before extra items and taxes

15 return on equity

16 value added margin

17 percentage of value added for employees

18 annual variation of debt ratio

19 annual variation of percentage

of value added for employee

20 annual variation of margin before

extra items and taxes

21 class: 0 for ‘healthy’, 1 for ‘bankruptcy’

the OC (overall accuracy) is the percentage of com-

panies classiﬁed correctly, BC (bankruptcy classiﬁca-

tion) is the percentage of bankruptcy classiﬁed cor-

rectly, BPC (bankruptcy prediction classiﬁcation) is

the percentage of bankruptcy companies in the to-

tal predicted bankruptcies, WE (weighted efﬁciency)

is the combination of three accuracy measures. The

error and accuracy measures are deﬁned in Table 2,

where n

denotes the number of companies belong-

ing to group i (0 for healthy and 1 for bankruptcy)

in real classes and group j in predicted classes. The

cross validation technique is employed for perfor-

mance evaluation in terms of error measures and ac-

curacy measures.

Table 2: Evaluation measures.

measures deﬁnition

Type I error n

/(n

+ n

)

Type II Error n

/(n

+ n

)

Overall Error (n

+ n

)/(n

+ n

)

OC (n

+ n

)/(n

+ n

)

BC n

/(n

+ n

)

BPC n

/(n

+ n

)

√

OC ∗BC∗BPC

The LVQ algorithm is implemented in Matlab and

performed with FastICA package (Erkki Oja, 2005).

In summary, the experiments are performed in the fol-

lowing steps:

1. Apply ICA to original data given the number of

dimensions remained;

ICAART 2009 - International Conference on Agents and Artificial Intelligence

152

2. Randomly select a number of healthy companies

from the preprocessed data with the percent ratio

and generate the experimental datasets;

3. Divide the data into 10 folds randomly for cross-

validation: 90% is used for model training, and

the other 10% is used for performance validation;

4. Train LVQ on each training data and then classify

the test data according to the resulting map;

5. Calculate the error and accuracy measures for

classiﬁcation evaluation;

6. Obtain the average value of measures for distinct

results in cross validation.

4 RESULTS AND DISCUSSION

From the curve shown in Figure 2, the eigenvalues

remained increase monotonously with respect to the

number of dimensions and achieve 99.9% at 15 di-

mensions. Figure 3 shows the results at various num-

ber of dimensions remained for classiﬁer construc-

tion on DS1. The default map size of LVQ is de-

termined heuristically by the number of training data

with the side lengths of map grid as the ratio of two

biggest eigenvalues. The error measures decrease dra-

matically at the beginning and amount to the minimal

(e.g., overall error is 0.156 on DS1) nearby 14 dimen-

sions. Increasing further the number of dimensions

does not have any improvement. Meantime, the ac-

curacy measures demonstrate the opposite tendency,

i.e., increase ﬁrst and amount to maximum at 14 di-

mensions (e.g., WE is 0.77 correspondingly). The

tendency could also be detected from the plot of DS2

(Figure 4) and DS3 (Figure 5). The number of di-

mensions corresponding to the best WE is 14 (DS1),

17 (DS2) and 15 (DS3) respectively. In the following

experiments, we choose the number of dimensions as

the above values.

0 5 10 15 20

100

dimension

eigenvalues retained

1:37.96%

5:86.1%

15:99.9%

10:96.8%

Figure 2: Eigenvalues and dimension of ICA.

In Table 3-Table 5, the results of measures for bal-

anced (DS1) and unbalanced (DS2 and DS3) data are

0 5 10 15 20

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

dimensions

measures

Err1 Err2 Err OC BC BPC WE

Figure 3: Results of various dimensions on DS1.

0 5 10 15 20

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

dimensions

measures

Err1 Err2 Err OC BC BPC WE

Figure 4: Results of various dimensions on DS2.

0 5 10 15 20

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

dimensions

measures

Err1 Err2 Err OC BC BPC WE

Figure 5: Results of various dimensions on DS3.

given. For all datasets, the proposed Hybrid LVQ

with ICA (approach1) achieves noticeably lower er-

rors (e.g., at least 2% for overall error) and higher ac-

curacy (e.g., at least 4% for weighted efﬁciency) than

plain LVQ without data preprocessing (approach2).

The former outperforms the latter signiﬁcantly on

classifying unknown data, indicating ICA is bene-

ﬁcial for dimensionality reduction and consequently

improves the generalization error when the training

data is insufﬁcient. Additionally, utilizing the unbal-

anced data is able to improve type II error, but elim-

inate simultaneously type I error which usually cost

more in real situations. It is also shown that LVQ

with ICA is less biased to unbalanced data than the

plain LVQ.

The map size is one of the most important param-

BANKRUPTCY PREDICTION BASED ON INDEPENDENT COMPONENT ANALYSIS

153

Table 3: Results on DS1.

mea- arrroach1 approach2

sures train test train test

Err1 0.132 0.201 0.172 0.227

Err2 0.059 0.109 0.081 0.142

Err 0.096 0.156 0.126 0.185

OC 0.904 0.844 0.874 0.815

BC 0.868 0.799 0.828 0.773

BPC 0.937 0.879 0.911 0.847

WE 0.86 0.77 0.812 0.73

Table 4: Results on DS2.

mea- arrroach1 arrroach2

sures train test train test

Err1 0.184 0.247 0.22 0.314

Err2 0.038 0.069 0.04 0.066

Err 0.091 0.134 0.105 0.154

OC 0.909 0.866 0.895 0.846

BC 0.816 0.753 0.78 0.686

BPC 0.924 0.860 0.917 0.855

WE 0.83 0.749 0.801 0.704

eters inﬂuencing the classiﬁer performance. The re-

sults of different map sizes on DS1 is given in Ta-

ble 6. As the map enlarges from 4 x 3 to 14 x 11, it

performs better on both training and generalization.

When more neurons (27 x 23) are used, the train-

ing error improves slightly while the generalization

error degrades signiﬁcantly. The results on DS2 and

DS3 are shown in Table 7 and Table 8 respectively.

It can be concluded that the middle size of map grid

is suitable for model construction empirically, while

less units are inadequate for pattern presentation, and

more units leads to overﬁtting.

Table 9 shows the results obtained on balanced

and unbalanced data sets for Hybrid LVQ and some

well-known classiﬁcation methods: ZeroR (a baseline

algorithm by simply predicting the majority class in

training data), VFI (voting feature interval classiﬁer),

SMO (sequential minimal optimization algorithm im-

plementing support vector machine), k-nearest neigh-

bors (KNN, the best value of k chosen between 1 and

10), Naive Bayes, and C4.5 decision tree. Among

the seven algorithms, VFI and SMO have poor per-

formance just better than ZeroR in all data sets, fol-

lowed by KNN and Naive Bayes. Superior to the ﬁve

algorithms, hybrid LVQ performs well, close to C4.5

in terms of error and accuracy measures. However,

LVQ is a projection method as well as a classiﬁcation

approach. An appeal of LVQ is the ability to detect

class structure from map visualization which makes it

a useful tool in data mining tasks. Figure 6 presents a

generated map of LVQ, the labels on the left and the

histogram of class distribution on the right. From the

Table 5: Results on DS3.

mea- arrroach1 arrroach2

sures train test train test

Err1 0.209 0.297 0.275 0.417

Err2 0.024 0.033 0.028 0.432

Err 0.076 0.108 0.097 0.152

OC 0.924 0.892 0.903 0.848

BC 0.791 0.703 0.725 0.583

BPC 0.929 0.894 0.909 0.845

WE 0.82 0.748 0.771 0.647

Table 6: Results of different map sizes on DS1.

mea- small:4x3 middle:14x11 big:27x23

sures train test train test train test

Err1 0.3 0.29 0.13 0.2 0.07 0.23

Err2 0.06 0.08 0.06 0.11 0.07 0.18

Err 0.18 0.19 0.1 0.16 0.07 0.2

OC 0.82 0.81 0.9 0.84 0.93 0.78

BC 0.7 0.71 0.87 0.8 0.93 0.77

BPC 0.92 0.91 0.94 0.88 0.93 0.82

WE 0.73 0.73 0.86 0.77 0.9 0.71

Table 7: Results of different map sizes on DS2.

mea- small:8x6 middle:15x13 big:30x26

sures train test train test train test

Err1 0.33 0.37 0.18 0.25 0.1 0.27

Err2 0.03 0.04 0.04 0.07 0.04 0.11

Err 0.14 0.16 0.09 0.13 0.06 0.17

OC 0.87 0.84 0.91 0.87 0.94 0.83

BC 0.67 0.63 0.82 0.75 0.9 0.73

BPC 0.94 0.92 0.92 0.86 0.93 0.78

WE 0.74 0.7 0.83 0.75 0.89 0.69

Table 8: Results of different map sizes on DS3.

mea- small:8x7 middle:16x13 big:31x27

sures train test train test train test

Err1 0.36 0.4 0.21 0.3 0.11 0.31

Err2 0.02 0.03 0.02 0.03 0.02 0.06

Err 0.12 0.14 0.08 0.11 0.05 0.14

OC 0.88 0.86 0.92 0.89 0.95 0.87

BC 0.64 0.6 0.79 0.7 0.89 0.69

BPC 0.92 0.87 0.93 0.89 0.93 0.81

WE 0.72 0.67 0.82 0.75 0.89 0.69

visualization, the healthy companies projected to neu-

rons in the middle of map grid and bankruptcy com-

panies projected to the surrounding neurons.

5 CONCLUSIONS

In this paper, a hybrid LVQ algorithm is presented

to solve the bankruptcy prediction problem. In or-

der to reduce the curse of dimensionality, ICA is used

as a preprocessing tool to eliminate the dimensions

ICAART 2009 - International Conference on Agents and Artificial Intelligence

154

Figure 6: A sample generated LVQ.

Table 9: Results of balanced and unbalanced data sets.

bankruptcy Type I Type II Overall WE

/healthy Error Error Error

50/50

ZeroR 0.604 0.403 0.503 0.312

VFI 0.474 0.03 0.252 0.61

SMO 0.267 0.194 0.231 0.667

KNN 0.271 0.071 0.171 0.742

Naive Bayes 0.304 0.050 0.177 0.731

Hybrid LVQ 0.201 0.109 0.156 0.77

C4.5 0.156 0.130 0.143 0.791

36/64

ZeroR 1 0 0.36 -

VFI 0.667 0.019 0.252 0.476

SMO 0.470 0.047 0.199 0.605

KNN 0.363 0.039 0.156 0.697

Naive Bayes 0.342 0.047 0.153 0.703

Hybrid LVQ 0.247 0.069 0.134 0.749

C4.5 0.172 0.083 0.115 0.789

28/72

ZeroR 1 0 0.28 -

VFI 0.734 0.012 0.214 0.432

SMO 0.566 0.018 0.171 0.571

KNN 0.398 0.028 0.131 0.684

Naive Bayes 0.330 0.047 0.126 0.704

Hybrid LVQ 0.297 0.033 0.108 0.748

C4.5 0.179 0.077 0.104 0.765

in a transformed independent component space. Re-

sults on French companies data demonstrate the pro-

posed algorithm is of higher stability and generaliza-

tion power than plain LVQ without ICA. Regarding

the comparison of seven classiﬁcation methods, the

hybrid LVQ performs well, only slightly inferior to

C4.5. Since ICA is used as a feature reduction tool in

data preprocessing phase, it could be combined with

other classiﬁcation tools.

In future work, some strategies, e.g., Neyman-

Pearson criterion are expected to improve the per-

formance of the presented algorithm in cost sensi-

tive situation. The non-uniqueness of different ICA

algorithms will be considered in the evaluation. In

addition, there exist various linear or nonlinear fea-

ture reduction methods, so further investigation is

needed to compare and evaluate their performance for

bankruptcy prediction.

ACKNOWLEDGEMENTS

The authors would like to acknowledge the ﬁnancial

grant of GECAD/ISEP-Knowledge Based, Cognitive

and Learning Systems (C2007-FCT/442/2006).

REFERENCES

A. Hyvarinen, J. Karhunen, E. O. (2001). Independent com-

ponent analysis. John Wiley & Sons.

Armando Vieira, N. B. (2003). A training algorithm for

classiﬁcation of high dimensional data. Neurocom-

puting, 50(1):461–472.

Bingham, E. (2001). Topic identiﬁcation in dynamical text

by extracting minimum complexity time components.

In Proc. 3rd International Conference on Indepen-

dent Component Analysis and Blind Signal Separa-

tion, pages 546–551.

E. Merkevicius, G. Garsva, R. S. (2004). Forecasting of

credit classes with the self-organizing maps. Informa-

tion Technology And Control, Kaunas, Technologija,

4(33):61–66.

E. Oja, K. Kiviluoto, S. M. (2000). Independent compo-

nent analysis for ﬁnancial time series. In Proc. IEEE

2000 Symp. on Adapt. Systems for Signal Proc.Comm.

and Control AS-SPCC, pages 111–116, Lake Louise,

Canada.

Erkki Oja, M. H. e. (2005). The fastica package for matlab.

http://www.cis.hut.ﬁ/projects/ica/fastica/.

J.C. Neves, A. V. (2006). Improving bankruptcy prediction

with hidden layer learning vector quantization. Euro-

pean Accounting Review, 15(2):253–271.

K. Kiviluoto, P. B. (1997). Analyzing ﬁnancial statements

with the self-organizing map. In Proc. Workshop Self-

Organizing Maps, pages 362–367.

Kohonen, T. (1997). Self-organizing maps. Springer Verlag,

Berlin, 2nd edition.

M. Dash, H. L. (1997). Feature selection for classiﬁcation.

Intelligent Data Analysis, 1:131–156.

P. Ravi Kumara, V. R. (2007). Bankruptcy prediction

in banks and ﬁrms via statistical and intelligent

techniques-a review. European Journal of Opera-

tional Research, 180(1):1–28.

Tsai, C.-F. (2008). Feature selection in bankruptcy predic-

tion. Knowledge-Based Systems.

BANKRUPTCY PREDICTION BASED ON INDEPENDENT COMPONENT ANALYSIS

155