IMPROVING ICA ALGORITHMS APPLIED TO PREDICTING STOCK RETURNS

J. M. Górriz
University of Cádiz
Avda. Ramón Puyol s/n, E-11202 Algeciras

C. G. Puntonet
University of Granada
Daniel Saucedo s/n, E-18071

R. Martín-Clemente
University of Seville
Avda. Descubrimientos s/n
Keywords: Independent Component Analysis, On-line Support Vector Machines, Regularization Theory, Preprocessing Techniques.
Abstract: In this paper we improve a well-known signal processing technique, independent component analysis (ICA) or blind source separation, applied to predicting multivariate financial time series such as portfolios of stock returns, using the Vapnik-Chervonenkis theory. The key idea in ICA algorithms is to linearly map the input time series (stock returns) into a new space which contains statistically independent components. There is a wide class of ICA algorithms; however, as the number of observed signals increases, they usually fail due to their low convergence rates or their limited capacity for local search.
1 INTRODUCTION
ICA algorithms have been applied successfully to several fields such as biomedicine, speech, sonar and radar, signal processing, etc., and more recently also to time series forecasting (Górriz et al., 2003), e.g. using stock data (Back and Weigend, 1997); they have also faced the problem of blind source separation (BSS) (Bell and Sejnowski, 1995). In the latter application, the mixing process of multiple sensors is based on a linear transformation, making the following assumptions:
1. the original (unobservable) sources, which are related to socio-economic events, are statistically independent;
2. the number of sensors (stock series) is equal to the number of sources;
3. the Darmois-Skitovich conditions are satisfied (Cao and Liu, 1996).
In this work we apply genetic algorithms (GAs) to ICA in the search for the separation matrix, in order to improve the performance of endogenous-learning support vector machines (SVMs) in real-time series forecasting, speeding up convergence rates (in scenarios with the BSS problem in higher dimensions). Thus we use ICA as a preprocessing technique to extract interesting structure from the stock data.
SVMs are learning algorithms based on the structural risk minimization (SRM) principle (Vapnik, 1999), characterized by the use of expansions of SV "admissible" kernels and by the sparsity of the solution. They have been proposed as a technique for time series forecasting (Muller et al., 1999) and have faced the overfitting problem, present in classical neural networks, thanks to their high generalization capacity. The SVM prediction solution is obtained by solving a constrained quadratic programming problem; thus SV machines are nonparametric techniques, i.e. the number of basis functions is unknown beforehand. Solving this complex problem in real-time applications can be very inconvenient because of its high computational time demand. Moreover, the low convergence rates of ICA algorithms can cause a design fault in real-time applications.
We organize the paper as follows. Section 2 gives a brief introduction to ICA. The SVM-ICA algorithm for financial time series prediction is presented in section 3. In section 4 we show a guided GA to improve convergence rates in ICA algorithms. Finally, we display results and state some conclusions in section 5.
Figure 1: Independent Component Analysis and the BSS problem. In financial ICA we suppose that the observed signals are linearly generated by some underlying independent factors.
2 INTRODUCTION TO ICA
ICA has been used as a solution to the blind source separation problem (Bell and Sejnowski, 1995), denoting the process of taking a set of measured signals in a vector x and extracting from them a new set of statistically independent components (ICs) in a vector y. In basic ICA, each component of the vector x is a linear instantaneous mixture of independent source signals in a vector s, with unknown deterministic mixing coefficients:
x_i = \sum_{j=1}^{N} a_{ij} s_j    (1)
Due to the nature of the mixing model, we are able to estimate the original sources \tilde{s}_i and the de-mixing weights b_{ij} by applying, for instance, ICA algorithms based on higher-order statistics such as cumulants:

\tilde{s}_i = \sum_{j=1}^{N} b_{ij} x_j    (2)
Using vector-matrix notation, and defining the time series vector x = (x_1, \ldots, x_n)^T, the vectors s and \tilde{s}, and the matrices A = \{a_{ij}\} and B = \{b_{ij}\}, we can write the overall process as:

\tilde{s} = B x = B A s = G s    (3)
where we define G as the overall transfer matrix. The estimated original sources will be, under some conditions included in the Darmois-Skitovich theorem (Cao and Liu, 1996), a permuted and scaled version of the original ones. Thus, in general, it is only possible to find G such that G = PD, where P is a permutation matrix and D is a diagonal scaling matrix.

Figure 2: Schematic representation of the ICA preprocessing step.
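As a concrete illustration, the following minimal sketch reproduces the model of Eqs. (1)-(3). The Laplacian (super-Gaussian) sources and the idealized de-mixing B = A^{-1} are illustrative assumptions; a practical ICA algorithm estimates B blindly and only up to the ambiguities just described:

```python
import numpy as np

# Sketch of the linear mixing model of Eqs. (1)-(3). Laplacian sources
# and the idealized de-mixing B = inv(A) are illustrative assumptions.
rng = np.random.default_rng(0)
N, T = 3, 1000
s = rng.laplace(size=(N, T))            # unobservable sources
A = rng.uniform(-1.0, 1.0, (N, N))      # unknown mixing matrix, Eq. (1)
x = A @ s                               # observed signals x = A s
B = np.linalg.inv(A)                    # separation matrix, Eq. (2)
G = B @ A                               # overall transfer matrix, Eq. (3)
print(np.allclose(G, np.eye(N)))        # True: here G = P D trivially
```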
3 SVM-ICA ALGORITHM
SVMs are essentially regularization networks (RN) with kernels being Green's functions of the corresponding regularization operators (Smola et al., 1998). Using this connection, with a clever choice of regularization operator (based on the SVM philosophy), we obtain a parametric model that is very resistant to the overfitting problem. Our parametric model is a resource-allocating network (Platt, 1991), characterized by the control of neural resources and by the use of matrix decompositions, i.e. singular value decomposition (SVD) and QR decomposition, for input selection and neural pruning (Górriz et al., 2004).
The on-line RN is based on "Resource Allocating Network" (RAN) algorithms (Platt, 1991), whose principal feature is the sequential adaptation of neural resources. It consists of:
1. a network using RBFs, with a strategy for allocating new units (RBFs) based on a two-part novelty condition (Platt, 1991);
2. input space selection and neural pruning using matrix decompositions such as SVD and QR with pivoting (Salmerón-Campos, 2001);
3. a learning rule based on the structural risk minimization principle, as discussed in (Górriz et al., 2004).
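A toy sketch of the allocation step may clarify the two-part novelty condition. The thresholds, the width heuristic, and the names used here are illustrative choices rather than Platt's exact settings, and the SVD/QR pruning and SRM learning rule of the full algorithm are omitted:

```python
import numpy as np

class RANSketch:
    """Toy RAN-style RBF network; thresholds and widths are illustrative."""

    def __init__(self, eps_dist=0.5, eps_err=0.1, kappa=0.9):
        self.centers, self.widths, self.weights = [], [], []
        self.eps_dist, self.eps_err, self.kappa = eps_dist, eps_err, kappa
        self.bias = 0.0

    def predict(self, x):
        # Output is a bias plus a sum of Gaussian RBF activations.
        y = self.bias
        for c, w, h in zip(self.centers, self.widths, self.weights):
            y += h * np.exp(-np.sum((x - c) ** 2) / w ** 2)
        return y

    def observe(self, x, target):
        err = target - self.predict(x)
        dist = min((np.linalg.norm(x - c) for c in self.centers),
                   default=np.inf)
        # Two-part novelty condition: the input is far from every
        # existing center AND the prediction error is large.
        if dist > self.eps_dist and abs(err) > self.eps_err:
            self.centers.append(np.asarray(x, dtype=float))
            self.widths.append(self.kappa * min(dist, 10 * self.eps_dist))
            self.weights.append(err)    # new unit absorbs the residual
        # Otherwise the full algorithm adapts the existing parameters
        # (omitted here) instead of allocating a new unit.
```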
4 GA APPLIED TO THE SEPARATION MATRIX IN THE ICA PROCEDURE
A GA can be modelled by means of a time-inhomogeneous Markov chain, from which one obtains interesting properties related to weak and strong ergodicity, convergence, and the probability distribution of the process. A canonical GA is constituted by operations of parameter encoding, population initialization, crossover, mutation, mate selection, population replacement, fitness scaling, etc., and it can be proved that with these simple operators a GA does not converge to a population containing only optimal members. However, there are GAs that converge to the optimum: the elitist GA and those which introduce reduction operators (Eiben et al., 1991). Our GA combines two balancing goals: exploring the search space blindly, like a canonical GA, and exploiting statistical properties, like a standard ICA algorithm. In order to include statistical information in the algorithm (it would be nonsense to ignore it!), we define the hybrid statistical genetic operator, based on reduction operators, as follows:
\langle q, M_n^G \, p \rangle = \frac{1}{\mathcal{Z}(T_n)} \exp\left( -\frac{\| q - S_n \cdot p \|^2}{T_n} \right), \qquad p, q \in \mathcal{N}    (4)
where \mathcal{Z}(T_n) is the normalization constant depending on the temperature T_n, n is the iteration index, and S_n is the step matrix which contains the statistical properties; based on cumulants, it can be expressed using quasi-Newton algorithms as (Hyvärinen and Oja, 1997):

S_n = \left( I - \mu_n \left( C_{y,y}^{1,\beta} S_y^\beta - I \right) \right), \qquad p_i \in \mathcal{C}    (5)
where C_{y,y}^{1,\beta} is the cross-cumulant matrix whose elements are [C_{y,y}^{\alpha,\beta}]_{ij} = \mathrm{Cum}(\underbrace{y_i, \ldots, y_i}_{\alpha}, \underbrace{y_j, \ldots, y_j}_{\beta}) and S_y^\beta is the sign matrix of the output cumulants.
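Operationally, sampling from the density in Eq. (4) amounts to drawing the offspring q from a Gaussian centered on S_n · p with variance T_n/2 per component, so the operator contracts towards the statistically guided point as the temperature is cooled. The sketch below assumes the step matrix of Eq. (5) is computed on the ICA side and supplied as a caller-provided step_fn; the function name and interface are illustrative:

```python
import numpy as np

def guided_mutation(p, step_fn, T_n, rng):
    """Hybrid statistical genetic operator of Eq. (4) (illustrative sketch).

    p       : individual encoding a (flattened) separation matrix
    step_fn : callable returning S_n @ p, the quasi-Newton guide of Eq. (5)
    T_n     : temperature of the cooling schedule at iteration n
    """
    center = step_fn(p)                 # statistically guided point S_n . p
    sigma = np.sqrt(T_n / 2.0)          # Gaussian matching exp(-||q - S_n p||^2 / T_n)
    return center + sigma * rng.standard_normal(np.shape(p))
```

As T_n tends to zero the operator reduces to the deterministic quasi-Newton update, while a large T_n recovers blind Gaussian mutation.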
Finally, the guided GA (GGA) is modelled, at each step, as a stochastic matrix product acting on probability distributions over populations:

\mathcal{G}_n = P_R^n \cdot F^n \cdot C_{P_c^n}^k \cdot M_n^{(P_m, G)}    (6)
where F^n is the selection operator, P_R^n is a reduction operator, C_{P_c^n}^k is the crossover operator, and M_n^{(P_m, G)} denotes the mutation and guided operators.
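A schematic implementation of one GGA generation, composing the four operators of Eq. (6) in sequence, might look as follows. The fitness, selection, and crossover routines are placeholders to be supplied by the caller, and the reduction step is realized here as simple elitism:

```python
def gga_generation(pop, fitness, select, crossover, mutate, n_elite=1):
    """One generation of the guided GA, following Eq. (6) (illustrative).

    pop : list of individuals; select/crossover/mutate are caller-supplied
    operators playing the roles of F^n, C^k and M_n^{(P_m, G)}.
    """
    elite = sorted(pop, key=fitness, reverse=True)[:n_elite]
    parents = select(pop, fitness)              # selection F^n
    children = crossover(parents)               # crossover C^k
    mutants = [mutate(c) for c in children]     # mutation / guided moves
    # Reduction P_R^n via elitism: the best individual always survives,
    # which is what makes the chain converge to optimal populations.
    survivors = sorted(mutants, key=fitness, reverse=True)
    return elite + survivors[:len(pop) - n_elite]
```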
The GA applies local search (using the selected mutation and crossover operators) around the values (or individuals) found to be optimal (the elite) in the previous generation. The computational time depends on the encoding length and on the numbers of individuals and genes. Because of the probabilistic nature of the GA-based method, the proposed method converges to a globally optimal solution on average; in our simulations no nonconvergent case was found.
5 SIMULATIONS AND CONCLUSIONS

To check the performance of the proposed hybrid algorithm, 50 computer simulations were conducted to test the GGA against the GA method (without guide) and the most relevant ICA algorithm to date, FastICA (Hyvärinen and Oja, 1997). In this paper we omit the evaluation of the computational complexity of the current methods, which is described in detail in several references such as (Tan and Wang, 2001). The main reason is that we use an 8-node cluster of Pentium II 332 MHz machines with 512 KB cache, so the computational requirements of the algorithms (fitness functions, encoding, etc.) are generally negligible compared with the cluster capacity. Naturally, GA-based BSS approaches suffer from a higher computational complexity.
Consider the mixing cases from 2 to 20 independent random super-Gaussian input signals. We focus our attention on the evolution of the crosstalk vs. the number of iterations, using a mixing matrix randomly chosen in the interval [-1, +1]. The number of individuals chosen in the GA methods was N_p = 30 in the 50 simulations (with randomly chosen mixing matrices), for a number of input sources ranging from 2 (the standard BSS problem) to 20 (BSS in biomedicine or finance). The standard deviation of the separation parameters over the 50 runs never exceeded 1% of their mean values, whereas with the FastICA method we found large deviations for different mixing matrices, due to its limited capacity for local search as the dimension increases. The results for the crosstalk are displayed in Table 1. It can be seen from the simulation results that the FastICA convergence rate decreases as the dimension increases, whereas the GA approaches work efficiently.
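The crosstalk measures how far the overall transfer matrix G = BA is from a permutation-and-scaling matrix PD. The paper does not spell out the exact formula, so the Amari-style performance index below is an illustrative stand-in that vanishes exactly when G = PD:

```python
import numpy as np

def crosstalk(G):
    """Amari-style index: 0 iff G is a scaled permutation matrix P D."""
    Q = np.abs(G)
    rows = np.sum(Q.sum(axis=1) / Q.max(axis=1) - 1.0)
    cols = np.sum(Q.sum(axis=0) / Q.max(axis=0) - 1.0)
    return rows + cols
```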
A GGA-based BSS method has been developed to solve the BSS problem for linear mixtures of independent sources. The proposed method obtains good performance, overcoming the local-minima problem over multidimensional domains (see Table 1). Extensive simulation results prove the ability of the proposed method. This is particularly useful in medical applications, where the input space dimension is large, and in real-time applications, where reaching fast convergence rates is the major objective.
By minimizing the regularized risk functional, using an operator that enforces flatness in feature space, we build a hybrid model that achieves high prediction performance (Górriz et al., 2003) compared with previous on-line algorithms for time series forecasting. This performance is similar to the one achieved by SVMs, but with a lower computational time demand, an essential feature in real-time systems. The benefit of choosing SVMs for regression lies in solving a (uniquely solvable) quadratic optimization problem, unlike general RBF networks, which require suitable non-linear optimization with the danger of getting stuck in local minima. Nevertheless, the RBF networks used in this paper, with the help of the various techniques described, obtain high performance even under extremely volatile conditions, since the level of noise and the change of the delay operation mode applied to the chaotic dynamics were rather high.
Table 1: Figures: 1) mean crosstalk (50 runs) vs. iterations to reach convergence for a number of sources equal to 2, using the GA approach and FastICA (Hyvärinen and Oja, 1997); 2) mean crosstalk (50 runs) vs. iterations to reach convergence for a number of sources equal to 20; 3) set of series used; 4) set of ICs.
REFERENCES

Back, A. and Weigend, A. (1997). A first application of independent component analysis to extracting structure from stock returns. International Journal of Neural Systems, 8(5).

Bell, A. and Sejnowski, T. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7:1129–1159.

Cao, X. and Liu, W. (1996). General approach to blind source separation. IEEE Transactions on Signal Processing, 44(3):562–571.

Eiben, A., Aarts, E., and Van Hee, K. (1991). Global convergence of genetic algorithms: a Markov chain analysis. Parallel Problem Solving from Nature, Lecture Notes in Computer Science, 496:4–12.

Górriz, J. M., Puntonet, C. G., Salmerón, M., and Ortega, J. (2003). New method for filtered ICA signals applied to volatile time series. Lecture Notes in Computer Science, 2687:433–440.

Górriz, J. M., Puntonet, C. G., and Salmerón, M. (2004). On-line algorithm for time series prediction based on support vector machine philosophy. In Proceedings of the International Conference on Computational Science (ICCS 2004), Lecture Notes in Computer Science, Krakow, Poland.

Hyvärinen, A. and Oja, E. (1997). A fast fixed-point algorithm for independent component analysis. Neural Computation, 9:1483–1492.

Muller, K., Smola, A., Ratsch, G., Scholkopf, B., and Kohlmorgen, J. (1999). Using support vector machines for time series prediction. In Advances in Kernel Methods: Support Vector Learning, pages 243–254. MIT Press.

Platt, J. (1991). A resource-allocating network for function interpolation. Neural Computation, 3:213–225.

Salmerón-Campos, M. (2001). Predicción de Series Temporales con Redes Neuronales de Funciones Radiales y Técnicas de Descomposición Matricial. PhD thesis, University of Granada, Departamento de Arquitectura y Tecnología de Computadores.

Smola, A., Scholkopf, B., and Muller, K. (1998). The connection between regularization operators and support vector kernels. Neural Networks, 11:637–649.

Tan, Y. and Wang, J. (2001). Nonlinear blind source separation using higher order statistics and a genetic algorithm. IEEE Transactions on Evolutionary Computation, 5(6).

Vapnik, V. (1999). The Nature of Statistical Learning Theory. Springer, 2nd edition.