Neural Approaches to Image Compression/Decompression Using PCA based Learning Algorithms
Luminita State (1), Catalina Cocianu (2), Panayiotis Vlamos (3) and Doru Constantin (1)
(1) Department of Computer Science, University of Pitesti, Pitesti, Romania
(2) Department of Computer Science, Academy of Economic Studies, Bucharest, Romania
(3) Department of Computer Science, Ionian University, Corfu, Greece
Abstract. Principal Component Analysis is a well-known statistical method for feature extraction, data compression and multivariate data projection. Aiming to obtain a guideline for choosing a proper method for a specific application, we developed a series of simulations on some of the most currently used PCA algorithms, namely GHA, the Sanger variant of GHA and APEX. The paper reports the conclusions experimentally derived on their convergence rates and corresponding efficiency for specific image processing tasks.
1 Introduction
Principal component analysis allows the identification of a linear transform such that the axes of the resulting coordinate system correspond to the directions of largest variability of the signal. The signal features corresponding to the new coordinate system are uncorrelated. One of the most frequently used methods in the study of the convergence properties of different stochastic learning PCA algorithms proceeds by reducing the problem to the analysis of the asymptotic stability of the trajectories of a dynamic system whose evolution is described in terms of an ODE [5]. The Generalized Hebbian Algorithm (GHA) extends Oja's learning rule to the learning of the first principal components. Aiming to obtain a guideline for choosing a proper method for a specific application, we developed a series of simulations on some of the most currently used PCA algorithms, namely GHA, the Sanger variant of GHA and APEX.
2 Hebbian Learning in Feed-forward Architectures
The input signal is modeled as a wide-sense stationary n-dimensional process $(X(t))_{t \ge 0}$ of mean 0 and covariance matrix $S = E\left[X(t)X(t)^T\right]$. We denote by $\Phi_1, \ldots, \Phi_n$ a set of unit eigenvectors of $S$ indexed according to the decreasing order of their corresponding eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. The most informative
directions of the process $(X(t))_{t \ge 0}$ are given by $\Phi_1, \ldots, \Phi_n$, and for any $m$, $1 \le m \le n$, its LMS-optimal linear features are $\Phi_1, \ldots, \Phi_m$. The architecture of a PCA neural network consists of the n-neuron input layer and the m-neuron computation layer. The aim is to develop an adaptive learning algorithm that asymptotically encodes $\Phi_1, \ldots, \Phi_m$ as the values of the synaptic vectors $W_1, \ldots, W_m$ of the neurons in the computation layer. Let $W(t) = \left(W_1(t), \ldots, W_m(t)\right)$ be the synaptic memory at the moment t and let $Y(t) = \left(Y_1(t), \ldots, Y_m(t)\right)^T$ be the output of the computation layer, where $Y_j(t) = W_j^T(t)X(t)$, $1 \le j \le m$. The Hebbian rule for learning the first principal
component is
$$W_1(k+1) = W_1(k) + \eta(k)X(k)Y_1(k),$$
where the sequence of learning rates $(\eta(k))$ is taken such that the conditions of the Kushner theorem hold [5]: $\sum_{k=1}^{\infty}\eta(k) = \infty$, $\lim_{k\to\infty}\eta(k) = 0$, and there exists $p > 1$ such that $\sum_{k=1}^{\infty}\eta(k)^p < \infty$. The normalized version of the Hebbian learning rule is
$$W_1(k+1) = \frac{W_1(k) + \eta(k)X(k)Y_1(k)}{\left\| W_1(k) + \eta(k)X(k)Y_1(k) \right\|} \qquad (1)$$
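As a worked example not spelled out in the paper, the standard schedule $\eta(k) = 1/k$ satisfies the three Kushner conditions with $p = 2$:
$$\sum_{k=1}^{\infty}\frac{1}{k} = \infty, \qquad \lim_{k\to\infty}\frac{1}{k} = 0, \qquad \sum_{k=1}^{\infty}\frac{1}{k^2} = \frac{\pi^2}{6} < \infty,$$
and the schedule $\eta(t) = \frac{1}{t\ln t}$, $t \ge 2$, used in the experiments of Section 4 satisfies them as well.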
In order to get a local learning scheme, a linearized version of (1) based on a first order approximation was proposed in [7], yielding the celebrated Oja's learning algorithm
$$W_1(k+1) = W_1(k) + \eta(k)\left[X(k)Y_1(k) - Y_1^2(k)W_1(k)\right] \qquad (2)$$
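A minimal sketch of iteration (2) in Python/NumPy is given below; the dimensions, the covariance matrix and the learning-rate schedule are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 15                                              # input dimension (illustrative)
S = np.diag([15.0, 10.0, 5.0] + [0.1] * (n - 3))    # example covariance matrix


def eta(k):
    return 1.0 / (k + 100.0)                        # satisfies the Kushner conditions


W1 = rng.normal(size=n)
W1 /= np.linalg.norm(W1)                            # random unit start vector

for k in range(1, 20001):
    X = rng.multivariate_normal(np.zeros(n), S)     # zero-mean sample with covariance S
    Y1 = W1 @ X                                     # Y_1(k) = W_1^T(k) X(k)
    W1 += eta(k) * (X * Y1 - Y1 ** 2 * W1)          # Oja's rule (2)

# W1 now approximates the first unit eigenvector of S, up to sign.
```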
The Generalized Hebbian Algorithm (GHA) [3] is one of the first neural models for extracting multiple PCs. At any moment t, each neuron j, $j \ge 1$, receives two inputs, the original signal $X(t)$ and the deflated signal $\tilde X_j(t)$, and computes $Y_j(t) = W_j^T(t)X(t)$ and $\tilde Y_j(t) = W_j^T(t)\tilde X_j(t)$, where $\tilde X_j(t) = \tilde X_{j-1}(t) - \tilde Y_{j-1}(t)W_{j-1}(t)$ for $j \ge 2$.
The GHA learning scheme is, for $2 \le j \le m$,
$$W_1(k+1) = W_1(k) + \eta(k)\left[X(k)Y_1(k) - Y_1^2(k)W_1(k)\right] \qquad (3)$$
$$W_j(k+1) = W_j(k) + \eta(k)\left[\tilde X_j(k)\tilde Y_j(k) - \tilde Y_j^2(k)W_j(k)\right] \qquad (4)$$
where $Y_j(k) = W_j^T(k)X(k)$, $\tilde X_j(k) = \tilde X_{j-1}(k) - \tilde Y_{j-1}(k)W_{j-1}(k) = X(k) - \sum_{i=1}^{j-1}\tilde Y_i(k)W_i(k)$, and $\tilde Y_j(k) = W_j^T(k)\tilde X_j(k)$.
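A sketch of one GHA step, in the same Python/NumPy setting as the previous example; the matrix layout (column j-1 of W holding $W_j$) and the function name are our own conventions.

```python
import numpy as np


def gha_step(W, X, eta):
    """One GHA update (3)-(4); W has shape (n, m)."""
    X_defl = X.copy()                              # deflated input, X~_1 = X
    for j in range(W.shape[1]):
        Y_defl = W[:, j] @ X_defl                  # Y~_j(k) = W_j^T(k) X~_j(k)
        X_next = X_defl - Y_defl * W[:, j]         # X~_{j+1}(k) = X~_j(k) - Y~_j(k) W_j(k)
        W[:, j] += eta * (X_defl * Y_defl - Y_defl ** 2 * W[:, j])   # rules (3)-(4)
        X_defl = X_next
    return W
```

Iterating gha_step over samples drawn as in the Oja example drives the columns of W toward $\Phi_1, \ldots, \Phi_m$, up to sign.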
The variant proposed by Sanger [7] simplifies the learning process by using only the output of each neuron in both the synaptic learning scheme and the input deflation. The Sanger variant of GHA is, for $2 \le j \le m$,
$$W_1(k+1) = W_1(k) + \eta(k)\left[X(k)Y_1(k) - Y_1^2(k)W_1(k)\right] \qquad (5)$$
$$W_j(k+1) = W_j(k) + \eta(k)\left[\tilde X_j(k)Y_j(k) - Y_j^2(k)W_j(k)\right] \qquad (6)$$
where $Y_j(k) = W_j^T(k)X(k)$ and $\tilde X_j(k) = \tilde X_{j-1}(k) - Y_{j-1}(k)W_{j-1}(k)$ is the input deflated at the level of the j-th neuron.
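The Sanger variant therefore differs from the sketch above only in using $Y_j$ instead of $\tilde Y_j$; a corresponding sketch, with the same conventions and caveats:

```python
import numpy as np


def sanger_step(W, X, eta):
    """One step of the Sanger variant (5)-(6); W has shape (n, m)."""
    Y = W.T @ X                                    # Y_j(k) = W_j^T(k) X(k) for all neurons
    X_defl = X.copy()                              # X~_1 = X
    for j in range(W.shape[1]):
        X_next = X_defl - Y[j] * W[:, j]           # X~_{j+1}(k) = X~_j(k) - Y_j(k) W_j(k)
        W[:, j] += eta * (X_defl * Y[j] - Y[j] ** 2 * W[:, j])   # rules (5)-(6)
        X_defl = X_next
    return W
```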
The APEX learning algorithm proposed in [2] generalizes the idea of lateral influences by imposing a certain learning process on the weights of the lateral connections. The output of each neuron j is computed from its own activation and the outputs of all neurons i, $1 \le i \le j-1$, weighted by the coefficients $a_{ij}(t)$:
$$Y_j(t) = W_j^T(t)X(t) - \sum_{i=1}^{j-1}a_{ij}(t)Y_i(t), \quad j \ge 2 \qquad (7)$$
The learning scheme for the local memories is essentially Oja's learning rule taken for the transformed outputs $Y_j$:
$$W_j(t+1) = W_j(t) + \eta(t)\left[Y_j(t)X(t) - Y_j^2(t)W_j(t)\right] \qquad (8)$$
The learning scheme for the weights of the lateral connections is given by
$$a_{ij}(t+1) = a_{ij}(t) + \eta(t)\left[Y_i(t)Y_j(t) - Y_j^2(t)a_{ij}(t)\right] \qquad (9)$$
Note that the theoretical analysis [1], [2], [3] establishes the almost sure convergence to the principal components of the sequences of weight vectors generated by the above mentioned algorithms.
3 Recursive Least Squares Learning Algorithm for the Principal Directions
Let $W_1(t-1)$ be the synaptic vector at the moment t and assume that the inputs are applied at the moments t = 0, 1, 2, .... If we denote by $X(k)$ the input at the moment k, then the output is $Y(k) = W_1(k-1)h_1(k) = W_1(k-1)W_1^T(k-1)X(k)$, where $h_1(k) = W_1^T(k-1)X(k)$ is the neural activation induced by the input. The mean error at the moment t is $J_1(t) = \sum_{k=1}^{t}\varepsilon^2(k)$, where $\varepsilon^2(k) = \left\|X(k) - Y(k)\right\|^2$. The aim is to determine $\hat W_1(t)$ minimizing the overall error $J_1(W_1(t))$ when, at each moment of time k, $1 \le k \le t$, the decompression is assumed to be performed using the filter $W_1(t)$, that is,
$$J_1(W_1(t)) = \sum_{k=1}^{t}\left(X(k) - W_1(t)h_1(k)\right)^T\left(X(k) - W_1(t)h_1(k)\right) \qquad (10)$$
$$\hat W_1(t) = \arg\min_{W_1(t) \in R^n} J_1(W_1(t)), \qquad \hat W_1(t) = \frac{\sum_{k=1}^{t}X(k)h_1(k)}{\sum_{k=1}^{t}h_1^2(k)} \qquad (11)$$
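In NumPy terms, (11) is a ratio of two accumulated sums; a minimal sketch, where the array names are ours and h_hist is assumed to hold the recorded activations $h_1(1), \ldots, h_1(t)$:

```python
import numpy as np


def w1_hat_batch(X_hist, h_hist):
    """Closed-form estimate (11); X_hist is (t, n), h_hist is (t,)."""
    return (X_hist.T @ h_hist) / (h_hist @ h_hist)   # sum_k X(k) h_1(k) / sum_k h_1^2(k)
```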
Denoting by $P_1(t) = \left(\sum_{k=1}^{t}h_1^2(k)\right)^{-1}$ and $k_1(t) = h_1(t)P_1(t)$, we get the RLS algorithm:
$W_1(0)$ randomly selected; $h_1(t) = W_1^T(t-1)X(t)$;
$$k_1(t) = \frac{P_1(t-1)h_1(t)}{1 + h_1^2(t)P_1(t-1)}$$
$$P_1(t) = \left[1 - k_1(t)h_1(t)\right]P_1(t-1)$$
$$\hat W_1(t) = \hat W_1(t-1) + k_1(t)\left[X(t) - h_1(t)\hat W_1(t-1)\right]$$
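A sketch of the full recursion; the initial value of $P_1(0)$ is not specified in the paper, so treating it as a large constant is our assumption.

```python
import numpy as np


def rls_first_pc(samples, P0=1e3, seed=0):
    """RLS estimation of the first principal direction; samples has shape (t, n)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=samples.shape[1])
    W1 /= np.linalg.norm(W1)                 # random unit initialization of W_1(0)
    P = P0                                   # P_1(0), assumed large
    for X in samples:
        h = W1 @ X                           # h_1(t) = W_1^T(t-1) X(t)
        k = P * h / (1.0 + h ** 2 * P)       # gain k_1(t)
        P = (1.0 - k * h) * P                # P_1(t) = [1 - k_1(t) h_1(t)] P_1(t-1)
        W1 = W1 + k * (X - h * W1)           # W_1(t) update
    return W1
```

On data generated as in the Oja example, the returned vector agrees with $\pm\Phi_1$ up to the accuracy allowed by the sample size.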
Assume that the largest eigenvalue $\lambda_1$ of the covariance matrix $\Sigma$ is of multiplicity 1 and let $\varphi_1$ be its corresponding unit eigenvector. The theoretical analysis concerning the behavior of the sequence $\left(\hat W_1(t)\right)_{t \in N}$ establishes that, almost surely: if $W_1^T(0)\varphi_1 > 0$, then $\lim_{t\to\infty}\hat W_1(t) = \varphi_1$; if $W_1^T(0)\varphi_1 < 0$, then $\lim_{t\to\infty}\hat W_1(t) = -\varphi_1$.
Let $X(t) = \sum_{i=1}^{n}\alpha_i(t)\varphi_i$ be the expansion of the input signal in terms of $\{\varphi_1, \ldots, \varphi_n\}$, an orthogonal basis of eigenvectors of $\Sigma$, where the corresponding eigenvalues are taken in decreasing order. Let $d_p(t) = X(t) - \sum_{i=1}^{p-1}\alpha_i(t)\varphi_i$ be the deflated signal at the level p, $2 \le p \le n$. The extended RLS algorithm for learning the first m principal components is given by the following learning equations:
$h_p(t) = \hat W_p^T(t-1)d_p(t)$;
$$k_p(t) = \frac{P_p(t-1)h_p(t)}{1 + h_p^2(t)P_p(t-1)}$$
$$P_p(t) = \left[1 - k_p(t)h_p(t)\right]P_p(t-1)$$
$$\hat W_p(t) = \hat W_p(t-1) + k_p(t)\left[d_p(t) - h_p(t)\hat W_p(t-1)\right]$$
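A sketch of the extended recursion. Because the true eigenvectors $\varphi_i$ entering the definition of $d_p(t)$ are unknown in practice, the sketch deflates each sample with the current estimates $\hat W_i$; this substitution, as well as the initialization of $P_p(0)$, is our assumption.

```python
import numpy as np


def extended_rls(samples, m, P0=1e3, seed=0):
    """Sequential RLS estimation of the first m principal directions via deflation."""
    t, n = samples.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n, m))
    W /= np.linalg.norm(W, axis=0)               # random unit columns
    P = np.full(m, P0)                           # P_p(0), assumed large
    for X in samples:
        d = X.copy()                             # d_1(t) = X(t)
        for p in range(m):
            h = W[:, p] @ d                      # h_p(t)
            k = P[p] * h / (1.0 + h ** 2 * P[p])
            P[p] = (1.0 - k * h) * P[p]
            W[:, p] = W[:, p] + k * (d - h * W[:, p])
            d = d - (W[:, p] @ d) * W[:, p]      # deflate with the current estimate of phi_p
    return W
```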
Theoretical analysis establishes that, if $\lambda_1 > \lambda_2 > \cdots > \lambda_p > \lambda_{p+1} \ge \cdots \ge \lambda_n$, then for each $p \ge 1$ the sequence $\left(\hat W_p(t)\right)_{t \in N}$ generated by the extended RLS converges to either $\varphi_p$ or $-\varphi_p$.
4 Experimental Analysis and Concluding Remarks
In the following we present the use of the above mentioned learning schemes for image compression/decompression purposes. Let $(I(t))$ be a wide-sense stationary N-dimensional process of mean $E(I(t))$ resulting from sampling a given image I, and let $\tilde I(t) = I(t) - E(I(t))$. Each sampled matrix $\tilde I(t)$ is processed row by row, each row being split into lists of 15 consecutive components. We denote by $X(t)$ such a sub-list and we assume that $X(t) \sim N(0, \Sigma)$. We denote by $n = 15$ the dimension of the input data, by $m = 3$ the number of desired principal components, by $t_{max} \in \{10, 20, 50, 75\}$ the number of variants of the image I, and by $\eta(t) = \frac{1}{t\ln t}$ the sequence of learning rates, taken to satisfy the constraints considered in the Kushner theorem. The initial synaptic memory $W_0 \in M_{15\times 3}(R)$ has randomly generated entries, each column vector of $W_0$ being of norm 1. In case of the APEX algorithm, the initial values of the lateral connection weights are $a_{ij} = 0$ for $i \ge j$, and for $1 \le i < j \le 3$ the $a_{ij}$ are randomly generated according to the uniform distribution on $[0, 1)$. The reported results are
obtained with respect to the following examples: $\Sigma_i = \mathrm{diag}\left(\sigma_1^{(i)}, \ldots, \sigma_{15}^{(i)}\right)$, $i = 1, 2, 3$, with $\sigma_1^{(1)} = 15$, $\sigma_2^{(1)} = 10$, $\sigma_3^{(1)} = 5$, $\sigma_k^{(1)} = 0.1$; $\sigma_1^{(2)} = 10$, $\sigma_2^{(2)} = 6$, $\sigma_3^{(2)} = 2$, $\sigma_k^{(2)} = 0.04$; $\sigma_1^{(3)} = 4$, $\sigma_2^{(3)} = \sigma_3^{(3)} = 1$, $\sigma_k^{(3)} = 0.01$, for $4 \le k \le 15$. The empirical mean variation of the synaptic vectors on the final iteration and the mean error with respect to the eigenvectors are given by
$$V = \frac{1}{m}\sum_{i=1}^{m}D\left(W_i(t_{max}), W_i(t_{max}-1)\right), \quad D\left(W_i(t_{max}), W_i(t_{max}-1)\right) = \sum_{k=1}^{n}\left|W_i(t_{max})(k) - W_i(t_{max}-1)(k)\right|,$$
$$Er = \frac{1}{m}\sum_{i=1}^{m}E\left(W_i(t_{max}), \Phi_i\right), \quad E\left(W_i(t_{max}), \Phi_i\right) = \frac{1}{n}\sum_{k=1}^{n}\left|W_i(t_{max})(k) - \Phi_i(k)\right|.$$
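A sketch of how the learned filters are applied to the 15-component row blocks and of how V and Er can be computed; W denotes the $15\times 3$ matrix of learned synaptic vectors, Phi the matrix of the corresponding true eigenvectors, and the helper names are ours.

```python
import numpy as np


def compress_decompress(blocks, W):
    """Project the row blocks (shape (N, 15)) on the m learned directions and reconstruct."""
    codes = blocks @ W                 # compression: Y = W^T X for every block
    return codes @ W.T                 # decompression: X_hat = W Y


def mean_variation(W_last, W_prev):
    """V: mean over neurons of the componentwise variation on the final iteration."""
    return np.mean(np.sum(np.abs(W_last - W_prev), axis=0))


def mean_error(W_last, Phi):
    """Er: mean over neurons of the mean componentwise distance to the eigenvectors."""
    return np.mean(np.mean(np.abs(W_last - Phi), axis=0))
```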
The obtained results are shown in Table 1 and Table 2.
Table 1. The stabilization coefficient V and the error Er for GHA, the Sanger variant of GHA and APEX (rows are indexed by t_max and the example index i).

t_max   i   V-GHA    Er-GHA   V-Sanger   Er-Sanger   V-APEX   Er-APEX
75      1   0.0377   0.0339   0.0371     0.0423      0.0491   0.0499
75      2   0.0186   0.0295   0.0186     0.0403      0.0243   0.0426
75      3   0.0064   0.0379   0.0054     0.0532      0.0074   0.0417
50      1   0.0387   0.0334   0.0393     0.0429      0.0542   0.0485
50      2   0.0276   0.0331   0.0273     0.0450      0.0348   0.0453
50      3   0.0090   0.0414   0.0070     0.0561      0.0102   0.0442
20      1   0.0896   0.0417   0.0851     0.0551      0.1085   0.0543
20      2   0.0572   0.0434   0.0499     0.0572      0.0662   0.0505
20      3   0.0160   0.0484   0.0114     0.0614      0.0172   0.0493
10      1   0.1569   0.0555   0.1751     0.0612      0.1759   0.0626
10      2   0.0774   0.0629   0.0600     0.0629      0.0858   0.0531
10      3   0.0202   0.0637   0.0135     0.0637      0.0211   0.0520
The entries of $W_0 \in M_{15\times 3}(R)$ are randomly generated, but each column of $W_0$ is of norm 1. The ratio $V/Er$ of the stabilization coefficient V to the error Er decreases faster in case of the APEX and GHA algorithms than in case of the Sanger variant. APEX and GHA lead to smaller errors relative to the stabilization index V. The stabilization of the Sanger variant sets in faster than in case of GHA and APEX. The errors are significantly influenced by the variation of the eigenvalues and are less influenced by their actual magnitude.
According to the results obtained by our tests, we conclude that there are no significant differences between GHA and the Sanger variant from the point of view of the corresponding convergence rates, but the APEX algorithm proves to be slower than both of them, most probably because its convergence rate is more influenced by the initial values. Also, the performance is strongly dependent on the magnitude of the noise variances. The tests on the efficiency of the RLS algorithm were performed on the 10×10 matrix representations of the Latin letters. The experiments pointed out that good quality can be maintained when the compression/decompression process involves at least the first 15 components; only 5 line features assure enough accuracy in the compression/decompression process.
Table 2. The ratio V/Er for GHA, the Sanger variant of GHA and APEX.

i   t_max   V/Er-GHA   V/Er-Sanger   V/Er-APEX
1   75      1.1120     0.8770        0.9839
1   50      1.1586     0.9160        1.2634
1   20      2.1486     1.5446        1.9981
1   10      2.8270     2.1320        2.8099
2   75      0.6305     0.4615        0.6029
2   50      0.8338     0.6066        0.7682
2   20      1.3179     0.8723        1.3108
2   10      1.5357     0.9538        1.6158
3   75      0.1688     0.1015        0.1774
3   50      0.2173     0.1247        0.2307
3   20      0.3305     0.1856        0.3488
3   10      0.3899     0.2119        0.4057
References
1. Chatterjee, C., Roychowdhury, V.P., Chong, E.K.P.: On Relative Convergence Properties of PCA Algorithms. IEEE Trans. on Neural Networks, vol. 9, no. 2 (1998)
2. Diamantaras, K.I., Kung, S.Y.: Principal Component Neural Networks: Theory and Applications. John Wiley & Sons (1996)
3. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, Inc. (1999)
4. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer (2001)
5. Kushner, H.J., Clark, D.S.: Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer Verlag (1978)
6. Karhunen, J., Oja, E.: New Methods for Stochastic Approximations of Truncated Karhunen-Loeve Expansions. Proc. 6th Intl. Conf. on Pattern Recognition, Springer Verlag (1982)
7. Sanger, T.D.: An Optimality Principle for Unsupervised Learning. Advances in Neural Information Systems, ed. D.S. Touretzky, Morgan Kaufmann (1989)