ONLINE SEQUENTIAL LEARNING BASED ON ENHANCED
EXTREME LEARNING MACHINE USING LEFT OR RIGHT
PSEUDO-INVERSE
Weiwei Zong, Yuan Lan and Guang-Bin Huang
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
Keywords:
Feedforward network, Extreme learning machine, Online sequential learning.
Abstract:
The latest development (Huang et al., 2011) has shown that better generalization performance can be obtained for extreme learning machine (ELM) by adding a positive value to the diagonal of $H^T H$ or $H H^T$, where $H$ is the hidden layer output matrix. This paper further extends this enhanced ELM to the online sequential learning mode. An online sequential learning algorithm is proposed for SLFNs and other regularization networks, consisting of two formulas for two scenarios: when the initial training data is of small scale or of large scale. The performance of the proposed online sequential learning algorithm is demonstrated on six benchmark data sets for both regression and multi-class classification problems.
1 INTRODUCTION
Training algorithms for feedforward networks, in-
cluding least-square based extreme learning machine
(ELM) (Huang et al., 2006b; Huang et al., 2006a;
Huang and Chen, 2007; Huang and Chen, 2008) for
single-hidden layer feedforward networks (SLFNs)
and gradient-descent based backpropagation (BP)
method (Rumelhart et al., 1986) for multi-layer feed-
forward neural networks, have attracted the attention
of many researchers in recent years. The main focus has been on the batch learning mode of the aforementioned algorithms. However, in real world applica-
tions, the training data may not come at once or the
size of training data may be too large. In such circum-
stances, online sequential learning instead of batch
learning is preferred.
Sequential learning algorithms based on BP for
SLFNs with additive nodes have been proposed in lit-
erature (Ngia et al., 1998; Asirvadam et al., 2002).
Resource allocation network (RAN), one of the train-
ing algorithms for feedforward networks with RBF
nodes, has been extended to sequential learning mode
as well (Huang et al., 2004; Huang et al., 2005).
These sequential learning algorithms may not be efficient enough due to their disadvantages in convergence rate, training speed, and parameter-tuning complexity. Moreover, the data can be learned only on a one-by-one basis. Online sequential extreme learning machine (OS-ELM) was proposed by Liang et al. (Liang et al., 2006), where the training data can be
learned not only on a one-by-one basis but also on a
chunk-by-chunk basis. OS-ELM is based on the orig-
inal ELM where the SLFN can be viewed as a linear
system with the solution being the left pseudo-inverse
of the hidden layer output matrix H in the following
form: $H^\dagger = (H^T H)^{-1} H^T$. Inheriting the advantage of
simplicity from the original ELM, which randomly generates the hidden layer nodes, OS-ELM outperforms state-of-the-art sequential learning algorithms both in generalization capability and in computational efficiency. Therefore, the performance comparison with the state-of-the-art method in this paper is conducted against OS-ELM.
As mentioned in ridge regression theory (Hoerl
and Kennard, 1970), the solution tends to be more
stable and better generalization performance can be
achieved by adding a positive value to the diagonal of $H^T H$ or $H H^T$. The resultant enhanced ELM with sigmoid additive nodes has been studied in the work of Toh (Toh, 2008) and Deng et al. (Deng et al., 2009). In the recent work of Huang et al. (Huang et al., 2011), the idea of a unified solution based on ELM (referred to as eELM) for SLFNs and other regularization networks with a wide range of feature mappings or kernels was first proposed and realized.
OS-ELM is derived on the basis of batch mode
original ELM using left pseudo-inverse. However,
the latest development in ELM (Huang et al., 2011) has shown a much better generalization ca-
pability than the original ELM. On the other hand,
to the best of our knowledge, very little work has been done on sequential learning using the right
pseudo-inverse. In this paper, we propose an on-
line sequential learning algorithm based on eELM
using the right pseudo-inverse (referred to as OS-
eELM-right). Moreover, the online sequential learning for SLFNs and other regularization networks is built for both the left pseudo-inverse (referred to as OS-eELM-left) and the right pseudo-inverse (OS-eELM-right). The performance of the proposed framework is compared with the original OS-ELM on six benchmark data sets for regression and multi-class classification applications.
The rest of this paper is organized as follows. Sec-
tion 2 gives a brief introduction of ELM and its en-
hanced version eELM. Section 3 derives the proposed
framework including OS-eELM-left and OS-eELM-
right. Performance evaluation over benchmarking
data sets is provided in Section 4. A summary is pre-
sented in Section 5.
2 BRIEF REVIEW OF ELM
2.1 Review of ELM
ELM (Huang et al., 2006b) was originally proposed for single-hidden layer feedforward neural networks and was then extended to SLFNs where the hidden layer need not be neuron alike (Huang and Chen, 2007; Huang and Chen, 2008). The main feature of ELM lies in that the hidden layer need not be tuned. Instead of the iterative tuning used in traditional learning algorithms, in ELM the hidden nodes are randomly generated, independent of the training data.
After randomly generating L hidden nodes, with the (row) vector $h(x) = [h_1(x), \cdots, h_L(x)]$ representing the outputs of the L hidden nodes with respect to the input x, the SLFN is essentially a linear system

$H\beta = T, \quad (1)$

where $\beta = [\beta_1, \cdots, \beta_L]$ is the vector of the output weights, and H is the hidden layer output matrix

$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix}. \quad (2)$
In the original implementation of ELM, the minimal
norm least square solution is
$\beta = H^\dagger T, \quad (3)$

where T is the label matrix, and $H^\dagger$ is the Moore-Penrose generalized inverse of matrix H (Rao and Mitra, 1971; Serre, 2002). One of the methods to calculate the Moore-Penrose generalized inverse of a matrix is the orthogonal projection method (Rao and Mitra, 1971): $H^\dagger = (H^T H)^{-1} H^T$ (called the left pseudo-inverse) when $H^T H$ is nonsingular, or $H^\dagger = H^T (H H^T)^{-1}$ (called the right pseudo-inverse) when $H H^T$ is nonsingular. Usually the left pseudo-inverse is suitable when the size of the training data is large; otherwise the right pseudo-inverse is better in terms of training speed.
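To make the two forms concrete, the following is a minimal NumPy sketch of batch ELM training with sigmoid additive hidden nodes; the function names, the uniform random initialization, and the size-based switch between the two pseudo-inverses are our own illustrative choices, not code from the cited works.

import numpy as np

def elm_train(X, T, L, rng=np.random.default_rng(0)):
    # Random sigmoid additive hidden layer: the input weights W and biases b are drawn at random.
    n_features = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n_features, L))
    b = rng.uniform(-1.0, 1.0, size=L)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden layer output matrix, N x L
    N = H.shape[0]
    if N >= L:
        # left pseudo-inverse: beta = (H^T H)^{-1} H^T T, inverts an L x L matrix
        beta = np.linalg.solve(H.T @ H, H.T @ T)
    else:
        # right pseudo-inverse: beta = H^T (H H^T)^{-1} T, inverts an N x N matrix
        beta = H.T @ np.linalg.solve(H @ H.T, T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta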
2.2 Review of eELM
According to ridge regression theory (Hoerl and Kennard, 1970), if one adds a positive value to the diagonal of $H H^T$ or $H^T H$, the resultant solution is more stable and tends to have better generalization performance. In (Huang et al., 2011), a unified solution framework for SLFNs, SVM and other regularization networks was proposed, where the solution based on the right pseudo-inverse for small-scale data sets and the solution based on the left pseudo-inverse for large-scale data sets are given by
Right: $\beta = H^\dagger T = H^T \left(\frac{I}{C} + H H^T\right)^{-1} T$

Left: $\beta = H^\dagger T = \left(\frac{I}{C} + H^T H\right)^{-1} H^T T \quad (4)$
where C is a parameter determined by the user. As observed from simulations on a wide range of data sets, the hidden node number L is normally set to a large value to obtain good generalization performance, while the regularization term C in (4) is the only parameter the user needs to specify for different data sets.
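As a sketch of how (4) changes the computation, the fragment below picks the right or left form from the shape of H; the size-based switch and the direct use of numpy.linalg.solve are our own assumptions, not the authors' implementation.

import numpy as np

def eelm_beta(H, T, C):
    N, L = H.shape
    if N <= L:
        # right form of (4): beta = H^T (I/C + H H^T)^{-1} T, an N x N inversion
        return H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)
    # left form of (4): beta = (I/C + H^T H)^{-1} H^T T, an L x L inversion
    return np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)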
3 THE PROPOSED ONLINE
SEQUENTIAL LEARNING
ALGORITHM
In this section, the online sequential learning algorithm based on eELM is proposed, comprising OS-eELM-left, which calculates the left pseudo-inverse when a large-scale set of initial training data is presented, and OS-eELM-right, which calculates the right pseudo-inverse when only a small-scale set of initial training data is observed.
3.1 OS-eELM-left
The only difference between OS-eELM-left and OS-
ELM is that OS-eELM-left is derived on the basis of the latest development of ELM (Huang et al., 2011), which we call eELM in this paper. Hence, it is not difficult to see that only a slight change needs to be made to extend OS-ELM to OS-eELM-left. The change is to obtain the initial output weight $\beta^{(0)}$ as $\beta^{(0)} = \left(\frac{I}{C} + H_0^T H_0\right)^{-1} H_0^T T_0 = P_0 H_0^T T_0$. The rest of the equations in the training procedure remain exactly the same as in OS-ELM. Therefore, more focus is given to the derivation of OS-eELM-right, which differs substantially from OS-ELM by using the right pseudo-inverse.
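For completeness, a minimal sketch of OS-eELM-left is given below: only the initialization differs from OS-ELM, and the incremental updates reuse the OS-ELM recursion of (Liang et al., 2006). Here H0, T0, Hk, Tk denote the hidden layer output and target matrices of the corresponding chunks; the helper names are our own.

import numpy as np

def os_eelm_left_init(H0, T0, C):
    # beta0 = (I/C + H0^T H0)^{-1} H0^T T0 = P0 H0^T T0
    L = H0.shape[1]
    P = np.linalg.inv(np.eye(L) / C + H0.T @ H0)
    beta = P @ H0.T @ T0
    return P, beta

def os_eelm_left_update(P, beta, Hk, Tk):
    # Standard OS-ELM recursion (Liang et al., 2006); only the initialization changes.
    Nk = Hk.shape[0]
    K = np.linalg.solve(np.eye(Nk) + Hk @ P @ Hk.T, Hk @ P)   # (I + Hk P Hk^T)^{-1} Hk P
    P = P - P @ Hk.T @ K
    beta = beta + P @ Hk.T @ (Tk - Hk @ beta)
    return P, beta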
3.2 The Proposed OS-eELM-right
Given the initial training set $\aleph_0 = \{(x_i, t_i)\}_{i=1}^{N_0}$, the minimum norm least square solution that minimizes $\| H_0 \beta - T_0 \|$ is given by $\beta^{(0)} = \hat{H}_0^\dagger T_0 = H_0^T \left(\frac{I}{C} + H_0 H_0^T\right)^{-1} T_0$. Given another chunk of data $\aleph_1 = \{(x_i, t_i)\}_{i=N_0+1}^{N_0+N_1}$ of size $N_1$, from the batch learning point of view, the problem is to minimize

$\left\| \begin{bmatrix} H_0 \\ H_1 \end{bmatrix} \beta^{(1)} - \begin{bmatrix} T_0 \\ T_1 \end{bmatrix} \right\|. \quad (5)$
According to (4), the output weight β becomes

$\beta^{(1)} = \begin{bmatrix} H_0 \\ H_1 \end{bmatrix}^T \left( \frac{I}{C} + \begin{bmatrix} H_0 \\ H_1 \end{bmatrix} \begin{bmatrix} H_0 \\ H_1 \end{bmatrix}^T \right)^{-1} \begin{bmatrix} T_0 \\ T_1 \end{bmatrix}. \quad (6)$

Let $A = \frac{I}{C} + \begin{bmatrix} H_0 \\ H_1 \end{bmatrix} \begin{bmatrix} H_0 \\ H_1 \end{bmatrix}^T$; we have

$A^{-1} = \begin{bmatrix} \frac{I}{C} + H_0 H_0^T & H_0 H_1^T \\ H_1 H_0^T & \frac{I}{C} + H_1 H_1^T \end{bmatrix}^{-1} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}.$
This 2 × 2 block matrix is invertible if and only if $\left(\frac{I}{C} + H_0 H_0^T\right)$ and its Schur complement are invertible (Boyd et al., 1994). From the initial step, it is easy to show that $\left(\frac{I}{C} + H_0 H_0^T\right)$ is invertible. Similar to (Feng et al., 2009), it can be proved that the Schur complement of $\left(\frac{I}{C} + H_0 H_0^T\right)$, denoted by S, is invertible with probability one. According to (Boyd et al., 1994), we have
$A_{11} = \left(\frac{I}{C} + H_0 H_0^T\right)^{-1} + \left(\frac{I}{C} + H_0 H_0^T\right)^{-1} H_0 H_1^T S^{-1} H_1 H_0^T \left(\frac{I}{C} + H_0 H_0^T\right)^{-1}$

$A_{12} = -\left(\frac{I}{C} + H_0 H_0^T\right)^{-1} H_0 H_1^T S^{-1}$

$A_{21} = -S^{-1} H_1 H_0^T \left(\frac{I}{C} + H_0 H_0^T\right)^{-1}$

$A_{22} = S^{-1} \quad (7)$
where

$S = \frac{I}{C} + H_1 H_1^T - H_1 H_0^T \left(\frac{I}{C} + H_0 H_0^T\right)^{-1} H_0 H_1^T = \frac{I}{C} + H_1 \left(I - \hat{H}_0^\dagger H_0\right) H_1^T. \quad (8)$
Denote $P_0 = I - \hat{H}_0^\dagger H_0$; then $S_1$, which represents the state of S after the first chunk of data arrives, can be expressed as $S_1 = \frac{I}{C} + H_1 \left(I - \hat{H}_0^\dagger H_0\right) H_1^T = \frac{I}{C} + H_1 P_0 H_1^T$. Thus we have
$\beta^{(1)} = \begin{bmatrix} H_0^T & H_1^T \end{bmatrix} \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} T_0 \\ T_1 \end{bmatrix}$
$\quad = H_0^T A_{11} T_0 + H_1^T A_{21} T_0 + H_0^T A_{12} T_1 + H_1^T A_{22} T_1$
$\quad = \beta^{(0)} + \hat{H}_0^\dagger H_0 H_1^T S_1^{-1} H_1 \beta^{(0)} - H_1^T S_1^{-1} H_1 \beta^{(0)} - \hat{H}_0^\dagger H_0 H_1^T S_1^{-1} T_1 + H_1^T S_1^{-1} T_1$
$\quad = \beta^{(0)} - \left(I - \hat{H}_0^\dagger H_0\right) H_1^T S_1^{-1} \left(H_1 \beta^{(0)} - T_1\right)$
$\quad = \beta^{(0)} - P_0 H_1^T S_1^{-1} \left(H_1 \beta^{(0)} - T_1\right). \quad (9)$
Upon completion of the learning of the first chunk of data, P is updated as follows:

$P_1 = I - \widehat{\begin{bmatrix} H_0 \\ H_1 \end{bmatrix}}^\dagger \begin{bmatrix} H_0 \\ H_1 \end{bmatrix}$
$\quad = I - \hat{H}_0^\dagger H_0 - \hat{H}_0^\dagger H_0 H_1^T S_1^{-1} H_1 \hat{H}_0^\dagger H_0 + H_1^T S_1^{-1} H_1 \hat{H}_0^\dagger H_0 + \hat{H}_0^\dagger H_0 H_1^T S_1^{-1} H_1 - H_1^T S_1^{-1} H_1$
$\quad = \left(I - \hat{H}_0^\dagger H_0\right) + \left(I - \hat{H}_0^\dagger H_0\right) H_1^T S_1^{-1} H_1 \hat{H}_0^\dagger H_0 - \left(I - \hat{H}_0^\dagger H_0\right) H_1^T S_1^{-1} H_1$
$\quad = P_0 - P_0 H_1^T S_1^{-1} H_1 P_0. \quad (10)$
As seen from equation (9), the output weight $\beta^{(1)}$ is expressed as a function of $\beta^{(0)}$, $S_1$, $P_0$ and the newly arriving chunk of data $H_1$; $S_1$ can in turn be expressed as a function of $P_0$ and the newly arriving chunk $H_1$; and $P_0$ is updated from itself and the newly arriving data only, in each iteration. Therefore, whenever new data arrive, a recursive algorithm for updating the output weight β can be derived. Given the (k+1)th chunk of data, whose hidden layer output matrix is $H_{k+1}$, the recursive algorithm works as follows:

$S_{k+1} = \frac{I}{C} + H_{k+1} P_k H_{k+1}^T$
$\beta^{(k+1)} = \beta^{(k)} - P_k H_{k+1}^T S_{k+1}^{-1} \left(H_{k+1} \beta^{(k)} - T_{k+1}\right) \quad (11)$
Finally, the proposed OS-eELM-right can be summa-
rized as follows.
The proposed algorithm can learn data chunk by chunk or one by one, and is thus able to handle the situation where data arrive sequentially. In general, there are two stages during training, namely the initialization stage and the incremental learning stage.
(1) Initialization. Given the chunk of initial training data $\aleph_0 = \{(x_i, t_i)\}_{i=1}^{N_0}$, with $x_i \in \mathbf{R}^n$ and $t_i \in \mathbf{R}^m$:
a) Randomly generate L hidden nodes (additive hidden nodes or RBF hidden nodes). Good performance can be achieved normally when L is assigned a large value.
b) Calculate the initial hidden layer output matrix $H_0$.
c) Compute the initial output weight $\beta^{(0)} = H_0^T \left(\frac{I}{C} + H_0 H_0^T\right)^{-1} T_0$.
d) Calculate $P_0 = I - \hat{H}_0^\dagger H_0 = I - H_0^T \left(\frac{I}{C} + H_0 H_0^T\right)^{-1} H_0$.
e) Set k = 0.
(2) Incremental Learning. Given the (k+1)th chunk of data $\aleph_{k+1} = \{(x_i, t_i)\}_{i=\left(\sum_{j=0}^{k} N_j\right)+1}^{\sum_{j=0}^{k+1} N_j}$, where $N_{k+1}$ represents the number of training samples in the (k+1)th chunk:
a) Calculate the hidden layer output matrix $H_{k+1}$ for the (k+1)th chunk of data.
b) Update the output weight:
$S_{k+1} = \frac{I}{C} + H_{k+1} P_k H_{k+1}^T$
$\beta^{(k+1)} = \beta^{(k)} - P_k H_{k+1}^T S_{k+1}^{-1} \left(H_{k+1} \beta^{(k)} - T_{k+1}\right) \quad (12)$
c) Update $P_{k+1}$:
$P_{k+1} = P_k - P_k H_{k+1}^T S_{k+1}^{-1} H_{k+1} P_k.$
d) Set k = k + 1. Go to (2).
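The following is a compact NumPy sketch of the OS-eELM-right procedure above, assuming sigmoid additive hidden nodes; the class name, the random initialization range, and the use of NumPy for the matrix algebra are our own choices rather than the authors' code.

import numpy as np

class OSeELMRight:
    def __init__(self, n_features, L, C, rng=np.random.default_rng(0)):
        self.C = C
        self.W = rng.uniform(-1.0, 1.0, size=(n_features, L))   # random input weights
        self.b = rng.uniform(-1.0, 1.0, size=L)                  # random biases
        self.P = None
        self.beta = None

    def _hidden(self, X):
        # Sigmoid additive hidden layer output matrix H, shape (N, L).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def initialize(self, X0, T0):
        # Steps (1b)-(1e): beta0 = H0^T (I/C + H0 H0^T)^{-1} T0 and P0 = I - H0^T (I/C + H0 H0^T)^{-1} H0.
        H0 = self._hidden(X0)
        N0, L = H0.shape
        A0 = np.eye(N0) / self.C + H0 @ H0.T
        self.beta = H0.T @ np.linalg.solve(A0, T0)
        self.P = np.eye(L) - H0.T @ np.linalg.solve(A0, H0)

    def update(self, Xk, Tk):
        # Steps (2a)-(2c): S = I/C + H P H^T, then update beta by (12) and P.
        Hk = self._hidden(Xk)
        Nk = Hk.shape[0]
        Sk = np.eye(Nk) / self.C + Hk @ self.P @ Hk.T
        G = self.P @ Hk.T @ np.linalg.inv(Sk)        # P_k H^T S^{-1}
        self.beta = self.beta - G @ (Hk @ self.beta - Tk)
        self.P = self.P - G @ Hk @ self.P

In use, initialize would be called once on the initial chunk and update once per subsequent chunk, with chunk sizes that may vary from call to call.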
4 DISCUSSIONS
The choice of OS-eELM-left or OS-eELM-right depends on the size of the initial training data $N_0$. It is wise to choose OS-eELM-right when a small amount of initial training data is provided; otherwise OS-eELM-left is a better choice. Since good performance is normally achieved when the number of hidden nodes is set to a large value, which is 1,000 in (Huang et al., 2011) for data sets of up to 40,000 observations, the threshold for the choice between left and right is also set to 1,000. OS-eELM-left is recommended when more than 1,000 observations are given during the initial training phase; otherwise OS-eELM-right is expected to save computation time.
Both OS-eELM-left and OS-eELM-right can learn data on a one-by-one basis or on a chunk-by-chunk basis, where the chunk size can vary. When only one observation $(x_{k+1}, t_{k+1})$ is provided during the incremental learning phase, the updating equations for $\beta^{(k+1)}$ and $P_{k+1}$ can be further simplified:

$\beta^{(k+1)} = \beta^{(k)} - \dfrac{P_k h_{k+1}^T \left(h_{k+1} \beta^{(k)} - t_{k+1}\right)}{\frac{1}{C} + h_{k+1} P_k h_{k+1}^T}$

$P_{k+1} = P_k - \dfrac{P_k h_{k+1}^T h_{k+1} P_k}{\frac{1}{C} + h_{k+1} P_k h_{k+1}^T}. \quad (13)$
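When the chunk reduces to a single observation, (13) becomes a rank-one correction that needs no matrix inversion; below is a sketch under the same assumptions as before, where h is the 1 x L hidden layer output row of the new sample and t its 1 x m target row.

import numpy as np

def os_eelm_right_step(P, beta, h, t, C):
    # Denominator of (13): the scalar 1/C + h P h^T.
    s = 1.0 / C + float(h @ P @ h.T)
    g = (P @ h.T) / s                    # L x 1 gain vector
    beta = beta - g @ (h @ beta - t)     # rank-one update of the output weights
    P = P - g @ (h @ P)                  # rank-one update of P
    return P, beta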
Similar to OS-ELM, batch mode eELM-left can be considered as the special case of OS-eELM-left, while batch mode eELM-right is the special case of OS-eELM-right, when $N_0 = N$.
5 PERFORMANCE EVALUATION
Since OS-ELM has been verified to have better per-
formance than other well-known sequential learning
algorithms (Liang et al., 2006), in this paper, the
proposed OS-eELM-right is compared with OS-ELM
and OS-eELM-left. For sequential learning, the number of initial training samples is set the same as in (Liang et al., 2006). All simulations are conducted on
a laptop with Pentium Dual CPU 1.86GHz and 2GB
memory.
Table 1: Specification of Benchmark Data Sets.
Dataset #Attributes #Classes #Train Data #Test Data
Auto-MPG 7 - 320 72
Abalone 8 - 3,000 1,177
California Housing 8 - 8,000 12,640
Image Segment 19 7 1,500 810
Satellite Image 36 6 4,435 2,000
DNA 180 3 2,000 1,186
5.1 Data Sets Specification
As shown in Table 1, six benchmark data sets (Blake and Merz, 1998) have been studied in the simulations, including three regression problems (Auto-MPG, Abalone and California Housing) and three multi-class classification problems (image segmentation, satellite image and DNA). As in (Liang et al., 2006), the input at-
tributes and output attributes of regression problems
are normalized into the range of [0,1]; the input at-
tributes of classification problems are normalized into
the range of [-1,1].
Table 2: Performance comparison on regression problems. (Time in seconds; Train/Test are training/testing RMSE; C and L are the parameters.)

Mode      Algorithm      | Auto-MPG                             | Abalone                               | California Housing
                         | Time     Train   Test    C     L     | Time      Train   Test    C     L     | Time      Train   Test    C     L
20-by-20  OS-ELM         | 0.0103   0.0686  0.0759  -     25    | 0.0437    0.0753  0.0779  -     25    | 0.3204    0.1304  0.1331  -     50
          OS-eELM-left   | 3.9827   0.0607  0.0724  2^8   1000  | 23.9346   0.0719  0.0774  2^12  1000  | 75.1894   0.1246  0.1282  2^10  1000
          OS-eELM-right  | 2.1338   0.0610  0.0709  2^8   1000  | 2.1338    0.0610  0.0709  2^8   1000  | 78.7415   0.1247  0.1282  2^10  1000
[10,30]   OS-ELM         | 0.0115   0.0680  0.0781  -     25    | 0.0711    0.0752  0.0783  -     25    | 0.4842    0.1303  0.1322  -     50
          OS-eELM-left   | 3.8794   0.0612  0.0706  2^8   1000  | 26.3018   0.0724  0.0771  2^10  1000  | 78.7824   0.1200  0.1294  2^14  1000
          OS-eELM-right  | 2.6289   0.0578  0.0701  2^10  1000  | 15.8310   0.0752  0.0769  2^4   1000  | 71.4949   0.1203  0.1300  2^14  1000
1-by-1    OS-ELM         | 0.0365   0.0690  0.0739  -     25    | 0.2315    0.0755  0.0780  -     25    | 3.4161    0.1300  0.1338  -     50
          OS-eELM-left   | 17.3220  0.0643  0.0714  2^6   1000  | 131.1391  0.0736  0.0766  2^8   1000  | 537.7711  0.1185  0.1285  2^16  1000
          OS-eELM-right  | 14.1596  0.0644  0.0708  2^6   1000  | 106.4070  0.0724  0.0769  2^10  1000  | 497.1973  0.1203  0.1300  2^14  1000
Table 3: Performance comparison on multi-class classification problems. (Time in seconds; Train/Test are training/testing rates in %; C and L are the parameters.)

Mode      Algorithm      | Image Segment                        | Satellite Image                       | DNA
                         | Time     Train   Test   C     L      | Time      Train   Test   C     L      | Time      Train   Test   C     L
20-by-20  OS-ELM         | 0.7806   97.02   94.88  -     180    | 19.9366   91.96   88.95  -     400    | 1.9425    92.67   87.94  -     200
          OS-eELM-left   | 17.7426  98.19   95.91  2^8   1000   | 55.5763   94.62   89.95  2^6   1000   | 20.5104   97.26   93.75  2^8   1000
          OS-eELM-right  | 11.9593  98.11   95.99  2^8   1000   | 52.0285   93.92   90.04  2^4   1000   | 20.1678   97.25   93.73  2^8   1000
[10,30]   OS-ELM         | 1.1691   97.05   94.89  -     180    | 28.7451   91.97   88.98  -     400    | 1.8754    92.87   88.06  -     200
          OS-eELM-left   | 15.6828  98.54   95.94  2^10  1000   | 55.2733   94.64   89.91  2^6   1000   | 20.9615   97.30   93.66  2^8   1000
          OS-eELM-right  | 9.4103   98.56   96.00  2^10  1000   | 51.5571   95.03   89.67  2^8   1000   | 21.3806   97.32   93.77  2^8   1000
1-by-1    OS-ELM         | 10.3722  97.05   94.75  -     180    | 377.1901  92.00   89.00  -     400    | 24.2925   92.57   87.91  -     200
          OS-eELM-left   | 93.6714  97.58   95.88  2^6   1000   | 284.5552  93.93   89.99  2^4   1000   | 120.4346  97.28   93.74  2^8   1000
          OS-eELM-right  | 88.2663  98.17   95.92  2^8   1000   | 280.7665  93.94   89.98  2^4   1000   | 132.8770  97.24   93.65  2^8   1000
5.2 Parameter Settings
Sigmoidal additive hidden nodes are selected for all the algorithms. Other hidden nodes, such as RBF nodes, can be studied in future work. Similar to (Huang et al., 2011; Huang et al., 2010), good performance is usually achieved when the number of hidden nodes L is large. The performances of our proposed algorithm and OS-eELM-left are insensitive to L as well (Figure 1). Therefore, it is convenient to fix L at 1000 for both algorithms. The other parameter, C, which determines the positive value added to the diagonal, needs to be specified by the user. A wide range of C, $\{2^{-24}, 2^{-23}, \cdots, 2^{24}, 2^{25}\}$, is validated for the optimal testing rate.
Figure 1: Testing RMSE of abalone with respect to C and L.
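The selection of C described above can be sketched as a simple grid search; the held-out split and the RMSE criterion below are illustrative assumptions (the paper validates against the testing rate), and the left form of (4) is used for the refit at each candidate value of C.

import numpy as np

def select_C(H_train, T_train, H_val, T_val):
    # Try C in {2^-24, ..., 2^25} and keep the value with the lowest validation RMSE.
    best_C, best_rmse = None, np.inf
    L = H_train.shape[1]
    for p in range(-24, 26):
        C = 2.0 ** p
        beta = np.linalg.solve(np.eye(L) / C + H_train.T @ H_train, H_train.T @ T_train)
        rmse = np.sqrt(np.mean((H_val @ beta - T_val) ** 2))
        if rmse < best_rmse:
            best_C, best_rmse = C, rmse
    return best_C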
5.3 Comparison of Average Testing
Accuracy
The average testing RMSE for regression problems
and average testing rate for classification problems
are obtained over 50 trials.
It can be observed from Table 2 and Table 3 that for online sequential learning, the accuracy on testing data is improved when a positive value is added to the diagonal of $H^T H$ for the left pseudo-inverse or $H H^T$ for the right pseudo-inverse. It is also shown that the sequential algorithms using the right or the left pseudo-inverse obtain similar testing accuracy.
5.4 Comparison of Training Time
Different from the original ELM, the number of hidden nodes is fixed at a large value (1000) in the enhanced ELM. That is the reason why the enhanced ELM tends to be slower than the original ELM in both batch and sequential learning modes.

It is not difficult to see from (12) that the computational complexities of the incremental learning phases of OS-eELM-left and OS-eELM-right are similar. Therefore, the main focus of the computational complexity analysis is on the initial phase of each formula. For the right pseudo-inverse, of the form $H^\dagger = H^T \left(\frac{I}{C} + H H^T\right)^{-1}$, matrix inversion is carried out on an N × N matrix, while it is L × L in the case of the left pseudo-inverse. Therefore, in terms of training speed, the right pseudo-inverse is preferred
when the size of training data is small; otherwise left
pseudo-inverse is more appropriate. Hence, it can be
observed from Table 2 and Table 3 that OS-eELM-
right runs relatively faster than OS-eELM-left since
the size of initial training data is chosen much smaller
than the number of hidden nodes.
6 CONCLUSIONS
In this paper, an online sequential learning algorithm for SLFNs and other regularization networks based on the enhanced ELM is proposed, which is capable of learning data on a one-by-one basis or a chunk-by-chunk basis. Simulations on six benchmark data sets have shown that, by adding a positive value to the diagonal of $H H^T$ or $H^T H$, our proposed methods outperform the original OS-ELM in generalization performance. In addition, in the simulations, OS-eELM-right is more suitable for sequential learning than OS-eELM-left in terms of training speed, since there are fewer than 1,000 observations during the initial training phase. Different hidden nodes, such as RBF nodes, can be studied in future work.
REFERENCES
Asirvadam, V. S., McLoone, S. F., and Irwin, G. W. (2002).
Parallel and separable recursive Levenberg-Marquardt
training algorithm. In 12th IEEE Workshop on Neu-
ral Networks for Signal Processing, pages 129–138.
IEEE.
Blake, C. L. and Merz, C. J. (1998). UCI repository of machine
learning databases.
Boyd, S., Ghaoui, L. E., Feron, E., and Balakrishnan, V.
(1994). Linear Matrix Inequalities in System and Con-
trol Theory. Society for Industrial and Applied Math-
ematics.
Deng, W., Zheng, Q., and Chen, L. (2009). Proximal
support vector machine classifiers. In IEEE Sympo-
sium on Computational Intelligence and Data Mining
(CIDM 09), pages 389–395.
Feng, G., Huang, G.-B., Lin, Q., and Gay, R. (2009). Error
minimized extreme learning machine with growth of
hidden nodes and incremental learning. IEEE Trans-
actions on Neural Networks, 20(8):1352–1357.
Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression:
Biased estimation for nonorthogonal problems. Tech-
nometrics, 12(1):55–67.
Huang, G.-B. and Chen, L. (2007). Convex incremental ex-
treme learning machine. Neurocomputing, 70:3056–
3062.
Huang, G.-B. and Chen, L. (2008). Enhanced random
search based incremental extreme learning machine.
Neurocomputing, 71:3460–3468.
Huang, G.-B., Chen, L., and Siew, C.-K. (2006a). Universal
approximation using incremental constructive feed-
forward networks with random hidden nodes. IEEE
Transactions on Neural Networks, 17(4):879–892.
Huang, G.-B., Ding, X., and Zhou, H. (2010). Optimization
method based extreme learning machine for classifi-
cation. Neurocomputing, 74:155–163.
Huang, G.-B., Saratchandran, P., and Sundararajan, N.
(2004). An efficient sequential learning algorithm for
growing and pruning RBF (GAP-RBF) networks. IEEE
Transactions on Systems, Man, and Cybernetics, Part
B: Cybernetics, 34.
Huang, G.-B., Saratchandran, P., and Sundararajan, N.
(2005). A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation. IEEE
Transactions on Neural Networks, 16(1):57–67.
Huang, G.-B., Zhou, H., Ding, X., and Zhang, R. (2011).
Extreme learning machine for regression and multi-
class classification. (in press) IEEE Transactions on
Systems, Man, and Cybernetics.
Huang, G.-B., Zhu, Q.-Y., and Siew, C.-K. (2006b). Ex-
treme learning machine: Theory and applications.
Neurocomputing, 70(1-3):489–501.
Liang, N.-Y., Huang, G.-B., Saratchandran, P., and Sun-
dararajan, N. (2006). A fast and accurate online se-
quential learning algorithm for feedforward networks.
IEEE Transactions on Neural Networks, 17(6):1411–1423.
Ngia, L. S., Sjöberg, J., and Viberg, M. (1998). Adap-
tive neural nets filter using a recursive Levenberg-Marquardt search direction. In the 32nd Asilomar
Conference on Signals, Systems and Computers, CA,
USA.
Rao, C. R. and Mitra, S. K. (1971). Generalized Inverse
of Matrices and its Applications. John Wiley & Sons,
Inc, New York.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986).
Learning representations by back-propagating errors.
Nature, 323:533–536.
Serre, D. (2002). Matrices: Theory and Applications.
Springer-Verlag New York, Inc.
Toh, K.-A. (2008). Deterministic neural classification. Neu-
ral Computation, 20(6):1565–1595.