2.1.2 Hidden Neurons
The single hidden layer of $M$ neurons transforms the input data into a different representation, called the feature space. The most popular neurons are "projection-based" ones. Each $n$-dimensional input $\mathbf{x}$ is projected by the input-layer weights $\mathbf{w}_k = [w_{k1}, \ldots, w_{kn}]$, $k = 1, \ldots, M$, and the bias $b_k$ into the $k$-th neuron input, and next a nonlinear transformation $h$, called the activation function (AF), is applied to obtain the neuron output. The matrix form of the hidden layer response on a batch of $N$ samples is the $N \times M$ matrix:
\[
\mathbf{H} =
\begin{bmatrix}
h(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & h(\mathbf{w}_M \cdot \mathbf{x}_1 + b_M) \\
\vdots & \ddots & \vdots \\
h(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & h(\mathbf{w}_M \cdot \mathbf{x}_N + b_M)
\end{bmatrix}. \qquad (3)
\]
Although the hidden layer is not required to contain only one kind of neuron, this is usually the case. Any piecewise differentiable function may be used as the activation function; the sigmoid, hyperbolic tangent, and threshold functions are among the most popular.
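As an illustration, the projection-based hidden layer of (3) can be computed with a few lines of NumPy; this is a minimal sketch in which the array names X, W, b and the choice of the hyperbolic tangent as the AF are assumptions for illustration only:

import numpy as np

def hidden_layer_projection(X, W, b):
    """Projection-based hidden layer, cf. equation (3).

    X : (N, n) batch of input samples
    W : (M, n) random input weights, one row per hidden neuron
    b : (M,)   random biases
    Returns the (N, M) hidden layer output matrix H.
    """
    # Project every sample onto every neuron, add the bias,
    # and apply the nonlinear activation function (here: tanh).
    return np.tanh(X @ W.T + b)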
Another type of neuron used in ELM is the "distance-based" neuron, such as Radial Basis Function (RBF) or multiquadric neurons. Each neuron uses the distance from its centroid $\mathbf{c}_k$ as the input to the nonlinear transformation (Huang and Siew, 2004). The formal representation of ELM again generates a matrix similar to (3):
\[
\mathbf{H} =
\begin{bmatrix}
h(\lVert \mathbf{x}_1 - \mathbf{c}_1 \rVert, b_1) & \cdots & h(\lVert \mathbf{x}_1 - \mathbf{c}_M \rVert, b_M) \\
\vdots & \ddots & \vdots \\
h(\lVert \mathbf{x}_N - \mathbf{c}_1 \rVert, b_1) & \cdots & h(\lVert \mathbf{x}_N - \mathbf{c}_M \rVert, b_M)
\end{bmatrix}. \qquad (4)
\]
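A corresponding sketch for distance-based neurons, assuming for illustration a Gaussian RBF $h(d, b) = \exp(-b d^2)$ and arrays C (centroids) and b (widths) that are not defined in the paper:

import numpy as np

def hidden_layer_rbf(X, C, b):
    """Distance-based (RBF) hidden layer, cf. equation (4).

    X : (N, n) batch of input samples
    C : (M, n) random centroids, one row per hidden neuron
    b : (M,)   positive width parameters
    Returns the (N, M) hidden layer output matrix H, here with
    the Gaussian h(d, b) = exp(-b * d**2) as an example AF.
    """
    # Pairwise Euclidean distances between samples and centroids.
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
    return np.exp(-b * d**2)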
An acceptable number of neurons may be found using validation data, a Leave-One-Out validation procedure, randomly adding or removing neurons (Akusok et al., 2015), or ranking the neurons (Miche et al., 2010; Feng et al., 2015; Miche et al., 2011).
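For instance, a held-out validation set could be used to choose among a few candidate values of $M$; the following sketch assumes the data split, the candidate list and all variable names, none of which come from the paper:

import numpy as np

def select_neuron_count(X_train, T_train, X_val, T_val,
                        candidates=(10, 50, 100, 200)):
    """Return the candidate M with the smallest validation error."""
    best_M, best_err = None, np.inf
    for M in candidates:
        # Random projection-based hidden layer (see Section 2.1.2).
        W = np.random.uniform(-1.0, 1.0, size=(M, X_train.shape[1]))
        b = np.random.uniform(0.0, 1.0, size=M)
        H = np.tanh(X_train @ W.T + b)
        # Output weights via the Moore-Penrose pseudoinverse.
        beta = np.linalg.pinv(H) @ T_train
        H_val = np.tanh(X_val @ W.T + b)
        err = np.mean((H_val @ beta - T_val) ** 2)
        if err < best_err:
            best_M, best_err = M, err
    return best_M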
The weights and the biases of the hidden neurons are generated at random. Uniform distributions on $[-1, 1]$ for the weights and on $[0, 1]$ for the biases are the most popular choice. If the data are normalized to zero mean and unit variance, the normal distribution may be used to generate the neuron parameters (Akusok et al., 2015).
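For example, the random neuron parameters could be drawn as follows; the generator, the example values of M and n, and the array names are assumptions made for the sketch:

import numpy as np

rng = np.random.default_rng()
M, n = 100, 8   # example numbers of hidden neurons and input features

# Uniform initialization, the most popular choice.
W = rng.uniform(-1.0, 1.0, size=(M, n))   # input weights
b = rng.uniform(0.0, 1.0, size=M)         # biases

# Alternative for data normalized to zero mean and unit variance:
# W = rng.standard_normal((M, n))
# b = rng.standard_normal(M)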
2.1.3 Output Weights
The output of an ELM is obtained by applying the output weights $\boldsymbol{\beta}$ to the hidden neuron outputs; therefore the outputs for all samples in the batch are

\[
\hat{\mathbf{Y}} = \mathbf{H}\boldsymbol{\beta}. \qquad (5)
\]

The output weights are found by minimizing the approximation error with respect to the targets $\mathbf{T}$:

\[
E = \lVert \mathbf{H}\boldsymbol{\beta} - \mathbf{T} \rVert^2. \qquad (6)
\]
The optimal solution is

\[
\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{T}, \qquad (7)
\]

where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of the matrix $\mathbf{H}$.
The matrix $\mathbf{H}$ has $N$ rows and $M$ columns. If the number of training samples $N$ is larger than the number of hidden neurons $M$, $N \geq M$, and the matrix $\mathbf{H}$ has full column rank, then

\[
\mathbf{H}^{\dagger} = (\mathbf{H}^{\mathsf{T}}\mathbf{H})^{-1}\mathbf{H}^{\mathsf{T}}. \qquad (8)
\]

If the number of training samples $N$ is smaller than the number of hidden neurons $M$, $N < M$, and the matrix $\mathbf{H}$ has full row rank, then

\[
\mathbf{H}^{\dagger} = \mathbf{H}^{\mathsf{T}}(\mathbf{H}\mathbf{H}^{\mathsf{T}})^{-1}, \qquad (9)
\]

but the latter case is impractical in modeling.
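A compact sketch of the output-weight computation (7)-(8) for the usual case $N \geq M$; the function and variable names are illustrative only:

import numpy as np

def output_weights(H, T):
    """Solve beta = pinv(H) @ T, cf. equations (7)-(8).

    H : (N, M) hidden layer output matrix
    T : (N, c) target outputs
    """
    # np.linalg.pinv computes the Moore-Penrose pseudoinverse;
    # np.linalg.lstsq would be an equally valid choice.
    return np.linalg.pinv(H) @ T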
2.2 Numerical Aspects of ELM
The calculation of the output weights is the most sensitive stage of the ELM algorithm and may be a source of various numerical difficulties. One of two ways may be selected to calculate $\boldsymbol{\beta}$: specialized algorithms for the calculation of the pseudoinverse (8), or simply the calculation of the following matrices:

\[
\boldsymbol{\Lambda} = \mathbf{H}^{\mathsf{T}}\mathbf{H}, \qquad
\boldsymbol{\Omega} = \mathbf{H}^{\mathsf{T}}\mathbf{T}, \qquad
\boldsymbol{\beta} = \boldsymbol{\Lambda}^{-1}\boldsymbol{\Omega}. \qquad (10)
\]
The computational complexity of both approaches is similar, but the second one has smaller memory requirements (Akusok et al., 2015). For both approaches, keeping the condition number of $\mathbf{H}$ moderate is crucial for the stability of the algorithm and is necessary to obtain moderate values of $\boldsymbol{\beta}$. Huge output weights may amplify the round-off errors of the arithmetical operations performed by the network and make the application impossible.
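The two ways of computing $\boldsymbol{\beta}$, together with a simple condition-number check, might be sketched as follows (the threshold value and all names are assumptions):

import numpy as np

def output_weights_pinv(H, T):
    # Way 1: explicit Moore-Penrose pseudoinverse, cf. (8).
    return np.linalg.pinv(H) @ T

def output_weights_normal_eq(H, T):
    # Way 2: form the smaller M x M system of (10) and solve it;
    # only H^T H and H^T T need to be kept in memory.
    Lam = H.T @ H      # Lambda
    Omega = H.T @ T    # Omega
    return np.linalg.solve(Lam, Omega)

def is_well_conditioned(H, threshold=1e8):
    # A large condition number of H signals potential instability
    # and huge output weights.
    return np.linalg.cond(H) < threshold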
The well-known remedy for numerical problems in ELM caused by an ill-conditioned matrix $\mathbf{H}$ is Tikhonov regularization of the least-squares problem (Huang et al., 2015; Akusok et al., 2015). Instead of minimizing (6), a weighted problem is considered, with the performance index: