Multi-layer Extreme Learning Machine-based Autoencoder for Hyperspectral Image Classification

Muhammad Ahmad^{1,2}, Adil Mehmood Khan^{1}, Manuel Mazzara^{1} and Salvatore Distefano^{2}
^{1} Innopolis University, Innopolis, Russia
^{2} University of Messina, Messina, Italy

Keywords: Extreme Learning Machine (ELM), Deep Neural Networks (DNN), Autoencoder (AE), Hyperspectral Image Classification.
Abstract: Hyperspectral imaging (HSI) has attracted formidable interest from the scientific community and has been applied to an increasing number of real-life applications to automatically extract meaningful information from the corresponding high-dimensional datasets. However, traditional autoencoders (AE) and restricted Boltzmann machines are computationally expensive and do not perform well due to the Hughes phenomenon, which is observed in HSI since the ratio of labeled training pixels to the number of bands is usually quite small. To overcome such problems, this paper exploits a multi-layer extreme learning machine-based autoencoder (MLELM-AE) for HSI classification. The ELM-based autoencoder learns feature representations through singular value decomposition and is used as the basic building block of MLELM-AE. The MLELM-AE method not only retains the fast training speed of traditional ELM but also greatly improves the performance of HSI classification. The experimental results demonstrate the effectiveness of MLELM-AE on several well-known HSI datasets.
1 INTRODUCTION

Hyperspectral images (HSI) provide a unique way of characterizing objects of interest in geographical scenes through the very rich spatial-spectral information contained in a 3-D hypercube (Ahmad et al., 2016). However, classification of such high-dimensional hyperspectral data is still a challenging task, especially when the ratio between the number of available labeled training samples and the number of spectral dimensions (usually large) is small, which is commonly known as the Hughes phenomenon (Hughes, 1968).

To cope with the issues due to the high number of dimensions, a number of feature extraction, selection, and classification methods have been proposed in recent years (Ren et al., 2014; Ahmad et al., 2011; Liu et al., 2018). These methods have yielded quite good outcomes. However, their performance can be further improved by addressing two main issues: 1) inaccurate classification in the presence of the Hughes phenomenon (Ahmad et al., 2018); 2) comparatively low efficiency in processing high-dimensional HSI data (Ahmad et al., 2017a).
Extreme learning machine (ELM), as a single hidden layer feed-forward neural network, is an effective and fast machine learning method that has received remarkable attention due to its high generalization performance (Ding et al., 2015). In ELM, the hidden layer parameters do not need to be tuned once the number of hidden layer nodes is set. Moreover, the bias and weights between the hidden and input layers are randomly assigned without taking into account the training samples and applications (Zhou et al., 2015).

Due to its generalization capabilities, ELM has been extensively studied for HSI classification problems. For instance, in (Arguello and Heras, 2015; Shen et al., 2016), extended morphological profiles and bilateral filtering based methods were used for feature extraction and ELM was used as a base classifier. In (Chen et al., 2014; Dora et al., 2014), Gabor filter and watershed-based methods were employed for feature extraction and ELM was used as the final classification method. Computational complexity and other issues aside, these methods have achieved remarkable performance for HSI classification. However, they ignore one very important aspect of ELM: the randomly rendered input bias and weights may cause ill-posed problems.
Based on this phenomenon, and to handle such a problem effectively and efficiently, we exploit the multi-layer extreme learning machine-based autoencoder (MLELM-AE) method for HSI classification, in which features do not need to be extracted explicitly, as mentioned in (Kasun et al., 2013) for digit classification problems. To the best of our knowledge, this is the first work of its kind for HSI classification. A similar criterion has been explored in the past (Kasun et al., 2013); however, in our work, instead of using the pipeline for traditional image classification or recognition, we implement and test it on the hyperspectral image classification and segmentation problem, which is more complicated than traditional image classification.
The remainder of the paper is structured as follows. Section 2 presents the theoretical aspects of the extreme learning machine pipeline, followed by a theoretical explanation of the extreme learning machine-based autoencoder. Section 3 discusses experimental setups and metrics. Section 4 discusses the datasets, settings, and results. Finally, Section 5 summarizes the contributions and future research directions.
2 EXTREME LEARNING MACHINE

In ELM, the bias and weight vectors between the hidden and input layer are randomly assigned, while the net values are obtained by the learning process. Once the initial values are preserved, the hidden layer output matrix persists unaltered in the learning process. Let us assume that $X = (x_1, x_2, x_3, \cdots, x_N) \in \mathbb{R}^{d \times N}$ is the training data, which has $N$ pixels, each with a $d$-dimensional feature vector. Let $Y = (y_1, y_2, y_3, \cdots, y_M) \in \mathbb{R}^{M \times N}$ be a matrix representing the class labels of the training samples, in which $M$ is the number of classes in the HSI data. Thus, the ELM model with $L$ hidden neurons and the activation function $H(x)$ can be expressed as

$$\sum_{j=1}^{L} \beta_j H(W_j^T x_i + b_j) = y_i, \quad i = 1, 2, 3, \cdots, N \tag{1}$$
where $H(W_j^T x_i + b_j)$ represents the output of the $j$-th hidden neuron with respect to the input $x_i$, and $\beta_j$, $W_j$, and $b_j$ represent the weight vector between the hidden layer and the output layer, and the weight and bias between the hidden and input layer, respectively. The above expression can simply be written as

$$H^T \beta = Y^T \tag{2}$$
where

$$\beta = [\beta_1, \beta_2, \beta_3, \cdots, \beta_M]_{L \times M} \tag{3}$$

$$H = [H(x_1), H(x_2), H(x_3), \cdots, H(x_N)]_{L \times N} \tag{4}$$

and

$$H(x_i) = [H_1(x_i), H_2(x_i), H_3(x_i), \cdots, H_L(x_i)]^T_{L \times 1} \tag{5}$$

Finally, $\beta$ can be computed as

$$\beta = (H^T)^{\dagger} Y^T \tag{6}$$

where $(\cdot)^{\dagger}$ is the Moore-Penrose generalized inverse of a matrix.
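For concreteness, the whole ELM training procedure of Eqs. (1)-(6) reduces to one random projection and one pseudoinverse. The following NumPy sketch illustrates this under our own assumptions (sigmoid activation, standard-normal initialization); it is an illustration of the equations, not the MATLAB implementation used in our experiments.

import numpy as np

def train_elm(X, Y, L, seed=0):
    # X: d x N training pixels, Y: M x N one-hot class labels, L: hidden neurons
    rng = np.random.default_rng(seed)
    d, N = X.shape
    W = rng.standard_normal((d, L))            # random input weights W_j (never tuned)
    b = rng.standard_normal((L, 1))            # random biases b_j
    H = 1.0 / (1.0 + np.exp(-(W.T @ X + b)))   # L x N hidden layer output, Eqs. (4)-(5)
    beta = np.linalg.pinv(H.T) @ Y.T           # beta = (H^T)^dagger Y^T, Eq. (6)
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(W.T @ X + b)))
    return np.argmax(H.T @ beta, axis=1)       # class index per pixel

Since no iterative tuning of W or b is involved, the training cost is dominated by a single pseudoinverse, which is the source of ELM's speed.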
The main goal of the multi-layer extreme learning machine-based autoencoder is to learn a useful feature representation in three different modes, similar to traditional autoencoders (Ahmad et al., 2019): compressed representation, which maps the input features from the high-dimensional hyperspectral space to a lower-dimensional feature space; sparse representation, which maps the low-dimensional input feature space to a higher-dimensional hyperspectral feature space; and, finally, equal-dimensional representation, in which the input space dimension equals the feature space dimension.
According to (Kasun et al., 2013; Huang et al., 2006), the extreme learning machine is a universal approximator; therefore, MLELM-AE is also a universal approximator. In MLELM-AE, the orthogonal random biases and weights of the hidden nodes project the input samples to an equal-dimensional space, as shown in (Kasun et al., 2013; Huang et al., 2006; Johnson and Lindenstrauss, 1984) and in the equation below, similar to equation (1):

$$h = g(a \times x + b) \tag{7}$$

where $a = [a_1, a_2, \cdots, a_L]$ with $a^T \times a = I$, and $b = [b_1, b_2, \cdots, b_L]$ with $b^T \times b = 1$, are the orthogonal random weights and biases between the input and hidden nodes, respectively. Therefore, as shown in (Kasun et al., 2013; Huang et al., 2006), the output weights for the compressed and sparse MLELM-AE representations can be obtained by incorporating a regularization term to enhance the generalization performance and robustness:

$$\beta = \left(\frac{I}{C} + H^T H\right)^{-1} H^T X \tag{8}$$
where $C$ is the regularization term, $X = [x_1, x_2, x_3, \cdots, x_N]$ are the input and output data samples, and $H = [h_1, h_2, h_3, \cdots, h_N]$ are the hidden layer outputs of MLELM-AE. The output weights can also be computed as

$$\beta = H^{-1} X \tag{9}$$
where $\beta^T \beta = I$ to make the input and output equal. Therefore, the singular value decomposition of the regularized output weights for the compressed and sparse MLELM-AE representations can be computed as in (Kasun et al., 2013), i.e.

$$H\beta = \sum_{i=1}^{N} u_i \frac{v_i^2}{v_i^2 + C} u_i^T X \tag{10}$$

where $v$ represents the singular values of $H$ and $u$ represents the eigenvectors of $HH^T$. Since $H$ is the projected feature space squashed via a linear or nonlinear (sigmoid or any appropriate) activation function, we speculate that the output weights $\beta$ learn to represent the features of the input space via singular value decomposition.
Furthermore, if the number of hidden nodes $L_k$ in the $k$-th hidden layer is equal to the number of hidden nodes $L_{k-1}$ in the $(k-1)$-th hidden layer, $g$ is chosen as a linear activation function; otherwise, $g$ is chosen as a nonlinear piecewise activation function. This way,

$$H^k = g\left((\beta^k)^T H^{k-1}\right) \tag{11}$$

where $H^k$ is the $k$-th hidden layer output matrix. For better intuition, the input $x$ can be identified as the $0$-th hidden layer, where $k = 0$. Finally, the output of the connections between the last hidden layer and the output node $t$ is analytically computed by employing regularized least squares, where $t$ is the output data.
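To make the layer-wise construction of Eqs. (7)-(11) concrete, the following is a minimal sketch under our assumptions: sigmoid activations, QR-based orthogonalization of the random weights (so that $a^T a = I$), and compressed layers with $L$ no larger than the current input dimension. The final regularized least-squares readout over $t$ is omitted.

import numpy as np

def elm_ae_layer(H_prev, L, C, seed=0):
    # One ELM-AE layer: orthogonal random projection (Eq. 7), then
    # regularized output weights learned to reconstruct the input (Eq. 8).
    rng = np.random.default_rng(seed)
    d, N = H_prev.shape                                # assumes L <= d (compressed case)
    A, _ = np.linalg.qr(rng.standard_normal((d, L)))   # a^T a = I (orthonormal columns)
    b = rng.standard_normal((L, 1))
    b /= np.linalg.norm(b)                             # b^T b = 1
    H = 1.0 / (1.0 + np.exp(-(A.T @ H_prev + b)))      # L x N hidden outputs
    # Column-wise form of Eq. (8): beta = (I/C + H H^T)^{-1} H X^T, size L x d;
    # this beta plays the role of (beta^k)^T in Eq. (11).
    beta = np.linalg.solve(np.eye(L) / C + H @ H.T, H @ H_prev.T)
    return beta

def mlelm_ae_features(X, layer_sizes, C=1e3):
    # Stack ELM-AE layers: H^k = g((beta^k)^T H^(k-1)), Eq. (11), with H^0 = X.
    H = X
    for k, L in enumerate(layer_sizes):
        beta = elm_ae_layer(H, L, C, seed=k)
        H = 1.0 / (1.0 + np.exp(-(beta @ H)))          # next-layer representation
    return H                                           # features fed to the final readout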
3 EXPERIMENTAL METRICS

In this section, the performance of MLELM-AE is evaluated using seven different well-known, publicly available hyperspectral datasets acquired by the AVIRIS and ROSIS sensors and by the Hyperion sensor on the NASA EO-1 satellite. More information about these datasets can be found in (Liu et al., 2018; Ahmad et al., 2018; Ahmad et al., 2017b; Li et al., 2013; He et al., 2018; Datasets, ).

The confusion matrix is generally used to evaluate the performance of HSI classification in terms of overall accuracy, average accuracy, and the kappa ($\kappa$) coefficient. In this work, the overall accuracy for hyperspectral image classification is computed by the following formula:

$$OA = \frac{\sum_{i=1}^{N} x_{ii}}{\sum_{j=1}^{M} \sum_{i=1}^{N} x_{ij}} \tag{12}$$
From the above equation, it can be seen that the magnitude of the overall accuracy is only affected by the diagonal elements. It is more strongly affected by classes that contain more elements, so it is not sufficient to comprehensively evaluate the classification accuracy of all classes. A more comprehensive index of classification accuracy is the $\kappa$ coefficient, which utilizes all samples of the confusion matrix and thus reflects the consistency between classification results and ground truth. The $\kappa$ coefficient is evaluated by the formula (Ahmad et al., 2017b):

$$\kappa = \frac{N \sum_i a_i - \sum_i b_i d_i}{N^2 - \sum_i b_i d_i} \tag{13}$$

where $N$ is the total number of samples (pixels in the HSI cube), $a_i$ is the number of correctly predicted samples in the given class, $\sum_i a_i$ is the sum of the numbers of correctly predicted samples, $b_i$ is the actual number of samples belonging to the given class, and $d_i$ is the number of samples that have been predicted into the given class (Ahmad et al., 2018).
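As a worked example, both indices follow directly from the confusion matrix. The sketch below (our own illustration, with a made-up 2-class matrix) mirrors Eqs. (12) and (13).

import numpy as np

def overall_accuracy(cm):
    # Eq. (12): correctly classified (diagonal) over all samples.
    return np.trace(cm) / cm.sum()

def kappa(cm):
    # Eq. (13): agreement corrected for chance, via row/column marginals.
    N = cm.sum()
    a = np.trace(cm)        # sum_i a_i: correctly predicted samples
    b = cm.sum(axis=1)      # b_i: actual samples per class
    d = cm.sum(axis=0)      # d_i: samples predicted into each class
    return (N * a - (b * d).sum()) / (N**2 - (b * d).sum())

cm = np.array([[50, 10],    # rows = ground truth, columns = prediction
               [ 5, 35]])
print(overall_accuracy(cm), kappa(cm))   # 0.85 and about 0.69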
Furthermore, to evaluate the significance of MLELM-AE, several statistical measures are computed, e.g., the F1-score, precision, and recall rate. Precision is the ratio of correctly identified positive samples to the total predicted positive samples; a high precision value indicates a low false-positive rate, reflecting the model's ability to correctly identify true positive samples. Recall is the ratio of correctly predicted positive samples to the total number of actual positive samples. As with precision, the higher the recall rate, the better the model.

Likewise, the F1-score is a weighted average of the precision and recall rates; therefore, the F1-score takes both false negatives and false positives into account. The F1-score is more useful than the other accuracy measures, though intuitively not as easy to understand as accuracy, particularly in the presence of unbalanced class distributions. Accuracy works well if false negatives and false positives have similar costs; if their costs differ, it is better to consider both precision and recall to evaluate the model.
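These per-class rates also come straight from the confusion matrix; a brief sketch of the definitions we use (illustrative only, assuming every class appears in both the ground truth and the predictions):

import numpy as np

def per_class_scores(cm):
    # rows = ground truth, columns = prediction
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # TP / predicted positives
    recall = tp / cm.sum(axis=1)      # TP / actual positives
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1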
In this section, we also evaluate the relevant tuning parameters, which include the number of neurons in the hidden layers, the total number of layers, and an appropriate value for the regularization term $C$. In our experiments, the regularization term is automatically tuned by a 5-fold cross-validation process. The number of hidden layer neurons is systematically set within the range [Total Number of Training Samples, Total Number of Testing Samples], and the number of layers is heuristically set in the range $[1, 5]$; the cross-validation process finds the optimum value of the regularization term from the interval $[1e1, 1e14]$.
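A sketch of this tuning loop is given below; the power-of-ten candidate grid over $[1e1, 1e14]$ and the train_fn/score_fn callables are our own illustrative assumptions, not the exact procedure of our MATLAB implementation.

import numpy as np

def tune_C(X, Y, train_fn, score_fn, n_folds=5, seed=0):
    # 5-fold cross-validation over the regularization term C.
    rng = np.random.default_rng(seed)
    candidates = [10.0 ** p for p in range(1, 15)]    # assumed grid, 1e1 ... 1e14
    N = X.shape[1]
    folds = np.array_split(rng.permutation(N), n_folds)
    best_C, best_score = None, -np.inf
    for C in candidates:
        scores = []
        for fold in folds:
            mask = np.ones(N, dtype=bool)
            mask[fold] = False                        # hold out one fold
            model = train_fn(X[:, mask], Y[:, mask], C)
            scores.append(score_fn(model, X[:, fold], Y[:, fold]))
        if np.mean(scores) > best_score:
            best_C, best_score = C, np.mean(scores)
    return best_C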
4 EXPERIMENTAL RESULTS AND DISCUSSION

In this section, we briefly discuss the experimental results acquired by the MLELM-AE pipeline on seven different hyperspectral datasets. Prior to the experiments, we performed the necessary normalization to the range $[0, 1]$. All the experiments have been carried out on a cluster using MATLAB (2017a) on an Intel Core (TM) i7-7700K CPU at 2.40 GHz (1962 MHz), Ubuntu 16.04.5 LTS, CUDA compilation tools release 7.5, V7.5.17, with 65 GB RAM.

The presented experiments show the accuracy analysis in terms of overall accuracy, average accuracy, and the $\kappa$ coefficient. Figures 1-7 show the ground-truth maps for the original test samples along with the predictions of these samples as geographical maps. Furthermore, these Figures also present the average, overall, and $\kappa$ accuracies in multiclass form, along with the mean squared error of the MLELM-AE model under 10-fold cross-validation and the training and testing time for each dataset. The training and test times are significantly lower than those of traditional back-propagation-based deep neural networks. Furthermore, the plots show higher generalization performance with a smaller amount of training samples.
To highlight the class-based classification results, Tables 1-7 report the $\kappa$ coefficient for each individual class, providing insights on the number of training versus estimated labels used in our experiments and thus demonstrating the clear advantages of using limited samples for learning the MLELM-AE model. In most cases, the proposed pipeline outperforms existing solutions.
Experiments with Salinas Dataset

The Salinas dataset consists of 224 spectral bands with a high spatial resolution of 3.7 m. The full Salinas scene was collected by the AVIRIS sensor over Salinas Valley, California.

In the Salinas scene, some water-absorption bands were removed prior to the analyses: bands 108-112, 154-167, and 224. The full Salinas scene covers 512 x 217 pixels per band and contains vegetables, bare soils, and vineyard fields. The Salinas ground truth contains 16 classes.

A sub-scene of the Salinas dataset, named Salinas-A, consists of 86 x 83 samples per band and 6 classes. The Salinas-A samples are located in the full Salinas scene at 591-676 and 158-240. Dataset files and descriptions can be obtained from (Datasets, ).

The experimental results are shown in Tables 1 and 2 and Figures 1 and 2. From the results, it can be clearly seen that the MLELM-AE pipeline greatly improved the classification accuracies for the Salinas and Salinas-A datasets with acceptable generalization performance. Furthermore, the detailed accuracy and the time taken to train and test the model are provided in the captions of the respective Figures. In all these experiments, the training size is set to 1% of the samples from the Salinas and Salinas-A datasets, respectively.
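The per-class (Train, Test) counts reported in the Tables correspond to a stratified draw of labeled pixels; the following is a sketch of how such a split might be produced (our own illustration of the sampling, not the exact script used in our experiments):

import numpy as np

def stratified_split(labels, fraction=0.01, seed=0):
    # Draw a per-class fraction of labeled pixels for training; the rest test.
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        n_train = max(1, int(round(fraction * idx.size)))
        train_idx.append(rng.choice(idx, size=n_train, replace=False))
    train_idx = np.concatenate(train_idx)
    test_mask = np.ones(labels.size, dtype=bool)
    test_mask[train_idx] = False
    return train_idx, np.flatnonzero(test_mask)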
Table 1: Classification accuracy (κ) analysis and statistical
measures for Salinas-A Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Brocoli Green Weeds 1 (8, 375) 0.9316±0.0516 0.9948 0.9999 0.9973
Corn Senesced Green Weeds (27, 1289) 0.9695±0.0144 0.9795 0.9999 0.9896
Lettuce Romaine 4wk (13, 591) 0.9773±0.0212 0.9934 0.9772 0.9852
Lettuce Romaine 5wk (31, 1464) 0.9975±0.0022 0.9999 0.9973 0.9987
Lettuce Romaine 6wk (14, 647) 0.9947±0.0026 0.9969 0.9763 0.9865
Lettuce Romaine 7wk (16, 767) 0.9757±0.0125 0.9796 0.9783 0.9789
Table 2: Classification accuracy (κ) analysis and statistical
measures for Salinas Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Brocoli Green Weeds 1 (61, 2009) 0.9982±0.0003 0.9979 1.0000 0.9989
Brocoli Green Weeds 2 (112, 3726) 0.9968±0.0004 0.9958 0.9991 0.9975
Fallow (60, 1976) 0.8404±0.0232 0.8721 0.9631 0.9153
Fallow Rough Plow (42, 1394) 0.9835±0.0033 0.9859 0.9925 0.9892
Fallow Smooth (81, 2678) 0.9883±0.0029 0.9795 0.9002 0.9382
Stubble (119, 3959) 0.9971±0.0005 0.9963 0.9994 0.9979
Celery (108, 3579) 0.9963±0.0007 0.9959 0.9945 0.9952
Grapes Untrained (339, 11271) 0.8891±0.0068 0.8795 0.7884 0.8314
Soil Vinyard Develop (187, 6203) 0.9909±0.0026 0.9960 0.9848 0.9904
Corn Senesced Green Weeds (99, 3278) 0.9348±0.0086 0.9465 0.9552 0.9508
Lettuce Romaine 4wk (33, 1068) 0.9539±0.0081 0.9748 0.9465 0.9605
Lettuce Romaine 5wk (58, 1927) 0.9999±0.0001 0.9994 0.9623 0.9806
Lettuce Romaine 6wk (28, 916) 0.9785±0.0018 0.9752 0.9569 0.9659
Lettuce Romaine 7wk (33, 1070) 0.9284±0.0083 0.9402 0.9587 0.9493
Vinyard Untrained (219, 7268) 0.6238±0.0105 0.6300 0.7857 0.6994
Vinyard Vertical Trellis (55, 1807) 0.9840±0.0020 0.9857 0.9976 0.9917
Figure 1: True Testing Maps and predicted Test Maps
for Salinas-A dataset with 10-fold-cross-validation-based
Average Accuracy = 0.9744 ± 0.0093, Overall Accuracy =
0.9797± 0.0059, κ = 0.9746±0.0073, Mean-squared Error
= 0.2561 ± 0.0478, and Training Time = 0.0918 ± 0.0067
and Test Time = 0.2539 ± 0.0141.
Experiments with Kennedy Space Center Dataset

The NASA AVIRIS instrument acquired data over the Kennedy Space Center (KSC), Florida, on March 23, 1996. AVIRIS acquired data in 224 bands of 10 nm width with center wavelengths in the range 400-2500 nm, from an altitude of approximately 20 km, with a spatial resolution of 18 m. After removing water-absorption and low-SNR bands, 176 bands were used for the analysis.
Figure 2: True Testing Maps and predicted Test Maps
for Salinas dataset with 10-fold-cross-validation-based
Average Accuracy = 0.9428 ± 0.0018, Overall Accuracy =
0.9106± 0.0013, κ = 0.9002±0.0015, Mean-squared Error
= 1.9138± 0.0087, and Training Time = 54.2526 ± 2.0612
and Test Time = 105.7566± 3.6922.
Training data were selected using land cover maps derived from color infrared photography provided by the KSC and Landsat Thematic Mapper (TM) imagery. The vegetation classification scheme was developed by KSC personnel in an effort to define functional types that are discernible at the spatial resolution of Landsat and this AVIRIS dataset.

Discrimination of land cover for this environment is difficult due to the similarity of spectral signatures for certain vegetation types. For classification purposes, 13 classes representing the various land cover types that occur in this environment were defined for the site. Dataset files and descriptions can be obtained from (Datasets, ). The experimental results are shown in Table 3 and Figure 3. From the results, one can conclude that MLELM-AE greatly improved the classification accuracies for the more complicated Kennedy Space Center AVIRIS dataset with enhanced generalization capabilities. Moreover, the detailed accuracy and the time taken to train and test the model are provided in the caption.
Table 3: Classification accuracy (κ) analysis and statistical
measures for Kennedy Space Center Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Scrub (229, 761) 0.9895±0.0018 0.9906 0.7877 0.8776
Willow Swamp (73, 243) 0.9541±0.0169 0.8941 0.7958 0.8421
CP/Oak (77, 252) 0.9240±0.0225 0.9162 0.6667 0.7718
CP hammock (76, 256) 0.4903±0.0388 0.5000 0.8073 0.6175
Slash Pine (49, 161) 0.6384±0.0253 0.6250 0.8434 0.7179
Oak/Broadleaf (69, 229) 0.2444±0.0192 0.2875 0.7541 0.4163
Hardwood Swamp (32, 105) 0.4603±0.0658 0.6575 0.7500 0.7007
Graminoid Marsh (130, 431) 0.8545±0.0133 0.8638 0.9629 0.9107
Spartina Marsh (156, 520) 0.9841±0.0039 0.9890 0.8933 0.9387
Cattail Marsh (122, 404) 0.9439±0.0089 0.9362 0.9778 0.9565
Salt Marsh (126, 419) 0.9785±0.0044 0.9659 0.9861 0.9759
Mud Flats (151, 503) 0.8773±0.0179 0.9233 0.9207 0.9219
Water (279, 927) 0.9852±0.0043 0.9784 0.9969 0.9875
Figure 3: True Testing Maps and predicted Test Maps for
Kennedy Space Center (KSC) dataset with 10-fold-cross-
validation-based Average Accuracy = 0.7942 ± 0.0059,
Overall Accuracy = 0.8786± 0.0034, κ = 0.8642± 0.0038,
Mean-squared Error = 1.4954 ± 0.0374, and Training Time
= 14.2984± 0.5296 and Test Time = 0.7258 ± 0.0662.
Experiments with Indian Pines Dataset

The Indian Pines dataset was gathered by the AVIRIS sensor over the Indian Pines test site in north-western Indiana and consists of 145 x 145 pixels and 224 bands in the wavelength range $0.4$ to $2.5 \times 10^{-6}$ meters.

The Indian Pines scene contains two-thirds agriculture and one-third forest or other natural perennial vegetation. There are two major dual-lane highways and a rail line, as well as some low-density housing, other built structures, and small roads. Since the Indian Pines data was taken in June, some of the crops present, corn and soybeans, are in early stages of growth with less than 5% coverage. The available ground truth is distinguished into sixteen classes, not all mutually exclusive.

We have also reduced the number of bands to 200 by removing bands covering the region of water absorption: bands 104-108, 150-163, and 220. Dataset files and descriptions can be obtained from (Datasets, ). The experimental results are shown in Table 4 and Figure 4. From the results, one can conclude that MLELM-AE greatly improved the classification accuracies with enhanced generalization capabilities. Furthermore, the detailed accuracy analysis and the time taken to train and test the model are provided in the caption.
Table 4: Classification accuracy (κ) analysis and statistical
measures for Indian Pines Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Alfalfa (10, 46) 0.4556±0.0736 0.6944 0.8065 0.7464
Corn-notill (286, 1428) 0.8117±0.0070 0.8284 0.7472 0.7857
Corn-mintill (166, 830) 0.6048±0.0215 0.5768 0.7539 0.6536
Corn (48, 237) 0.3862±0.0385 0.3492 0.7952 0.4853
Grass-pasture (97, 483) 0.8943±0.0159 0.8834 0.9419 0.9118
Grass-trees (146, 730) 0.9829±0.0046 0.9846 0.9055 0.9434
Grass-pasture-mowed (6, 28) 0.5409±0.0469 0.5455 1.0000 0.7059
Hay-windrowed (96, 478) 0.9927±0.0038 0.9921 0.9595 0.9756
Oats (4, 20) 0.1625±0.0688 0.2500 0.8000 0.3809
Soybean-notill (195, 972) 0.6779±0.0137 0.7091 0.7527 0.7303
Soybean-mintill (491, 2455) 0.8457±0.0053 0.8432 0.7520 0.7951
Soybean-clean (119, 593) 0.7569±0.0172 0.8101 0.8571 0.8329
Wheat (41, 205) 0.9866±0.0043 0.9756 0.9639 0.9697
Woods (253, 1265) 0.9641±0.0047 0.9664 0.9297 0.9477
Buildings-Grass-Trees-Drives (78, 386) 0.6042±0.0159 0.6266 0.8143 0.7083
Stone-Steel-Towers (19, 93) 0.7284±0.0421 0.8108 1.0000 0.8956
Figure 4: True Testing Maps and predicted Test Maps
for Indian Pines dataset with 10-fold-cross-validation-
based κ = 0.7839 ± 0.0028, Average Accuracy = 0.7122 ±
0.0078, Overall Accuracy = 0.8122 ± 0.0024, Mean-
squared Error = 2.6312 ± 0.0359, and Training Time =
38.8711± 0.7759 and Test Time = 3.2365 ± 0.1330.
Experiments with Pavia University and Pavia Center Datasets

The Pavia University (PU) dataset was acquired by the ROSIS optical sensor during a flight campaign over Pavia in northern Italy, with a geometric resolution of 1.3 m. The PU data consist of 102 spectral bands with 1096 x 1096 samples per band.

Some of the samples in the PU dataset contain no information and have to be discarded prior to the analysis. The PU scene ground truth identifies 9 classes. Dataset files and descriptions can be obtained from (Datasets, ).

The experimental results are shown in Tables 5 and 6 and Figures 5 and 6. From the results, one can conclude that MLELM-AE greatly improved the classification accuracies with enhanced generalization capabilities for the more complicated ROSIS sensor-based datasets. Evaluating the ROSIS datasets is a more challenging classification problem than AVIRIS, dominated by complex urban classes and nested regions. The detailed accuracy analysis in terms of average, overall, and kappa accuracies, along with the time taken to train and test the model, is provided in the captions.
Table 5: Classification accuracy (κ) analysis and statistical
measures for Pavia University Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Asphalt (1285, 6631) 0.8105±0.0080 0.8091 0.8916 0.8485
Meadows (1980, 18649) 0.9725±0.0016 0.9694 0.8049 0.8795
Gravel (93, 2099) 0.6303±0.0133 0.6406 0.7374 0.6856
Trees (81, 3064) 0.8115±0.0048 0.8125 0.8199 0.8162
Painted metal sheets (198, 1345) 0.9947±0.0011 0.9926 0.9917 0.9923
Bare Soil (278, 5029) 0.2468±0.0078 0.2256 0.7964 0.3516
Bitumen (219, 1330) 0.6607±0.0116 0.6859 0.8857 0.7731
Self-Blocking Bricks (228, 3682) 0.8296±0.0096 0.8252 0.6421 0.7222
Shadows (86, 947) 0.8599±0.0140 0.8885 0.9299 0.9088
Figure 5: True Testing Maps and predicted Test Maps
for Pavia University dataset with 10-fold-cross-validation-
based Overall Accuracy = 0.8099± 0.0014, Average Accu-
racy = 0.7574 ± 0.0018, κ = 0.7386 ± 0.0019, Mean-
squared Error = 1.8703 ± 0.0100, and Training Time =
322.2472± 2.6395 and Test Time = 70.9769± 1.2438.
Table 6: Classification accuracy (κ) analysis and statistical
measures for Pavia Center Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Water (25, 824) 0.9996±0.0001 0.9997 0.9976 0.9986
Trees (24, 820) 0.9473±0.0055 0.9499 0.8319 0.8870
Asphalt (23, 816) 0.5989±0.0211 0.5849 0.7904 0.6723
Self Blocking Bricks (21, 808) 0.6389±0.0175 0.6820 0.6612 0.6715
Bitumen (21, 808) 0.8887±0.0119 0.8727 0.8287 0.8501
Tiles (38, 1260) 0.8173±0.0051 0.8071 0.9364 0.8669
Shadows (15, 476) 0.8621±0.0091 0.8877 0.9412 0.9136
Meadows (25, 824) 0.9957±0.0002 0.9955 0.9778 0.9866
Bare Soil (24, 820) 0.8518±0.0123 0.8739 0.8519 0.8628
Figure 6: True Testing Maps and predicted Test Maps
for Pavia Center dataset with 10-fold-cross-validation-
based κ = 0.9360 ± 0.0008, Average Accuracy = 0.8445 ±
0.0027, Overall Accuracy = 0.9549 ± 0.0006, Mean-
squared Error = 0.5067 ± 0.0075, and Training Time =
355.3899± 3.6140 and Test Time = 410.2014± 7.97897.
Experiments with Botswana Dataset

The NASA EO-1 satellite acquired a sequence of data over the Okavango Delta, Botswana, in 2001-2004. The Hyperion sensor on EO-1 acquired data at 30 m pixel resolution over a 7.7 km strip in 242 bands covering the 400-2500 nm portion of the spectrum in 10 nm windows.

Preprocessing of the data was performed by the UT Center for Space Research to mitigate the effects of bad detectors, inter-detector mis-calibration, and intermittent anomalies. Uncalibrated and noisy bands that cover water-absorption features were removed, and the remaining 145 bands were included as candidate features. The removed bands are 10-55, 82-97, 102-119, 134-164, and 187-220. The data analyzed in this study were acquired on May 31, 2001, and consist of observations from 14 identified classes representing the land cover types in seasonal swamps, occasional swamps, and drier woodlands located in the distal portion of the Delta. Dataset files and descriptions can be obtained from (Datasets, ).
The experimental results are shown in Table 7 and Figure 7. From the results, one can conclude that MLELM-AE greatly improved the classification accuracies with enhanced generalization capabilities for the more complicated Hyperion (EO-1) sensor-based dataset. Evaluating the Hyperion (EO-1) datasets is a more challenging classification problem than AVIRIS and ROSIS, dominated by complex urban classes and nested regions. The detailed accuracy analysis in terms of average, overall, and kappa accuracies, along with the time taken to train and test the model, is provided in the caption.
Table 7: Classification accuracy (κ) analysis and statistical
measures for Botswana Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Water (81, 270) 0.9989±0.0021 1.0000 1.0000 1.0000
Hippo Grass (31, 101) 0.9757±0.0187 1.0000 0.9859 0.9929
Floodplain Grasses 1 (76, 251) 0.9897±0.0057 0.9943 0.9943 0.9943
Floodplain Grasses 2 (65, 215) 0.9953±0.0039 1.0000 0.9494 0.9740
Reeds1 (81, 269) 0.8787±0.0132 0.8829 0.8737 0.8783
Riparian (81, 269) 0.7011±0.0204 0.7128 0.8323 0.7679
Firescar 2 (78, 259) 0.9912±0.0040 0.9945 1.0000 0.9972
Island Interior (61, 203) 0.9796±0.0067 0.9859 0.9929 0.9894
Acacia Woodlands (95, 314) 0.9347±0.0135 0.9361 0.9031 0.9193
Acacia Shrublands (75, 248) 0.9006±0.0172 0.8671 0.9554 0.9091
Acacia Grasslands (92, 305) 0.9577±0.0124 0.9718 0.9039 0.9367
Short Mopane (55, 181) 0.9190±0.0187 0.9524 0.9302 0.9412
Mixed Mopane (81, 268) 0.9337±0.0145 0.9412 0.9072 0.9239
Exposed Soils (29, 95) 0.9682±0.0129 1.0000 1.0000 1.0000
In this section, we performed a set of experiments to evaluate MLELM-AE using the ROSIS, AVIRIS, and NASA EO-1 Hyperion sensor datasets. Evaluating the ROSIS and Hyperion datasets poses more challenging classification problems than AVIRIS, dominated by complex urban classes and nested regions. Figures 1-7 and Tables 1-7 show the overall, average, and kappa ($\kappa$) accuracies along with the training and test times as a function of the number of labeled samples. The Figures 1-7 and Tables 1-7 are generated based on only the selected samples, in contrast to the entire population, which reveals the clear advantages of using fewer labeled samples for the MLELM-AE pipeline.
Figure 7: True Testing Maps and predicted Test Maps for
Botswana datasets with 10-fold-cross-validation-based κ =
0.9268 ± 0.0039, Average Accuracy = 0.9374 ± 0.0041,
Overall Accuracy = 0.9325 ± 0.0036, Mean-squared Error
= 0.8309 ± 0.0413, and Training Time = 2.5815 ± 0.0895
and Test Time = 0.1872 ± 0.0182.
5 CONCLUSIONS AND FUTURE WORK

In this work, we implemented a framework for hyperspectral image classification in a computationally efficient fashion using a multi-layer extreme learning machine-based autoencoder (MLELM-AE). MLELM-AE is a special case of the traditional ELM in which the input is equal to the output and the randomly generated weights are chosen to be orthogonal. The internal representation of MLELM-AE provides an effective solution not only for feed-forward neural networks but also for multi-layered feed-forward neural networks. The MLELM-AE network provides better generalization performance than traditional back-propagation-based deep neural networks.

To further improve generalization, in future work we will focus on learning a dictionary for each class in both the spectral and spatial domains. We will further look into possible ways to decrease the computational complexity of the model by resorting to spatial filtering (Hao et al., 2017) and extended multi-attribute profile-based (Mura et al., 2010) methods.
ACKNOWLEDGEMENT

The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions.
REFERENCES
Ahmad, M., Alqarni, M. A., Khan, A. M., Hussain, R., Mazzara, M., and Distefano, S. (2019). Segmented and non-segmented stacked denoising autoencoder for hyperspectral band reduction. Optik - International Journal for Light and Electron Optics, 180:370–378.
Ahmad, M., Bashir, A. K., and Khan, A. M. (2017a). Metric similarity regularizer to enhance pixel similarity performance for hyperspectral unmixing. Optik - International Journal for Light and Electron Optics, 140(C):86–95.
Ahmad, M., Haq, I. U., and Qaisaro, M. (2011). Aik method for band clustering using statistics of correlation and dispersion matrix. In 2011 International Conference on Information Communication and Management, pages 114–118.
Ahmad, M., Khan, A. M., and Hussain, R. (2017b).
Graph-based spatial-spectral feature learning for hy-
perspectral image classification. IET Image Proces-
sing, 11(12):1310–1316.
Ahmad, M., Khan, A. M., Hussain, R., Protasov, S., Chow,
F., and Khattak, A. M. (2016). Unsupervised geo-
metrical feature learning from hyperspectral data. In
2016 IEEE Symposium Series on Computational In-
telligence (SSCI), pages 1–6.
Ahmad, M., Protasov, S., Khan, A. M., Hussain, R., Khattak, A. M., and Khan, W. A. (2018). Fuzziness-based active learning framework to enhance hyperspectral image classification performance for discriminative and generative classifiers. PLoS ONE, 13:e0188996.
Arguello, F. and Heras, H. B. (2015). Elm-based spectral–
spatial classification of hyperspectral images using ex-
tended morphological profiles and composite feature
mappings. Int. J. Remote Sens., 36(2):645–664.
Chen, C., Li, W., Su, H., and Liu, K. (2014). Spectral-
spatial classification of hyperspectral image based on
kernel extreme learning machine. Remote Sensing,
6(6):5795–5814.
Datasets, H. (accessed May 2018). http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes.
Ding, S., Zhao, H., Zhang, Y., Xu, Z., and Nie, R. (2015). Extreme learning machine: Algorithm, theory and applications. Artif. Intell. Rev., 44(1):103–115.
Dora, B. H., Arguello, F., and Pablo, Q.-B. (2014). Ex-
ploring elm-based spatial spectral classification of hy-
perspectral images. International Journal of Remote
Sensing, 35(2):401–423.
Hao, L., Chang, L., Cong, Z., Zhe, L., and Chengyin, L. (2017). Hyperspectral image classification with spatial filtering and ℓ2,1 norm. In Sensors.
He, L., Li, J., Liu, C., and Li, S. (2018). Recent advances on spectral-spatial hyperspectral image classification: An overview and new guidelines. IEEE Transactions on Geoscience and Remote Sensing, 56(3):1579–1597.
Huang, G. B., Chen, L., and Siew, C.-K. (2006). Universal approximation using incremental constructive feedforward networks with random hidden nodes. Trans. Neur. Netw., 17(4):879–892.
Hughes, G. (1968). On the mean accuracy of statistical pattern recognizers. IEEE Transactions on Information Theory, 14(1):55–63.
Johnson, W. and Lindenstrauss, J. (1984). Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 26:189–206.
Kasun, L., Zhou, H., Huang, G. B., and Vong, C.-M. (2013). Representational learning with ELMs for big data. IEEE Intelligent Systems, 28(6):31–34.
Li, J., Bioucas-Dias, J. M., and Plaza, A. (2013). Spectral-
spatial classification of hyperspectral data using
loopy belief propagation and active learning. IEEE
Transactions on Geoscience and Remote Sensing,
51(2):844–856.
Liu, C., He, L., Li, Z., and Li, J. (2018). Feature-driven
active learning for hyperspectral image classification.
IEEE Transactions on Geoscience and Remote Sen-
sing, 56(1):341–354.
Mura, M. D., Benediktsson, J. A., Waske, B., and Bruzzone, L. (2010). Morphological attribute profiles for the analysis of very high resolution images. IEEE Transactions on Geoscience and Remote Sensing, 48(10):3747–3762.
Ren, J., Zabalza, J., Marshall, S., and Zheng, J. (2014).
Effective feature extraction and data reduction in re-
mote sensing using hyperspectral imaging [applica-
tions corner]. IEEE Signal Processing Magazine,
31(4):149–154.
Shen, Y., Xu, J., Li, H., and Xiao, L. (2016). Elm-based
spectral-spatial classification of hyperspectral images
using bilateral filtering information on spectral band-
subsets. In 2016 IEEE International Geoscience and
Remote Sensing Symposium (IGARSS), pages 497–
500.
Zhou, Y., Peng, J., and Chen, C. L. P. (2015). Extreme
learning machine with composite kernels for hyper-
spectral image classification. IEEE Journal of Se-
lected Topics in Applied Earth Observations and Re-
mote Sensing, 8(6):2351–2360.