Multi-layer Extreme Learning Machine-based Autoencoder for Hyperspectral Image Classification

Muhammad Ahmad^{1,2}, Adil Mehmood Khan^{1}, Manuel Mazzara^{1} and Salvatore Distefano^{2}
^{1} Innopolis University, Innopolis, Russia
^{2} University of Messina, Messina, Italy

Keywords: Extreme Learning Machine (ELM), Deep Neural Networks (DNN), Autoencoder (AE), Hyperspectral Image Classification.
Abstract: Hyperspectral imaging (HSI) has attracted formidable interest from the scientific community and has been applied to an increasing number of real-life applications to automatically extract meaningful information from the corresponding high-dimensional datasets. However, traditional autoencoders (AE) and restricted Boltzmann machines are computationally expensive and do not perform well due to the Hughes phenomenon, which is observed in HSI since the ratio of labeled training pixels to the number of bands is usually quite small. To overcome such problems, this paper exploits a multi-layer extreme learning machine-based autoencoder (MLELM-AE) for HSI classification. The ELM-based autoencoder learns feature representations through singular value decomposition and is used as the basic building block of MLELM-AE. The MLELM-AE method not only retains the fast training speed of traditional ELM but also greatly improves the performance of HSI classification. The experimental results demonstrate the effectiveness of MLELM-AE on several well-known HSI datasets.
1 INTRODUCTION

Hyperspectral images (HSI) provide a unique way of characterizing objects of interest in geographical scenes through the very rich spatial-spectral information contained in a 3-D hypercube (Ahmad et al., 2016). However, classification of such high-dimensional hyperspectral data is still a challenging task, especially when the ratio between the number of available labeled training samples and the number of spectral dimensions (usually large) is small, which is commonly known as the Hughes phenomenon (Hughes, 1968).

To cope with the issues due to the high number of dimensions, a number of feature extraction, selection, and classification methods have been proposed in recent years (Ren et al., 2014; Ahmad et al., 2011; Liu et al., 2018). These methods have yielded quite good outcomes. However, their performance can be further improved by addressing two main issues: 1) inaccurate classification in the presence of the Hughes phenomenon (Ahmad et al., 2018); 2) comparatively low efficiency in processing high-dimensional HSI data (Ahmad et al., 2017a).
Extreme learning machine (ELM), as a single hidden layer feed-forward neural network, is an effective and fast machine learning method that has received remarkable attention due to its high generalization performance (Ding et al., 2015). In ELM, the hidden layer parameters do not need to be tuned once the number of hidden layer nodes is set. Moreover, the bias and weights between the hidden and input layers are randomly assigned without taking into account the training samples and applications (Zhou et al., 2015).

Due to its generalization capabilities, ELM has been extensively studied for HSI classification problems. For instance, in (Arguello and Heras, 2015; Shen et al., 2016), extended morphological profiles and bilateral filtering based methods were used for feature extraction and ELM was used as a base classifier. In (Chen et al., 2014; Dora et al., 2014), Gabor filter and watershed-based methods were employed for feature extraction and ELM was used as the final classification method. Computational complexity and other issues aside, these methods have achieved remarkable performance for HSI classification. However, they ignore one very important aspect of ELM: the randomly rendered input bias and weights may cause ill-posed problems.
Based on this phenomenon, and to handle such a problem effectively and efficiently, we exploit the multi-layer extreme learning machine-based autoencoder (MLELM-AE) method for HSI classification, in which features do not need to be extracted explicitly, as mentioned in (Kasun et al., 2013) for digit classification problems. To the best of our knowledge, this is the first work of its kind for HSI classification. A similar criterion has been explored in the past (Kasun et al., 2013); however, in our work, instead of using the pipeline for traditional image classification or recognition, we implement and test it on the hyperspectral image classification and segmentation problem, which is more complicated than traditional image classification.
The remainder of the paper is structured as follows. Section 2 presents the theoretical aspects of the extreme learning machine pipeline, followed by a theoretical explanation of the extreme learning machine-based autoencoder. Section 3 discusses experimental setups and metrics. Section 4 discusses the datasets, settings, and results. Finally, Section 5 summarizes the contributions and future research directions.
2 EXTREME LEARNING MACHINE

In ELM, the bias and weight vectors between the hidden and input layer are randomly assigned, while the net values are obtained by the learning process. Once the initial values are preserved, the hidden layer output matrix persists unaltered in the learning process. Let us assume that $X = (x_1, x_2, x_3, \cdots, x_N) \in \mathbb{R}^{d \times N}$ is the training data, which has $N$ pixels, each with a $d$-dimensional feature vector. Let $Y = (y_1, y_2, y_3, \cdots, y_M) \in \mathbb{R}^{M \times N}$ be a matrix representing the class labels of the training samples, in which $M$ is the number of classes in the HSI data. Thus, the ELM model with $L$ hidden neurons and the activation function $H(x)$ can be expressed as

$$\sum_{j=1}^{L} \beta_j H(W_j^T x_i + b_j) = y_i, \quad i = 1, 2, 3, \cdots, N \tag{1}$$
where $H(W_j^T x_i + b_j)$ represents the output of the $j$-th hidden neuron with respect to the input $x_i$, and $\beta_j$, $W_j$, and $b_j$ represent the weight vector between the hidden layer and the output layer, and the weight and bias between the hidden and input layer, respectively. The above expression can simply be written as

$$H^T \beta = Y^T \tag{2}$$
where

$$\beta = [\beta_1, \beta_2, \beta_3, \cdots, \beta_M]_{L \times M} \tag{3}$$

$$H = [H(x_1), H(x_2), H(x_3), \cdots, H(x_N)]_{L \times N} \tag{4}$$

and

$$H(x_i) = [H_1(x_i), H_2(x_i), H_3(x_i), \cdots, H_L(x_i)]^T_{L \times 1} \tag{5}$$

Finally, $\beta$ can be computed as

$$\beta = (H^T)^{\dagger} Y^T \tag{6}$$

where $(\cdot)^{\dagger}$ is the Moore-Penrose generalized inverse of a matrix.
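For concreteness, the whole ELM training procedure of Eqs. (1)-(6) reduces to one random projection and one pseudoinverse. The following NumPy sketch illustrates this under our own assumptions (sigmoid activation, standard-normal initialization); it is an illustration of the equations, not the MATLAB implementation used in our experiments.

import numpy as np

def train_elm(X, Y, L, seed=0):
    # X: d x N training pixels, Y: M x N one-hot class labels, L: hidden neurons
    rng = np.random.default_rng(seed)
    d, N = X.shape
    W = rng.standard_normal((d, L))            # random input weights W_j (never tuned)
    b = rng.standard_normal((L, 1))            # random biases b_j
    H = 1.0 / (1.0 + np.exp(-(W.T @ X + b)))   # L x N hidden layer output, Eqs. (4)-(5)
    beta = np.linalg.pinv(H.T) @ Y.T           # beta = (H^T)^dagger Y^T, Eq. (6)
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(W.T @ X + b)))
    return np.argmax(H.T @ beta, axis=1)       # class index per pixel

Since no iterative tuning of W or b is involved, the training cost is dominated by a single pseudoinverse, which is the source of ELM's speed.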
The main goal of the multi-layer extreme learning machine-based autoencoder is to learn a useful feature representation in three different modes, similar to traditional autoencoders (Ahmad et al., 2019): compressed representation, which maps the input features from the high-dimensional hyperspectral space to a lower-dimensional feature space; sparse representation, which maps the low-dimensional input feature space to a higher-dimensional hyperspectral feature space; and, finally, equal-dimensional representation, in which the input space dimension equals the feature space dimension.
According to (Kasun et al., 2013; Huang et al., 2006), the extreme learning machine is a universal approximator; therefore, MLELM-AE is also a universal approximator. In MLELM-AE, the orthogonal random biases and weights of the hidden nodes project the input samples to an equal-dimensional space, as shown in (Kasun et al., 2013; Huang et al., 2006; Johnson and Lindenstrauss, 1984) and in the equation below, similar to equation (1):

$$h = g(a \times x + b) \tag{7}$$

where $a = [a_1, a_2, \cdots, a_L]$ with $a^T \times a = I$, and $b = [b_1, b_2, \cdots, b_L]$ with $b^T \times b = 1$, are the orthogonal random weights and biases between the input and hidden nodes, respectively. Therefore, as shown in (Kasun et al., 2013; Huang et al., 2006), the output weights for the compressed and sparse MLELM-AE representations can be obtained by incorporating a regularization term to enhance the generalization performance and robustness:

$$\beta = \left(\frac{I}{C} + H^T H\right)^{-1} H^T X \tag{8}$$
where $C$ is the regularization term, $X = [x_1, x_2, x_3, \cdots, x_N]$ are the input and output data samples, and $H = [h_1, h_2, h_3, \cdots, h_N]$ are the hidden layer outputs of MLELM-AE. The output weights can also be computed as

$$\beta = H^{-1} X \tag{9}$$
where $\beta^T \beta = I$ to make the input and output equal. Therefore, the singular value decomposition of the regularized output weights for the compressed and sparse MLELM-AE representations can be computed as in (Kasun et al., 2013), i.e.

$$H\beta = \sum_{i=1}^{N} u_i \frac{v_i^2}{v_i^2 + C} u_i^T X \tag{10}$$

where $v$ represents the singular values of $H$ and $u$ represents the eigenvectors of $HH^T$. Since $H$ is the projected feature space squashed via a linear or nonlinear (sigmoid or any appropriate) activation function, we speculate that the output weights $\beta$ learn to represent the features of the input space via singular value decomposition.
Furthermore, if the number of hidden nodes $L_k$ in the $k$-th hidden layer is equal to the number of hidden nodes $L_{k-1}$ in the $(k-1)$-th hidden layer, $g$ is chosen as a linear activation function; otherwise, $g$ is chosen as a nonlinear piecewise activation function. This way,

$$H^k = g\left((\beta^k)^T H^{k-1}\right) \tag{11}$$

where $H^k$ is the $k$-th hidden layer output matrix. For better intuition, the input $x$ can be identified as the $0$-th hidden layer, where $k = 0$. Finally, the output of the connections between the last hidden layer and the output node $t$ is analytically computed by employing regularized least squares, where $t$ is the output data.
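To make the layer-wise construction of Eqs. (7)-(11) concrete, the following is a minimal sketch under our assumptions: sigmoid activations, QR-based orthogonalization of the random weights (so that $a^T a = I$), and compressed layers with $L$ no larger than the current input dimension. The final regularized least-squares readout over $t$ is omitted.

import numpy as np

def elm_ae_layer(H_prev, L, C, seed=0):
    # One ELM-AE layer: orthogonal random projection (Eq. 7), then
    # regularized output weights learned to reconstruct the input (Eq. 8).
    rng = np.random.default_rng(seed)
    d, N = H_prev.shape                                # assumes L <= d (compressed case)
    A, _ = np.linalg.qr(rng.standard_normal((d, L)))   # a^T a = I (orthonormal columns)
    b = rng.standard_normal((L, 1))
    b /= np.linalg.norm(b)                             # b^T b = 1
    H = 1.0 / (1.0 + np.exp(-(A.T @ H_prev + b)))      # L x N hidden outputs
    # Column-wise form of Eq. (8): beta = (I/C + H H^T)^{-1} H X^T, size L x d;
    # this beta plays the role of (beta^k)^T in Eq. (11).
    beta = np.linalg.solve(np.eye(L) / C + H @ H.T, H @ H_prev.T)
    return beta

def mlelm_ae_features(X, layer_sizes, C=1e3):
    # Stack ELM-AE layers: H^k = g((beta^k)^T H^(k-1)), Eq. (11), with H^0 = X.
    H = X
    for k, L in enumerate(layer_sizes):
        beta = elm_ae_layer(H, L, C, seed=k)
        H = 1.0 / (1.0 + np.exp(-(beta @ H)))          # next-layer representation
    return H                                           # features fed to the final readout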
3 EXPERIMENTAL METRICS

In this section, the performance of MLELM-AE is evaluated using seven different well-known, publicly available hyperspectral datasets acquired by the AVIRIS and ROSIS sensors and by the Hyperion sensor on the NASA EO-1 satellite. More information about these datasets can be found in (Liu et al., 2018; Ahmad et al., 2018; Ahmad et al., 2017b; Li et al., 2013; He et al., 2018; Datasets, ).

The confusion matrix is generally used to evaluate the performance of HSI classification in terms of overall accuracy, average accuracy, and the kappa ($\kappa$) coefficient. In this work, the overall accuracy for hyperspectral image classification is computed by the following formula:

$$OA = \frac{\sum_{i=1}^{N} x_{ii}}{\sum_{j=1}^{M} \sum_{i=1}^{N} x_{ij}} \tag{12}$$
From the above equation, it can be seen that the magnitude of the overall accuracy is only affected by the diagonal elements. It is more strongly affected by classes that contain more elements, so it is not sufficient to comprehensively evaluate the classification accuracy of all classes. A more comprehensive index of classification accuracy is the $\kappa$ coefficient, which utilizes all samples of the confusion matrix and thus reflects the consistency between classification results and ground truth. The $\kappa$ coefficient is evaluated by the formula (Ahmad et al., 2017b):

$$\kappa = \frac{N \sum_i a_i - \sum_i b_i d_i}{N^2 - \sum_i b_i d_i} \tag{13}$$

where $N$ is the total number of samples (pixels in the HSI cube), $a_i$ is the number of correctly predicted samples in the given class, $\sum_i a_i$ is the sum of the numbers of correctly predicted samples, $b_i$ is the actual number of samples belonging to the given class, and $d_i$ is the number of samples that have been predicted into the given class (Ahmad et al., 2018).
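As a worked example, both indices follow directly from the confusion matrix. The sketch below (our own illustration, with a made-up 2-class matrix) mirrors Eqs. (12) and (13).

import numpy as np

def overall_accuracy(cm):
    # Eq. (12): correctly classified (diagonal) over all samples.
    return np.trace(cm) / cm.sum()

def kappa(cm):
    # Eq. (13): agreement corrected for chance, via row/column marginals.
    N = cm.sum()
    a = np.trace(cm)        # sum_i a_i: correctly predicted samples
    b = cm.sum(axis=1)      # b_i: actual samples per class
    d = cm.sum(axis=0)      # d_i: samples predicted into each class
    return (N * a - (b * d).sum()) / (N**2 - (b * d).sum())

cm = np.array([[50, 10],    # rows = ground truth, columns = prediction
               [ 5, 35]])
print(overall_accuracy(cm), kappa(cm))   # 0.85 and about 0.69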
Furthermore, to evaluate the significance of MLELM-AE, several statistical measures are computed, e.g., the F1-score, precision, and recall rate. Precision is the ratio of correctly identified positive samples to the total predicted positive samples; a high precision value indicates a low false-positive rate, reflecting the model's ability to correctly identify true positive samples. Recall is the ratio of correctly predicted positive samples to the total number of actual positive samples. As with precision, the higher the recall rate, the better the model.

Likewise, the F1-score is a weighted average of the precision and recall rates; therefore, the F1-score takes both false negatives and false positives into account. The F1-score is more useful than the other accuracy measures, though intuitively not as easy to understand as accuracy, particularly in the presence of unbalanced class distributions. Accuracy works well if false negatives and false positives have similar costs; if their costs differ, it is better to consider both precision and recall to evaluate the model.
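These per-class rates also come straight from the confusion matrix; a brief sketch of the definitions we use (illustrative only, assuming every class appears in both the ground truth and the predictions):

import numpy as np

def per_class_scores(cm):
    # rows = ground truth, columns = prediction
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # TP / predicted positives
    recall = tp / cm.sum(axis=1)      # TP / actual positives
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1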
In this section, we also evaluate the relevant tuning parameters, which include the number of neurons in the hidden layers, the total number of layers, and an appropriate value for the regularization term $C$. In our experiments, the regularization term is automatically tuned by a 5-fold cross-validation process. The number of hidden layer neurons is systematically set within the range [Total Number of Training Samples, Total Number of Testing Samples], and the number of layers is heuristically set in the range $[1, 5]$; the cross-validation process finds the optimum value of the regularization term from the interval $[1e1, 1e14]$.
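A sketch of this tuning loop is given below; the power-of-ten candidate grid over $[1e1, 1e14]$ and the train_fn/score_fn callables are our own illustrative assumptions, not the exact procedure of our MATLAB implementation.

import numpy as np

def tune_C(X, Y, train_fn, score_fn, n_folds=5, seed=0):
    # 5-fold cross-validation over the regularization term C.
    rng = np.random.default_rng(seed)
    candidates = [10.0 ** p for p in range(1, 15)]    # assumed grid, 1e1 ... 1e14
    N = X.shape[1]
    folds = np.array_split(rng.permutation(N), n_folds)
    best_C, best_score = None, -np.inf
    for C in candidates:
        scores = []
        for fold in folds:
            mask = np.ones(N, dtype=bool)
            mask[fold] = False                        # hold out one fold
            model = train_fn(X[:, mask], Y[:, mask], C)
            scores.append(score_fn(model, X[:, fold], Y[:, fold]))
        if np.mean(scores) > best_score:
            best_C, best_score = C, np.mean(scores)
    return best_C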
4 EXPERIMENTAL RESULTS AND DISCUSSION

In this section, we briefly discuss the experimental results acquired by the MLELM-AE pipeline on seven different hyperspectral datasets. Prior to the experiments, we performed the necessary normalization to the range $[0, 1]$. All the experiments have been carried out on a cluster using MATLAB (2017a) on an Intel Core (TM) i7-7700K CPU at 2.40 GHz (1962 MHz), Ubuntu 16.04.5 LTS, CUDA compilation tools release 7.5, V7.5.17, with 65 GB RAM.

The presented experiments show the accuracy analysis in terms of overall accuracy, average accuracy, and the $\kappa$ coefficient. Figures 1-7 show the ground-truth maps for the original test samples along with the predictions of these samples as geographical maps. Furthermore, these Figures also present the average, overall, and $\kappa$ accuracies in multiclass form, along with the mean squared error of the MLELM-AE model under 10-fold cross-validation and the training and testing time for each dataset. The training and test times are significantly lower than those of traditional back-propagation-based deep neural networks. Furthermore, the plots show higher generalization performance with a smaller amount of training samples.
To highlight the class-based classification results, Tables 1-7 report the $\kappa$ coefficient for each individual class, providing insights on the number of training versus estimated labels used in our experiments and thus demonstrating the clear advantages of using limited samples for learning the MLELM-AE model. In most cases, the proposed pipeline outperforms existing solutions.
Experiments with Salinas Dataset

The Salinas dataset consists of 224 spectral bands with a high spatial resolution of 3.7 m. The full Salinas scene was collected by the AVIRIS sensor over Salinas Valley, California.

In the Salinas scene, some water-absorption bands were removed prior to the analyses: bands 108-112, 154-167, and 224. The full Salinas scene covers 512 x 217 pixels per band and contains vegetables, bare soils, and vineyard fields. The Salinas ground truth contains 16 classes.

A sub-scene of the Salinas dataset, named Salinas-A, consists of 86 x 83 samples per band and 6 classes. The Salinas-A samples are located in the full Salinas scene at 591-676 and 158-240. Dataset files and descriptions can be obtained from (Datasets, ).

The experimental results are shown in Tables 1 and 2 and Figures 1 and 2. From the results, it can be clearly seen that the MLELM-AE pipeline greatly improved the classification accuracies for the Salinas and Salinas-A datasets with acceptable generalization performance. Furthermore, the detailed accuracy and the time taken to train and test the model are provided in the captions of the respective Figures. In all these experiments, the training size is set to 1% of the samples from the Salinas and Salinas-A datasets, respectively.
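The per-class (Train, Test) counts reported in the Tables correspond to a stratified draw of labeled pixels; the following is a sketch of how such a split might be produced (our own illustration of the sampling, not the exact script used in our experiments):

import numpy as np

def stratified_split(labels, fraction=0.01, seed=0):
    # Draw a per-class fraction of labeled pixels for training; the rest test.
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        n_train = max(1, int(round(fraction * idx.size)))
        train_idx.append(rng.choice(idx, size=n_train, replace=False))
    train_idx = np.concatenate(train_idx)
    test_mask = np.ones(labels.size, dtype=bool)
    test_mask[train_idx] = False
    return train_idx, np.flatnonzero(test_mask)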
Table 1: Classification accuracy (κ) analysis and statistical
measures for Salinas-A Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Brocoli Green Weeds 1 (8, 375) 0.9316±0.0516 0.9948 0.9999 0.9973
Corn Senesced Green Weeds (27, 1289) 0.9695±0.0144 0.9795 0.9999 0.9896
Lettuce Romaine 4wk (13, 591) 0.9773±0.0212 0.9934 0.9772 0.9852
Lettuce Romaine 5wk (31, 1464) 0.9975±0.0022 0.9999 0.9973 0.9987
Lettuce Romaine 6wk (14, 647) 0.9947±0.0026 0.9969 0.9763 0.9865
Lettuce Romaine 7wk (16, 767) 0.9757±0.0125 0.9796 0.9783 0.9789
Table 2: Classification accuracy (κ) analysis and statistical
measures for Salinas Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Brocoli Green Weeds 1 (61, 2009) 0.9982±0.0003 0.9979 1.0000 0.9989
Brocoli Green Weeds 2 (112, 3726) 0.9968±0.0004 0.9958 0.9991 0.9975
Fallow (60, 1976) 0.8404±0.0232 0.8721 0.9631 0.9153
Fallow Rough Plow (42, 1394) 0.9835±0.0033 0.9859 0.9925 0.9892
Fallow Smooth (81, 2678) 0.9883±0.0029 0.9795 0.9002 0.9382
Stubble (119, 3959) 0.9971±0.0005 0.9963 0.9994 0.9979
Celery (108, 3579) 0.9963±0.0007 0.9959 0.9945 0.9952
Grapes Untrained (339, 11271) 0.8891±0.0068 0.8795 0.7884 0.8314
Soil Vinyard Develop (187, 6203) 0.9909±0.0026 0.9960 0.9848 0.9904
Corn Senesced Green Weeds (99, 3278) 0.9348±0.0086 0.9465 0.9552 0.9508
Lettuce Romaine 4wk (33, 1068) 0.9539±0.0081 0.9748 0.9465 0.9605
Lettuce Romaine 5wk (58, 1927) 0.9999±0.0001 0.9994 0.9623 0.9806
Lettuce Romaine 6wk (28, 916) 0.9785±0.0018 0.9752 0.9569 0.9659
Lettuce Romaine 7wk (33, 1070) 0.9284±0.0083 0.9402 0.9587 0.9493
Vinyard Untrained (219, 7268) 0.6238±0.0105 0.6300 0.7857 0.6994
Vinyard Vertical Trellis (55, 1807) 0.9840±0.0020 0.9857 0.9976 0.9917
Figure 1: True Testing Maps and predicted Test Maps
for Salinas-A dataset with 10-fold-cross-validation-based
Average Accuracy = 0.9744 ± 0.0093, Overall Accuracy =
0.9797± 0.0059, κ = 0.9746±0.0073, Mean-squared Error
= 0.2561 ± 0.0478, and Training Time = 0.0918 ± 0.0067
and Test Time = 0.2539 ± 0.0141.
Experiments with Kennedy Space Center Dataset

The NASA AVIRIS instrument acquired data over the Kennedy Space Center (KSC), Florida, on March 23, 1996. AVIRIS acquired data in 224 bands of 10 nm width with center wavelengths in the range 400-2500 nm, from an altitude of approximately 20 km, with a spatial resolution of 18 m. After removing water-absorption and low-SNR bands, 176 bands were used for the analysis.
Figure 2: True Testing Maps and predicted Test Maps
for Salinas dataset with 10-fold-cross-validation-based
Average Accuracy = 0.9428 ± 0.0018, Overall Accuracy =
0.9106± 0.0013, κ = 0.9002±0.0015, Mean-squared Error
= 1.9138± 0.0087, and Training Time = 54.2526 ± 2.0612
and Test Time = 105.7566± 3.6922.
Training data were selected using land cover maps derived from color infrared photography provided by the KSC and Landsat Thematic Mapper (TM) imagery. The vegetation classification scheme was developed by KSC personnel in an effort to define functional types that are discernible at the spatial resolution of Landsat and this AVIRIS dataset.

Discrimination of land cover for this environment is difficult due to the similarity of spectral signatures for certain vegetation types. For classification purposes, 13 classes representing the various land cover types that occur in this environment were defined for the site. Dataset files and descriptions can be obtained from (Datasets, ). The experimental results are shown in Table 3 and Figure 3. From the results, one can conclude that MLELM-AE greatly improved the classification accuracies for the more complicated Kennedy Space Center AVIRIS dataset with enhanced generalization capabilities. Moreover, the detailed accuracy and the time taken to train and test the model are provided in the caption.
Table 3: Classification accuracy (κ) analysis and statistical
measures for Kennedy Space Center Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Scrub (229, 761) 0.9895±0.0018 0.9906 0.7877 0.8776
Willow Swamp (73, 243) 0.9541±0.0169 0.8941 0.7958 0.8421
CP/Oak (77, 252) 0.9240±0.0225 0.9162 0.6667 0.7718
CP hammock (76, 256) 0.4903±0.0388 0.5000 0.8073 0.6175
Slash Pine (49, 161) 0.6384±0.0253 0.6250 0.8434 0.7179
Oak/Broadleaf (69, 229) 0.2444±0.0192 0.2875 0.7541 0.4163
Hardwood Swamp (32, 105) 0.4603±0.0658 0.6575 0.7500 0.7007
Graminoid Marsh (130, 431) 0.8545±0.0133 0.8638 0.9629 0.9107
Spartina Marsh (156, 520) 0.9841±0.0039 0.9890 0.8933 0.9387
Cattail Marsh (122, 404) 0.9439±0.0089 0.9362 0.9778 0.9565
Salt Marsh (126, 419) 0.9785±0.0044 0.9659 0.9861 0.9759
Mud Flats (151, 503) 0.8773±0.0179 0.9233 0.9207 0.9219
Water (279, 927) 0.9852±0.0043 0.9784 0.9969 0.9875
Figure 3: True Testing Maps and predicted Test Maps for
Kennedy Space Center (KSC) dataset with 10-fold-cross-
validation-based Average Accuracy = 0.7942 ± 0.0059,
Overall Accuracy = 0.8786± 0.0034, κ = 0.8642± 0.0038,
Mean-squared Error = 1.4954 ± 0.0374, and Training Time
= 14.2984± 0.5296 and Test Time = 0.7258 ± 0.0662.
Experiments with Indian Pines Dataset

The Indian Pines dataset was gathered by the AVIRIS sensor over the Indian Pines test site in north-western Indiana and consists of 145 x 145 pixels and 224 bands in the wavelength range $0.4$ to $2.5 \times 10^{-6}$ meters.

The Indian Pines scene contains two-thirds agriculture and one-third forest or other natural perennial vegetation. There are two major dual-lane highways and a rail line, as well as some low-density housing, other built structures, and small roads. Since the Indian Pines data was taken in June, some of the crops present, corn and soybeans, are in early stages of growth with less than 5% coverage. The available ground truth is distinguished into sixteen classes, not all mutually exclusive.

We have also reduced the number of bands to 200 by removing bands covering the region of water absorption: bands 104-108, 150-163, and 220. Dataset files and descriptions can be obtained from (Datasets, ). The experimental results are shown in Table 4 and Figure 4. From the results, one can conclude that MLELM-AE greatly improved the classification accuracies with enhanced generalization capabilities. Furthermore, the detailed accuracy analysis and the time taken to train and test the model are provided in the caption.
Table 4: Classification accuracy (κ) analysis and statistical
measures for Indian Pines Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Alfalfa (10, 46) 0.4556±0.0736 0.6944 0.8065 0.7464
Corn-notill (286, 1428) 0.8117±0.0070 0.8284 0.7472 0.7857
Corn-mintill (166, 830) 0.6048±0.0215 0.5768 0.7539 0.6536
Corn (48, 237) 0.3862±0.0385 0.3492 0.7952 0.4853
Grass-pasture (97, 483) 0.8943±0.0159 0.8834 0.9419 0.9118
Grass-trees (146, 730) 0.9829±0.0046 0.9846 0.9055 0.9434
Grass-pasture-mowed (6, 28) 0.5409±0.0469 0.5455 1.0000 0.7059
Hay-windrowed (96, 478) 0.9927±0.0038 0.9921 0.9595 0.9756
Oats (4, 20) 0.1625±0.0688 0.2500 0.8000 0.3809
Soybean-notill (195, 972) 0.6779±0.0137 0.7091 0.7527 0.7303
Soybean-mintill (491, 2455) 0.8457±0.0053 0.8432 0.7520 0.7951
Soybean-clean (119, 593) 0.7569±0.0172 0.8101 0.8571 0.8329
Wheat (41, 205) 0.9866±0.0043 0.9756 0.9639 0.9697
Woods (253, 1265) 0.9641±0.0047 0.9664 0.9297 0.9477
Buildings-Grass-Trees-Drives (78, 386) 0.6042±0.0159 0.6266 0.8143 0.7083
Stone-Steel-Towers (19, 93) 0.7284±0.0421 0.8108 1.0000 0.8956
Figure 4: True Testing Maps and predicted Test Maps
for Indian Pines dataset with 10-fold-cross-validation-
based κ = 0.7839 ± 0.0028, Average Accuracy = 0.7122 ±
0.0078, Overall Accuracy = 0.8122 ± 0.0024, Mean-
squared Error = 2.6312 ± 0.0359, and Training Time =
38.8711± 0.7759 and Test Time = 3.2365 ± 0.1330.
Experiments with Pavia University and Pavia Center Datasets

The Pavia University (PU) dataset was acquired by the ROSIS optical sensor during a flight campaign over Pavia in northern Italy, with a geometric resolution of 1.3 m. The PU data consist of 102 spectral bands with 1096 x 1096 samples per band.

Some of the samples in the PU dataset contain no information and have to be discarded prior to the analysis. The PU scene ground truth identifies 9 classes. Dataset files and descriptions can be obtained from (Datasets, ).

The experimental results are shown in Tables 5 and 6 and Figures 5 and 6. From the results, one can conclude that MLELM-AE greatly improved the classification accuracies with enhanced generalization capabilities for the more complicated ROSIS sensor-based datasets. Evaluating the ROSIS datasets is a more challenging classification problem than AVIRIS, dominated by complex urban classes and nested regions. The detailed accuracy analysis in terms of average, overall, and kappa accuracies, along with the time taken to train and test the model, is provided in the captions.
Table 5: Classification accuracy (κ) analysis and statistical
measures for Pavia University Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Asphalt (1285, 6631) 0.8105±0.0080 0.8091 0.8916 0.8485
Meadows (1980, 18649) 0.9725±0.0016 0.9694 0.8049 0.8795
Gravel (93, 2099) 0.6303±0.0133 0.6406 0.7374 0.6856
Trees (81, 3064) 0.8115±0.0048 0.8125 0.8199 0.8162
Painted metal sheets (198, 1345) 0.9947±0.0011 0.9926 0.9917 0.9923
Bare Soil (278, 5029) 0.2468±0.0078 0.2256 0.7964 0.3516
Bitumen (219, 1330) 0.6607±0.0116 0.6859 0.8857 0.7731
Self-Blocking Bricks (228, 3682) 0.8296±0.0096 0.8252 0.6421 0.7222
Shadows (86, 947) 0.8599±0.0140 0.8885 0.9299 0.9088
Figure 5: True Testing Maps and predicted Test Maps
for Pavia University dataset with 10-fold-cross-validation-
based Overall Accuracy = 0.8099± 0.0014, Average Accu-
racy = 0.7574 ± 0.0018, κ = 0.7386 ± 0.0019, Mean-
squared Error = 1.8703 ± 0.0100, and Training Time =
322.2472± 2.6395 and Test Time = 70.9769± 1.2438.
Table 6: Classification accuracy (κ) analysis and statistical
measures for Pavia Center Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Water (25, 824) 0.9996±0.0001 0.9997 0.9976 0.9986
Trees (24, 820) 0.9473±0.0055 0.9499 0.8319 0.8870
Asphalt (23, 816) 0.5989±0.0211 0.5849 0.7904 0.6723
Self Blocking Bricks (21, 808) 0.6389±0.0175 0.6820 0.6612 0.6715
Bitumen (21, 808) 0.8887±0.0119 0.8727 0.8287 0.8501
Tiles (38, 1260) 0.8173±0.0051 0.8071 0.9364 0.8669
Shadows (15, 476) 0.8621±0.0091 0.8877 0.9412 0.9136
Meadows (25, 824) 0.9957±0.0002 0.9955 0.9778 0.9866
Bare Soil (24, 820) 0.8518±0.0123 0.8739 0.8519 0.8628
Figure 6: True Testing Maps and predicted Test Maps
for Pavia Center dataset with 10-fold-cross-validation-
based κ = 0.9360 ± 0.0008, Average Accuracy = 0.8445 ±
0.0027, Overall Accuracy = 0.9549 ± 0.0006, Mean-
squared Error = 0.5067 ± 0.0075, and Training Time =
355.3899± 3.6140 and Test Time = 410.2014± 7.97897.
Experiments with Botswana Dataset

The NASA EO-1 satellite acquired a sequence of data over the Okavango Delta, Botswana, in 2001-2004. The Hyperion sensor on EO-1 acquired data at 30 m pixel resolution over a 7.7 km strip in 242 bands covering the 400-2500 nm portion of the spectrum in 10 nm windows.

Preprocessing of the data was performed by the UT Center for Space Research to mitigate the effects of bad detectors, inter-detector mis-calibration, and intermittent anomalies. Uncalibrated and noisy bands that cover water-absorption features were removed, and the remaining 145 bands were included as candidate features. The removed bands are 10-55, 82-97, 102-119, 134-164, and 187-220. The data analyzed in this study were acquired on May 31, 2001, and consist of observations from 14 identified classes representing the land cover types in seasonal swamps, occasional swamps, and drier woodlands located in the distal portion of the Delta. Dataset files and descriptions can be obtained from (Datasets, ).
The experimental results are shown in Table 7 and Figure 7. From the results, one can conclude that MLELM-AE greatly improved the classification accuracies with enhanced generalization capabilities for the more complicated Hyperion (EO-1) sensor-based dataset. Evaluating the Hyperion (EO-1) datasets is a more challenging classification problem than AVIRIS and ROSIS, dominated by complex urban classes and nested regions. The detailed accuracy analysis in terms of average, overall, and kappa accuracies, along with the time taken to train and test the model, is provided in the caption.
Table 7: Classification accuracy (κ) analysis and statistical
measures for Botswana Dataset.
Class Names (Train, Test) κ Recall Precision F1-Score
Water (81, 270) 0.9989±0.0021 1.0000 1.0000 1.0000
Hippo Grass (31, 101) 0.9757±0.0187 1.0000 0.9859 0.9929
Floodplain Grasses 1 (76, 251) 0.9897±0.0057 0.9943 0.9943 0.9943
Floodplain Grasses 2 (65, 215) 0.9953±0.0039 1.0000 0.9494 0.9740
Reeds1 (81, 269) 0.8787±0.0132 0.8829 0.8737 0.8783
Riparian (81, 269) 0.7011±0.0204 0.7128 0.8323 0.7679
Firescar 2 (78, 259) 0.9912±0.0040 0.9945 1.0000 0.9972
Island Interior (61, 203) 0.9796±0.0067 0.9859 0.9929 0.9894
Acacia Woodlands (95, 314) 0.9347±0.0135 0.9361 0.9031 0.9193
Acacia Shrublands (75, 248) 0.9006±0.0172 0.8671 0.9554 0.9091
Acacia Grasslands (92, 305) 0.9577±0.0124 0.9718 0.9039 0.9367
Short Mopane (55, 181) 0.9190±0.0187 0.9524 0.9302 0.9412
Mixed Mopane (81, 268) 0.9337±0.0145 0.9412 0.9072 0.9239
Exposed Soils (29, 95) 0.9682±0.0129 1.0000 1.0000 1.0000
In this section, we performed a set of experiments to evaluate MLELM-AE using the ROSIS, AVIRIS, and NASA EO-1 Hyperion sensor datasets. Evaluating the ROSIS and Hyperion datasets poses more challenging classification problems than AVIRIS, dominated by complex urban classes and nested regions. Figures 1-7 and Tables 1-7 show the overall, average, and kappa ($\kappa$) accuracies along with the training and test times as a function of the number of labeled samples. The Figures 1-7 and Tables 1-7 are generated based on only the selected samples, in contrast to the entire population, which reveals the clear advantages of using fewer labeled samples for the MLELM-AE pipeline.
Figure 7: True Testing Maps and predicted Test Maps for
Botswana datasets with 10-fold-cross-validation-based κ =
0.9268 ± 0.0039, Average Accuracy = 0.9374 ± 0.0041,
Overall Accuracy = 0.9325 ± 0.0036, Mean-squared Error
= 0.8309 ± 0.0413, and Training Time = 2.5815 ± 0.0895
and Test Time = 0.1872 ± 0.0182.
5 CONCLUSIONS AND FUTURE WORK

In this work, we implemented a framework for hyperspectral image classification in a computationally efficient fashion using a multi-layer extreme learning machine-based autoencoder (MLELM-AE). MLELM-AE is a special case of the traditional ELM in which the input is equal to the output and the randomly generated weights are chosen to be orthogonal. The internal representation of MLELM-AE provides an effective solution not only for feed-forward neural networks but also for multi-layered feed-forward neural networks. The MLELM-AE network provides better generalization performance than traditional back-propagation-based deep neural networks.

To further improve generalization, in future work we will focus on learning a dictionary for each class in both the spectral and spatial domains. We will further look into possible ways to decrease the computational complexity of the model by resorting to spatial filtering (Hao et al., 2017) and extended multi-attribute profile-based (Mura et al., 2010) methods.
ACKNOWLEDGEMENT

The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions.
REFERENCES
Ahmad, M., Alqarni, M. A., Khan, A. M., Hussain, R., Mazzara, M., and Distefano, S. (2019). Segmented and non-segmented stacked denoising autoencoder for hyperspectral band reduction. Optik - International Journal for Light and Electron Optics, 180:370–378.
Ahmad, M., Bashir, A. K., and Khan, A. M. (2017a). Metric similarity regularizer to enhance pixel similarity performance for hyperspectral unmixing. Optik - International Journal for Light and Electron Optics, 140(C):86–95.
Ahmad, M., Haq, I. U., and Qaisaro, M. (2011). Aik method for band clustering using statistics of correlation and dispersion matrix. In 2011 International Conference on Information Communication and Management, pages 114–118.
Ahmad, M., Khan, A. M., and Hussain, R. (2017b).
Graph-based spatial-spectral feature learning for hy-
perspectral image classification. IET Image Proces-
sing, 11(12):1310–1316.
Ahmad, M., Khan, A. M., Hussain, R., Protasov, S., Chow,
F., and Khattak, A. M. (2016). Unsupervised geo-
metrical feature learning from hyperspectral data. In
2016 IEEE Symposium Series on Computational In-
telligence (SSCI), pages 1–6.
Ahmad, M., Protasov, S., Khan, A. M., Hussain, R., Khattak, A. M., and Khan, W. A. (2018). Fuzziness-based active learning framework to enhance hyperspectral image classification performance for discriminative and generative classifiers. PLoS ONE, 13:e0188996.
Arguello, F. and Heras, H. B. (2015). Elm-based spectral–
spatial classification of hyperspectral images using ex-
tended morphological profiles and composite feature
mappings. Int. J. Remote Sens., 36(2):645–664.
Chen, C., Li, W., Su, H., and Liu, K. (2014). Spectral-
spatial classification of hyperspectral image based on
kernel extreme learning machine. Remote Sensing,
6(6):5795–5814.
Datasets, H. (accessed May 2018). http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes.
Ding, S., Zhao, H., Zhang, Y., Xu, Z., and Nie, R. (2015). Extreme learning machine: Algorithm, theory and applications. Artif. Intell. Rev., 44(1):103–115.
Dora, B. H., Arguello, F., and Pablo, Q.-B. (2014). Ex-
ploring elm-based spatial spectral classification of hy-
perspectral images. International Journal of Remote
Sensing, 35(2):401–423.
Hao, L., Chang, L., Cong, Z., Zhe, L., and Chengyin, L. (2017). Hyperspectral image classification with spatial filtering and ℓ2,1 norm. In Sensors.
He, L., Li, J., Liu, C., and Li, S. (2018). Recent advances on spectral-spatial hyperspectral image classification: An overview and new guidelines. IEEE Transactions on Geoscience and Remote Sensing, 56(3):1579–1597.
Huang, G. B., Chen, L., and Siew, C.-K. (2006). Universal approximation using incremental constructive feedforward networks with random hidden nodes. Trans. Neur. Netw., 17(4):879–892.
Hughes, G. (1968). On the mean accuracy of statistical pattern recognizers. IEEE Transactions on Information Theory, 14(1):55–63.
Johnson, W. and Lindenstrauss, J. (1984). Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 26:189–206.
Kasun, L., Zhou, H., Huang, G. B., and Vong, C.-M. (2013). Representational learning with ELMs for big data. IEEE Intelligent Systems, 28(6):31–34.
Li, J., Bioucas-Dias, J. M., and Plaza, A. (2013). Spectral-
spatial classification of hyperspectral data using
loopy belief propagation and active learning. IEEE
Transactions on Geoscience and Remote Sensing,
51(2):844–856.
Liu, C., He, L., Li, Z., and Li, J. (2018). Feature-driven
active learning for hyperspectral image classification.
IEEE Transactions on Geoscience and Remote Sen-
sing, 56(1):341–354.
Mura, M. D., Benediktsson, J. A., Waske, B., and Bruzzone, L. (2010). Morphological attribute profiles for the analysis of very high resolution images. IEEE Transactions on Geoscience and Remote Sensing, 48(10):3747–3762.
Ren, J., Zabalza, J., Marshall, S., and Zheng, J. (2014).
Effective feature extraction and data reduction in re-
mote sensing using hyperspectral imaging [applica-
tions corner]. IEEE Signal Processing Magazine,
31(4):149–154.
Shen, Y., Xu, J., Li, H., and Xiao, L. (2016). Elm-based
spectral-spatial classification of hyperspectral images
using bilateral filtering information on spectral band-
subsets. In 2016 IEEE International Geoscience and
Remote Sensing Symposium (IGARSS), pages 497–
500.
Zhou, Y., Peng, J., and Chen, C. L. P. (2015). Extreme
learning machine with composite kernels for hyper-
spectral image classification. IEEE Journal of Se-
lected Topics in Applied Earth Observations and Re-
mote Sensing, 8(6):2351–2360.