view, such as Moran's I index, Geary's c index, Cliff and Ord indices, Getis and Ord (G_i and G_i^*) indices, and Local Indicators of Spatial Association (LISA) (Shekhar et al., 2015).
In this work, we focus on the supervised classification task. We propose a new classification algorithm for spatially autocorrelated data, which we call SKDA, for Spatial Kernel Discriminant Analysis. Our algorithm is a spatial extension of the classical Kernel Discriminant Analysis rule; it is based on a kernel estimate of the spatial probability density function that combines two kernels: one controls the observed values while the other controls the spatial locations of the observations (Dabo-Niang et al., 2014).
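To make the two-kernel construction concrete, here is a minimal sketch of such an estimator, assuming Gaussian kernels for both components; the function name, the bandwidths h_x and h_s, and the array shapes are illustrative assumptions, not the exact estimator of (Dabo-Niang et al., 2014).

```python
import numpy as np

def spatial_kernel_density(x, s, X, S, h_x, h_s):
    """Two-kernel estimate of the spatial density at feature value x and site s.

    X : (n, d) array of observed values x_i; S : (n, N) array of site
    coordinates. One Gaussian kernel weights the observed values, the
    other weights the spatial locations (bandwidths h_x and h_s).
    """
    n, d = X.shape
    # Kernel on the observed values: K((x - x_i) / h_x)
    k_val = np.exp(-0.5 * np.sum(((x - X) / h_x) ** 2, axis=1))
    # Kernel on the spatial locations: K((s - s_i) / h_s)
    k_loc = np.exp(-0.5 * np.sum(((s - S) / h_s) ** 2, axis=1))
    # Normalizing constant of the product of the two Gaussian kernels
    norm = n * (h_x * np.sqrt(2 * np.pi)) ** d \
             * (h_s * np.sqrt(2 * np.pi)) ** S.shape[1]
    return float(np.sum(k_val * k_loc) / norm)
```

The estimate is large only where an observation is close both in feature space and in geographic space, which is how spatial context enters the density.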
One potential application of our SKDA algorithm is the classification of remotely sensed hyperspectral images. This problem has attracted considerable attention over the past decade. Several studies have proposed spectral-spatial classification algorithms that integrate the spatial context and the spectral information of the hyperspectral image. This incorporation of spatial information has been shown to significantly improve classification accuracy (He et al., 2017). According to the way the spatial dimension is incorporated, these methods can be classified into three main categories: integrated spectral-spatial approaches, preprocessing-based approaches, and postprocessing-based approaches (Fauvel et al., 2013). A survey of the incorporation of spatial information for HSI classification is presented in (Wang et al., 2016).
The contribution of our work is twofold: on the one hand, we propose a spatial classification algorithm that handles the dependency structure of the data; on the other hand, a spatial-spectral method is proposed for the classification of HSI, with competitive results.
The remainder of the paper is organized as follows: Section 2 defines the context of this study and presents the background knowledge essential to the understanding of our algorithm. Section 3 presents our SKDA algorithm. Section 4 shows experimental results of our method on hyperspectral image classification. Finally, Section 5 summarizes the results of this work and draws conclusions.
2 BACKGROUND
In this section, we begin by formally defining the notations that we will adopt throughout this paper. Then, we present the background knowledge essential to the understanding of our algorithm.
In this work, we focus on geostatistical data. We consider a spatial process {Z_i = (X_i, Y_i) ∈ R^d × N, i ∈ Z^N, d ≥ 1, N ∈ N^*}, defined over a probability space (Ω, F, P) and indexed in a rectangular region I_n = {i ∈ Z^N : N ∈ N^*, 1 ≤ i_k ≤ n_k, ∀k ∈ {1, ..., N}}. A point i = (i_1, ..., i_N) ∈ Z^N is called a site, representing a geographic position. Let n̂ = n_1 × n_2 × ... × n_N = Card(I_n) be the sample size, and f(·) the density function of X ∈ R^d. Each site i ∈ I_n is characterized by a d-dimensional observation x_i = (x_{i1}, x_{i2}, ..., x_{id}).
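As an illustration of these notations, the following sketch enumerates the sites of a small two-dimensional rectangular region (N = 2, with hypothetical sizes n_1 = 3 and n_2 = 4) and attaches a d-dimensional observation to each site; all sizes here are arbitrary examples.

```python
import numpy as np
from itertools import product

# Rectangular region I_n for N = 2, with n_1 = 3 and n_2 = 4 (illustrative).
n1, n2 = 3, 4
sites = [(i1, i2) for i1, i2 in product(range(1, n1 + 1), range(1, n2 + 1))]

# Sample size n_hat = n_1 * n_2 = Card(I_n).
n_hat = len(sites)

# Each site i carries a d-dimensional observation x_i (random here, d = 5).
d = 5
rng = np.random.default_rng(0)
observations = {i: rng.random(d) for i in sites}

print(n_hat)  # 12
```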
In this work, we are interested in supervised classification, which consists of building a classifier that, from a given training set containing input-output pairs, predicts the class Y_i ∈ {1, 2, ..., m} of a new observation x_i.
2.1 Bayes Classifier
The Bayes classifier is one of the most widely used classification algorithms due to its simplicity, efficiency, and efficacy. It is a probabilistic classifier that relies on Bayes' theorem (its naive variant further assumes independence between the features); in other words, the classification decision is made based on probabilities. It consists of assigning an instance x to the class with the highest a posteriori probability.
Suppose that we have m classes, each associated with a probability density function f_k, where f_k(x) = P(x | k), and an a priori probability π_k that an observation belongs to class k, k ∈ {1, 2, ..., m}. The Bayes discriminant rule is formulated as follows:

Assign x to class k_0, where

k_0 = argmax_{k ∈ {1, 2, ..., m}} π_k f_k(x).   (1)
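A minimal sketch of rule (1), assuming two one-dimensional Gaussian class densities with equal priors; the function names, means, and priors are illustrative choices, not part of the paper.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def bayes_rule(x, priors, densities):
    """Assign x to class k0 = argmax_k pi_k * f_k(x), as in Eq. (1)."""
    scores = [pi * f(x) for pi, f in zip(priors, densities)]
    return int(np.argmax(scores)) + 1  # classes are numbered 1..m

# Toy example with m = 2 Gaussian classes centered at 0 and 3.
priors = [0.5, 0.5]
densities = [lambda x: gaussian_pdf(x, 0.0, 1.0),
             lambda x: gaussian_pdf(x, 3.0, 1.0)]

print(bayes_rule(0.2, priors, densities))  # 1 (0.2 is closer to mean 0)
print(bayes_rule(2.5, priors, densities))  # 2 (2.5 is closer to mean 3)
```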
2.2 Kernel Density Estimation
As we may notice, Bayesian methods require prior knowledge of many probabilities, which constitutes a practical difficulty in applying them. To avoid this requirement, these probabilities can be estimated from the available labeled data or by making assumptions about the distributions (Gramacki, 2018). Two classes of density estimators are recognized in the literature. The first, known as parametric methods, are based on the assumption that the data are drawn from a well-known distribution (e.g., Gaussian, Gamma, Cauchy, etc.); estimation then amounts to finding the best parameters describing this distribution. Two commonly used techniques can be cited: Bayesian parameter estimation and maximum likelihood. However, this assumption about the form of the underlying density is not always tenable because of the complexity of the data. In such cases, nonparametric estimation techniques are required. These methods do not make any a priori assumptions about the distri-
Spatial Kernel Discriminant Analysis: Applied for Hyperspectral Image Classification