are considered as two separate challenging issues in the literature. Our research aims to propose an adaptive architecture that addresses the issue of scaling up and processing large volumes of remote sensing data using the Hadoop framework and DL. In this work, we propose an approach that uses a DL architecture for data processing and Hadoop HDFS for data storage.
Section 2 offers a short survey of related works. Section 3 describes the proposed approach for remote sensing data processing. Section 4 describes the data source, the software and hardware configuration, and the results obtained. Section 5 concludes the paper.
2 RELATED WORKS
The amount and quality of satellite images have improved greatly with the growth of satellite technology. These data cannot be processed using standard techniques. Although parallel computing and cloud infrastructures (Hadoop, Spark, Hive, HBase, etc.) make it possible to process such massive data, such systems are not sufficient for spatial and temporal data.
Some works in the literature have applied the MapReduce programming paradigm to raster images. (Cary et al., 2009) presented a MapReduce model to solve two major spatial problems over vector and raster data: bulk construction of R-Trees and aerial image quality computation. The imagery is stored in the compressed DOQQ (Digital Orthophoto Quarter Quadrangle) file format, and the Mapper and Reducer process those files. (Golpayegani and Halem, 2009) implemented several image processing algorithms with the MapReduce model; the first step converts the images to a text format and then to a binary format before using them as raw images.
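A minimal Hadoop Streaming mapper illustrates this text-based workaround: each image is serialized to one base64 text line so the framework can split the input, and the mapper decodes it and emits a per-image result. The field layout and the toy statistic below are illustrative assumptions, not the cited implementation.

#!/usr/bin/env python3
import base64
import io
import sys

from PIL import Image

# Each input line holds one image: "<image_id>\t<base64-encoded bytes>".
for line in sys.stdin:
    image_id, b64 = line.rstrip("\n").split("\t", 1)
    img = Image.open(io.BytesIO(base64.b64decode(b64))).convert("L")
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)        # toy per-image statistic
    print(f"{image_id}\t{mean:.2f}")        # a reducer could aggregate these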
(Almeer, 2012) reported a six-fold speedup for auto-contrast and an eight-fold speedup for a sharpening algorithm. (Kocakulak and Temizel, 2011) used Hadoop and MapReduce to run a ballistic image analysis in which a voluminous image database must be matched against an unknown image. Processing time dropped dramatically once 14 computational nodes were added to the cluster setup, although the method had high computational requirements.
(Li et al., 2010) aimed to reduce the time required to process huge volumes of satellite images by using Hadoop and MapReduce to run parallel clustering algorithms. The method first clusters each pixel and then recomputes the current cluster centers from the pixels assigned to each cluster.
(Lv et al., 2010) suggested a different clustering algorithm that applies a K-means strategy to remote sensing image processing. Objects with matching spectral values are grouped together without any prior knowledge. The parallel K-means strategy was implemented on Hadoop MapReduce because the algorithm is intensive in both time and memory.
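To make the parallel decomposition concrete, the sketch below expresses one K-means iteration in MapReduce style: a map step assigns each pixel to its nearest center, and a reduce step recomputes the centers. The band count, number of clusters, and random initialization are illustrative assumptions, not the cited configuration.

import numpy as np

def map_assign(pixels, centers):
    # Map step: emit (nearest-center index, pixel) pairs.
    for p in pixels:
        yield int(np.argmin(np.linalg.norm(centers - p, axis=1))), p

def reduce_centers(pairs, k, dim):
    # Reduce step: average the pixels assigned to each center.
    sums, counts = np.zeros((k, dim)), np.zeros(k)
    for idx, p in pairs:
        sums[idx] += p
        counts[idx] += 1
    return sums / np.maximum(counts, 1)[:, None]

pixels = np.random.rand(10_000, 4)    # stand-in 4-band spectral vectors
centers = pixels[np.random.choice(len(pixels), 5, replace=False)]
for _ in range(10):                   # a real job iterates until convergence
    centers = reduce_centers(map_assign(pixels, centers), 5, 4)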
All these works focus essentially on parallel processing of image data with the Hadoop MapReduce framework.
For remote sensing data processing with DL, Convolutional Neural Networks (CNNs), the most recent effective branch of deep learning, have shown important improvements in image similarity tasks. The observation that deep convolutional networks retrieve high-level features in their deeper layers has led researchers to investigate how this technique can reduce the semantic gap. Features extracted from both the fully connected layers and the convolution layers can be used as image representations in search algorithms.
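As a minimal sketch of this idea (the ResNet-50 backbone, its pretrained weights, and the file paths are assumptions for illustration; each cited work uses its own network), a pretrained CNN with its classifier head removed yields L2-normalized descriptors that can be compared with a dot product:

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Identity()    # drop the classifier head, keep the features
model.eval()

prep = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                  T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

def describe(path):
    # One L2-normalized descriptor per image.
    with torch.no_grad():
        feat = model(prep(Image.open(path).convert("RGB")).unsqueeze(0))
    return torch.nn.functional.normalize(feat, dim=1)

# Cosine similarity is then a dot product:
# score = describe("a.png") @ describe("b.png").T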
(Sun et al., 2016) proposed a CNN-based method that extracts features from local regions in addition to features from whole images.
(Gordo et al., 2017) combined R-MAC (Regional Maximum Activation of Convolutions) with triplet networks, and also proposed a region proposal network (RPN) strategy to identify regions of interest (RoIs) and extract local R-MAC descriptors.
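The core of an R-MAC descriptor can be sketched in a few lines: max-pool a conv feature map over several regions, L2-normalize each regional vector, and sum them into one global descriptor. The fixed 2x2 grid below stands in for the multi-scale region sampling (and the learned RPN regions) of the cited work.

import torch
import torch.nn.functional as F

def rmac(fmap):
    # fmap: (C, H, W) conv activations -> one (C,) global descriptor.
    C, H, W = fmap.shape
    regions = []
    for i in (0, H // 2):                 # fixed 2x2 grid of square regions
        for j in (0, W // 2):
            r = fmap[:, i:i + H // 2, j:j + W // 2].amax(dim=(1, 2))
            regions.append(F.normalize(r, dim=0))
    return F.normalize(torch.stack(regions).sum(dim=0), dim=0)

desc = rmac(torch.randn(512, 14, 14))     # e.g. a final conv feature map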
(Zhang et al., 2015) proposed a gradient boosting random convolutional network (GBRCN) to classify very high resolution (VHR) satellite imagery. GBRCN optimizes a sum of functions (called boosts) with a modified multi-class softmax loss that makes the optimization task simpler; stochastic gradient descent (SGD) is used as the optimizer.
(Zhong et al., 2017) used small, reliable CNN kernels and a deep architecture to learn hierarchical spatial relationships in satellite data; a softmax classifier then produces the output class label from the CNN features. The CPU handles preprocessing (data splitting and normalization) as well as dropout and softmax classification, while the GPU runs the convolution, ReLU, and pooling operations. Networks with one to three convolution layers and different receptive fields are evaluated.
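The following toy model mirrors that layer recipe, i.e. stacked conv/ReLU/pool blocks followed by dropout and a softmax classifier; the channel widths, band count, and class count are illustrative assumptions rather than the cited configuration.

import torch
import torch.nn as nn

def make_net(n_conv=2, bands=4, n_classes=6):
    layers, ch = [], bands
    for _ in range(n_conv):               # stacked conv -> ReLU -> pool blocks
        layers += [nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)]
        ch = 32
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),
               nn.Dropout(0.5), nn.Linear(ch, n_classes)]
    return nn.Sequential(*layers)

net = make_net()
probs = torch.softmax(net(torch.randn(1, 4, 64, 64)), dim=1)  # class probabilities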
In order to estimate region boundary confidence maps, which are then fused into an aggregate confidence map, (Basaeed et al., 2016) used a committee of CNNs that performs a multi-scale analysis of each band.
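A minimal sketch of the committee idea follows: several member networks each produce a confidence map, and the maps are averaged into one aggregate. The untrained stand-in networks, band count, and mean fusion are illustrative assumptions.

import torch
import torch.nn as nn

committee = [nn.Sequential(nn.Conv2d(4, 8, 3, padding=1), nn.ReLU(),
                           nn.Conv2d(8, 1, 1), nn.Sigmoid())
             for _ in range(3)]              # three stand-in member networks

tile = torch.randn(1, 4, 128, 128)           # stand-in 4-band tile
maps = [m(tile) for m in committee]          # per-member confidence maps
aggregate = torch.stack(maps).mean(dim=0)    # fused boundary-confidence map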
(Längkvist et al., 2016) applied CNNs to multispectral images (MSI), together with a digital surface model of a small city, for complete, fast, and accurate pixel classification. The predicted low-level pixel classes are then used to improve the high-level segmentation. The CNN architecture is evaluated and analyzed.
(Marmanis et al., 2016) tackled the prevalent RS issue of limited training data by using pre-trained networks.