task consists of semantic segmentation models based
on deep learning, e.g., U-Net, DeepLab, and PSP-
Net (Ronneberger et al., 2015a; Chaurasia and Cu-
lurciello, 2017; Zhao et al., 2017a; Chen et al., 2017).
Thus, this work aims to evaluate different experimental
configurations, varying architectures and encoders,
for segmenting clouds in multispectral satellite
images. We considered three deep-learning-based
semantic segmentation architectures (U-Net, LinkNet, and
PSP-Net) combined with four different pre-trained
encoders (ResNet-50, VGG-16, MobileNet V2, and
EfficientNet B2). This work continues a previous
study, available at (Arakaki et al., 2023).
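The combinations described above can be enumerated as a simple grid. The following is a minimal sketch, not the authors' code: the function name `experiment_grid` and the string labels are assumptions made for illustration, and only the architecture and encoder names come from the text.

```python
from itertools import product

# Architecture and encoder names as listed in the text; the labels
# themselves are illustrative and not tied to any specific library.
ARCHITECTURES = ["U-Net", "LinkNet", "PSP-Net"]
ENCODERS = ["ResNet-50", "VGG-16", "MobileNet V2", "EfficientNet B2"]


def experiment_grid(architectures, encoders):
    """Return one (architecture, encoder) pair per experimental configuration."""
    return list(product(architectures, encoders))


grid = experiment_grid(ARCHITECTURES, ENCODERS)
for arch, enc in grid:
    print(f"{arch} + {enc}")
print(len(grid), "configurations")
```

Three architectures combined with four encoders give twelve configurations, each of which would be trained and evaluated separately.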
Our comprehensive and pragmatic experimental
setup provides a valuable comparative analysis of the
best deep-learning models and training strategies to
address the challenge of segmenting clouds in satellite
images. Our results provide useful information for
the development of satellite image analysis solutions
in the context of precision agriculture.
The remainder of this paper is organized as follows.
Section 2 summarizes the state of the art in semantic
segmentation methods for clouds in satellite
images. Section 3 describes our materials and
methods. Results are presented and discussed in
Section 4, and Section 5 presents our conclusions.
2 RELATED WORK
Mohajerani et al. (Mohajerani et al., 2018a) proposed
a cloud segmentation framework based on a fully
convolutional network (FCN) inspired by U-Net. The
convolutional encoder is connected to a convolutional
decoder through skip connections. The 38-Cloud
dataset was first introduced in this work. In
(Mohajerani and Saeedi, 2019), Mohajerani et al. proposed
Cloud-Net, a fully convolutional network intended for
cloud segmentation. Cloud-Net is composed of
convolutional blocks containing addition, concatenation,
and copy layers, followed by ReLU activation
functions. Considering Jaccard, Precision, Recall,
Specificity, and Accuracy, Cloud-Net improved on all metrics
when compared with (Mohajerani et al., 2018b).
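For reference, the five metrics named above can be computed from pixel-wise confusion counts. The sketch below uses the standard textbook definitions for binary masks. It is an assumption that the cited works use exactly these definitions (for example, they may average per image), and the function names are illustrative.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for two flat binary masks (1 = cloud, 0 = clear)."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred, truth):
    """Standard pixel-wise metrics for binary segmentation."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "jaccard": safe_div(tp, tp + fp + fn),      # intersection over union
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
    }


# Toy example: 8 pixels, flattened predicted and ground-truth masks.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(pred, truth)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy example there are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so the Jaccard index is 2/4 = 0.5 and the accuracy is 6/8 = 0.75.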
Gonzales and Sakla (Gonzales and Sakla, 2019)
trained and evaluated a model based on U-Net using
transfer learning to perform semantic segmentation of
clouds in satellite images. Evaluation of the proposed
approach used traditional segmentation metrics (e.g.,
Jaccard, Precision, and Specificity). Experiments
were conducted using the 38-Cloud dataset, which
contains multispectral images. The proposed approach
performed better when the three visible channels
(red, green, and blue) used an encoder pre-trained
on ImageNet, whereas the Near Infrared (NIR) channel
performed better with randomly initialized weights.
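One way to picture this finding is a first convolutional layer whose RGB filters are copied from pre-trained weights while the NIR filters start from random values. The sketch below is purely illustrative and is not Gonzales and Sakla's implementation: the function `build_four_channel_kernel`, the Gaussian initialization with `scale=0.01`, and the toy "pre-trained" values are all assumptions.

```python
import random


def build_four_channel_kernel(rgb_kernel, seed=0, scale=0.01):
    """Extend RGB filter weights with a randomly initialized NIR channel.

    rgb_kernel: list of filters; each filter is a list of 3 channels,
    and each channel is a k x k list of floats.
    Returns a new list of filters with 4 channels each (R, G, B, NIR).
    """
    rng = random.Random(seed)
    extended = []
    for filt in rgb_kernel:
        k = len(filt[0])
        # Assumed initialization: small zero-mean Gaussian values for NIR.
        nir = [[rng.gauss(0.0, scale) for _ in range(k)] for _ in range(k)]
        # Copy the RGB channels so the original weights are not aliased.
        rgb_copy = [[row[:] for row in channel] for channel in filt]
        extended.append(rgb_copy + [nir])
    return extended


# Hypothetical stand-in for pre-trained weights: 2 filters, 3 channels, 3x3.
pretrained = [
    [[[float(f + c)] * 3 for _ in range(3)] for c in range(3)]
    for f in range(2)
]
kernel = build_four_channel_kernel(pretrained)
print(len(kernel), "filters,", len(kernel[0]), "channels each")
```

In a real framework the same idea would be applied to the weight tensor of the encoder's first convolution, leaving all deeper layers unchanged.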
Meraner et al. (Meraner et al., 2020) proposed
an approach based on a Residual Convolutional Neural
Network (ResNet) to remove clouds in multispectral
images from the Sentinel-2 satellite. The model
consists of a fully convolutional architecture, which can
operate on input images with arbitrary spatial
dimensions during the training process. The authors
evaluated the approach on a dataset covering
the geographic region corresponding to the European
continent. The dataset's images are separated
geographically and by season, which provides
the gold standard for the subsequent reconstruction
of a region with clouds. In short, the approach
proposed by (Meraner et al., 2020) made it possible to
remove extremely thick clouds and reconstruct an
optical representation of the Earth’s surface obstructed
in the image by the cloud.
Buttar and Sachan (Buttar and Sachan, 2022) proposed
a deep learning-based approach called SEUNet++
to address the cloud segmentation problem
on the 95-Cloud dataset. In general, SEUNet++ is
based on U-Net++ with a lightweight channel atten-
tion mechanism. Furthermore, different backbones
were tried as encoders for the proposed approach
(e.g., ResNet-18, ResNet-34, ResNet-50, ResNet-
101, DenseNet-264, CSPNet, and EfficientNet-B8)
for performance comparison purposes. The experi-
ments showed that SEUNet++ obtained an Intersec-
tion over Union (IoU) value of 91.8%, improving the
state of the art by 0.23%. In addition to IoU,
SEUNet++ also performed better on metrics such as
accuracy, precision, and recall, producing well-defined
cloud boundaries and segmenting thinner cloud
layers. Finally, the authors demonstrated that using
transfer learning had a practical impact
on the task.
Arakaki et al. (Arakaki et al., 2023) also aimed to
evaluate methods based on deep learning (CNNs in particular) for
segmenting clouds in satellite images. For this, three
models based on the classic U-Net were compared,
each with adaptations in their encoders. The three
models were called Simple U-Net (with no changes
to the basic structure of the traditional network), U-
Net with VGG-16 backbone, and U-Net with ResNet-
18 backbone. The models were trained using the 38-
Cloud dataset and evaluated according to the Recall,
Jaccard, Accuracy, Precision, and Specificity metrics.
The results showed that Simple U-Net performed
better for the Recall, Jaccard, and Accuracy
indices. When considering Precision and Specificity,
VISAPP 2024 - 19th International Conference on Computer Vision Theory and Applications