Atmospheric Correction of Sentinel-2 Images Using Deep Learning
Stavan Shah 1 (https://orcid.org/0009-0000-1778-1745), Kahaan Patel 1 (https://orcid.org/0009-0005-9981-6257), Ankur Garg 2 (https://orcid.org/0000-0002-2872-5690) and Shivangi Surati 1 (https://orcid.org/0000-0003-4381-5130)
1 Computer Science and Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat, India
2 Indian Space Research Organization, Ahmedabad, Gujarat, India
{stavanshah20, kahaanpatel5}@gmail.com, agarg@sac.isro.gov.in, shivangi.surati@gmail.com
Keywords: Atmospheric Correction, Pix2Pix, Conditional Generative Adversarial Network, Remote Sensing, Deep Learning, Sentinel-2
Abstract: Remote sensing relies heavily on pre-processing steps, one of which is Atmospheric Correction (AC). AC corrects the effects of the atmosphere on satellite images, making it a vital step in ensuring accurate estimation of land Surface Reflectance (SR) for use in various downstream applications. However, conventional AC methods are computationally expensive because they use physics-based radiative transfer codes and require per-image metadata as well as many atmospheric parameters that are not always easy to estimate accurately. A novel Deep Learning (DL) model designed for AC without explicitly estimating atmospheric parameters is proposed in this research. The model was trained on a wide-ranging dataset collected through Google Earth Engine that included four bands of Sentinel-2 images covering all states in India. The proposed approach directly predicts SR values from Sentinel-2 satellite imagery using this data-driven method. It generated promising results, accurately estimating SR values, with experiments on ground measurements and Sentinel input data confirming this. This approach not only simplifies the AC process but also achieves comparable or even superior performance compared to traditional physics-based methods. The evaluation results show that the Pix2Pix model performs well, with average SSIM, PSNR, RMSE and MAE of 0.96, 42.14, 0.0097 and 0.0071, respectively. The experimental findings underscore the potential of deep learning as a robust and efficient alternative for atmospheric correction in remote sensing applications.
1 INTRODUCTION
Images taken from above the Earth's surface through remote sensing are essential for monitoring and understanding the environment. Unfortunately, these images are normally affected by atmospheric effects such as scattering and absorption that can distort the actual surface reflectance. Atmospheric correction is a critical pre-processing step in remote sensing that aims at removing these atmospheric influences to give precise values of surface reflectance (Zhu and Xia, 2023), (Zhang et al., 2022). The main objective of AC is to transform Top-Of-Atmosphere (TOA) reflectances into Bottom-Of-Atmosphere (BOA)/surface reflectances. This enables more accurate interpretation and investigation of remote sensing data.
Complex mathematical formulations involving aerosol content, Rayleigh scattering and water vapor, among other factors, are necessary to convert TOA reflectance to SR. An example is the 6S (Second Simulation of the Satellite Signal in the Solar Spectrum) model, which approximates atmospheric parameters based on radiative transfer principles (Ilori and Knudby, 2020). Particles such as atmospheric gases and aerosols scatter and absorb sunlight; hence, atmospheric correction is extremely important for any satellite imagery analysis.
The Sentinel-2 mission stands out among the numerous Earth observation satellites due to its extensive coverage and high resolution. It belongs to the Copernicus programme of the European Space Agency and offers multispectral imagery with 13 spectral bands across the visible, near-infrared and short-wave infrared parts of the electromagnetic spectrum. These are useful in land cover characterization, vegetation index retrieval and water quality monitoring, thus making
Sentinel-2 data paramount in various environmental and agricultural applications (Phiri et al., 2020).
Conventional AC methods are usually physics-based and require extensive computational resources and accurate knowledge of atmospheric variables (Liu et al., 2022). In recent years, the Pix2Pix model, a GAN variant, has achieved impressive results in image-to-image translation tasks. In the proposed work, Pix2Pix models trained on large paired datasets of TOA and SR images generate high-quality SR images directly from TOA inputs by learning the complex mapping between them. Other advantages of this model are its flexibility, direct optimization through an end-to-end learning framework, generalization ability, robustness to deviations in the input data and high-quality outputs. The generated images are compared with their ground truths using several evaluation parameters, such as the Structural Similarity Index Measure (SSIM), Peak Signal-to-Noise Ratio (PSNR), Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
The organization of the paper is as follows. The
detailed discussion of prior methods and techniques
of atmospheric correction is presented in Section 2.
The data, model architecture and evaluation tech-
niques incorporated in this work are presented in Sec-
tion 3. The evaluated results of the proposed methodology are presented and discussed in Section 4. Section 5 summarizes the main conclusions and outlines possible directions for future research.
2 RELATED WORK
Several algorithms for Aerosol Optical Depth (AOD)
retrieval from TOA data have been developed and
are widely used. Some of these algorithms are Dark
Target (Remer et al., 2020), Deep Blue (Hsu et al.,
2013) and Multi-Angle Implementation of Atmo-
spheric Correction (MAIAC). However, one of the
main issues in AOD retrieval is the difficulty in accu-
rately parameterizing the basic aerosol optical proper-
ties which leads to large uncertainties. Additionally,
there exist other methods used for Columnar Water
Vapor (CWV) estimation. Some common approaches
include Low-Rank Subspace Projection-Based Wa-
ter Estimator (LRP-WAVE) (Acito and Diani, 2018)
and Atmospheric Pre-corrected Differential Absorption (APDA) (Schläpfer et al., 1998). Nevertheless, these algorithms suffer from drawbacks such as relying on physical assumptions or requiring hard-to-obtain parameters. SR uncertainty stems from two factors, the errors in the estimated AOD and the estimated CWV during the derivation of SR from TOA data; neither should be prioritized over the other, because the accuracy of each affects the final retrieval, and hence both need accurate estimates.
On the flip side, image-based methods achieve AC
solely through the use of images taken from satellite
or aerial sensors; they don’t need any atmospheric pa-
rameters as input but instead utilize only the infor-
mation inherent in the image itself. Dark Object Subtraction is one of the most basic techniques, in which at least two targets with low and high reflectivity must be identified within the scene. Another image-based AC model, Quick Atmospheric Correction (QuAC) (Bernstein et al., 2012), works on a different assumption: the average spectrum of a group of materials remains the same across scenes. QuAC performs well when more than ten distinct materials are present in the scene. Although intuitive and computationally efficient, these methods only provide quick, first-order estimates of surface reflectance, and their accuracy is limited under conditions involving seasonal and spectral variation.
Turning to deep learning approaches, two atmospheric correction deep learning models were trained and evaluated using one hundred thousand batches of 40 reflectance spectra transformed to radiance by means of MODTRAN (Basener and Basener, 2023). This allowed the deep learning models to learn the physics of radiative transfer from MODTRAN. For this purpose, the authors compared the two methods for estimating corrections against the well-known QuAC model, which is based on a constant mean endmember reflectance assumption. Using the HY-1C CZI as a case study, a new deep-learning-based approach to atmospheric correction (SSACNet) is presented in (Zhao et al., 2023). Three-dimensional convolutions were applied to extract spatial and spectral features from the image, while two-dimensional convolutions were investigated for recovering lost spatial information. According to in-situ data, SSACNet shows decent performance, with an average correlation coefficient of 0.89 and an Absolute Percentage Deviation (APD) ranging from 21.53% to 35.41% across four bands.
Deep learning models have been observed to match the efficiency and accuracy of physics-based techniques for AC. Additionally, DL models do not need any climatic or geometric parameters as input. Compared to physics-based approaches, computational cost is reduced with DL models because they learn features automatically, making them easier to build and train. Stacked autoencoders, Convolutional Neural Networks (CNNs) (Wang et al., 2022) and vision transformers (Liu et al., 2024) are some promising methods used in remote sensing imaging applications, such as spectral, spatial and temporal feature extraction.
In the proposed work, a new method is described
that uses DL for atmospheric correction in remote
sensing data without the need for atmospheric param-
eters and geometric parameters; only input images are
considered. To the best of our knowledge, the pro-
posed methodology is the first approach that utilizes
Pix2Pix model for AC. The deep learning Pix2Pix
model was developed using a large dataset from dif-
ferent states of India in order to improve its perfor-
mance and generalization.
3 PROPOSED METHODOLOGY
The flowchart presented in Figure 1 outlines the steps of the proposed methodology for atmospheric correction of Sentinel-2 images.
3.1 Dataset Collection
The proposed research used data from Google Earth Engine, a cloud-based platform that provides access to a wide range of geographical datasets and satellite images. Both atmospherically corrected and incorrected satellite images were gathered for this study. Atmospherically incorrected images are TOA measurements taken by satellite sensors without compensating for scattering or absorption by the atmosphere. Atmospherically corrected images have been processed to eliminate atmospheric interference so that surface reflectance values accurately represent features on the Earth's surface.
In this study, Sentinel-2 satellite data from the
Google Earth Engine is used. Sentinel-2 is a Euro-
pean Space Agency (ESA) satellite mission designed
for monitoring Earth's land and coastal areas. Sentinel-2A/B MSI imagery usually contains 13 spectral bands. However, only the four bands with 10 m resolution were included in this study, as shown in Table 1.
Table 1: Sentinel-2 Band Information.
Band | Wavelength (nm) | Resolution (m)
Band 2 (Blue) (b) | 490 | 10
Band 3 (Green) (g) | 560 | 10
Band 4 (Red) (r) | 665 | 10
Band 8 (Near-Infrared) (vnir) | 842 | 10
Figure 1: Model flowchart.
A total of 1000 images were collected. Each atmospherically incorrected image was paired with its respective atmospherically corrected image in order to perform image-to-image translation. The images cover various regions across India, providing a diverse and representative sample for analysis. There are two ways to visualize the data:
1. True color composite: In true color composite im-
agery, the red, green, and blue bands (Bands 4,
3, and 2 respectively) are combined to replicate
human vision, producing an image akin to what
the human eye perceives. RGB images (true color
composite) are visualized in Figure 2a, where
(1), (3) are Atmospherically Incorrected Images
whereas (2), (4) are their Atmospherically Cor-
rected Images respectively.
2. False color composite: False color composite uti-
lizes non-visible bands, typically near-infrared
(Band 8), red (Band 4), and green (Band 3), to
highlight features not discernible to the human
eye. This composite reveals vegetation health,
land-water boundaries, and other environmental
characteristics, enhancing analysis in applications
such as agriculture, forestry, and environmental
monitoring. False color composites of two sample scenes are visualized in Figure 2b, where (1) and (3) are Atmospherically Incorrected Images whereas (2) and (4) are their Atmospherically Corrected Images, respectively.
3.2 Data Preprocessing
After gathering data from Google Earth Engine, the original dataset consisted of images of different sizes, such as 1000x1000 and 1200x1200 pixels. Patches of 256x256 pixels were extracted from both the input (atmospherically incorrected) and ground truth (atmospherically corrected) images to standardize the input data for further analysis and processing. Hence, each image, irrespective of its original dimensions, became (256, 256, 4), where '4' refers to the four bands denoting the red, green, blue and near-infrared channels. In addition, the top-of-atmosphere and surface reflectance measurements obtained through satellite imagery usually range between 0 and 30000.

Figure 2: Comparison of true and false color composites. (a) True color composites. (b) False color composites.
Each measurement was divided by 10000 as an important preprocessing step. This normalization has several benefits: it scales pixel values into a standard range so that they can be consistently compared across different images, and machine learning algorithms tend to train more stably and converge faster when pixel values are normalized over smaller ranges. Moreover, the collected dataset was split into a training set and a testing set: among the 1000 images, 800 were employed for model training while the remaining 200 served as test data.
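A minimal sketch of this preprocessing step (256x256 patch extraction, division by 10000 and the 800/200 split) is given below; the array loading and variable names are assumptions, since the paper does not show its preprocessing code.

```python
# Minimal preprocessing sketch under the paper's settings (256x256 patches,
# division by 10000, 800/200 train/test split). Loading the exported GeoTIFFs
# into (H, W, 4) NumPy arrays is assumed to happen elsewhere.
import numpy as np

PATCH = 256
SCALE = 10000.0

def to_patches(image):
    """Cut an (H, W, 4) reflectance array into non-overlapping, normalized 256x256x4 patches."""
    h, w, _ = image.shape
    patches = []
    for y in range(0, h - PATCH + 1, PATCH):
        for x in range(0, w - PATCH + 1, PATCH):
            patches.append(image[y:y + PATCH, x:x + PATCH, :] / SCALE)
    return np.stack(patches).astype('float32')

def make_split(toa_images, sr_images, n_train=800):
    """Pair TOA (input) and SR (target) patches and split them into train/test sets."""
    x = np.concatenate([to_patches(img) for img in toa_images])
    y = np.concatenate([to_patches(img) for img in sr_images])
    return (x[:n_train], y[:n_train]), (x[n_train:], y[n_train:])
```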
3.3 Model Architecture
The Pix2Pix model is an important and widely used tool for image-to-image translation. It provides a strong method for converting input images to desired outputs, which supports applications such as image denoising, style transfer and semantic segmentation. For atmospheric correction of satellite imagery, this model can be used to convert atmospherically incorrected images (representing top-of-atmosphere measurements) into atmospherically corrected images (representing surface reflectance measurements), which is vital for improving the quality and accuracy of satellite image analysis. The structure of the Pix2Pix model includes two major parts, as discussed below.
3.3.1 Generator
The generator takes as input the atmospherically incorrected images (TOA measurements), which are represented as tensors of shape (256, 256, 4), as shown in Figure 3. The role of the generator is to convert these input images into the desired outputs (atmospherically corrected images representing SR measurements). The architecture of the generator is designed to capture and elaborate on the spatial as well as spectral characteristics present in the input images while delivering accurate and visually pleasing results. The structure of the generator involves downsampling and upsampling parts for extracting abstract features from the input images and generating the corresponding output images (Figure 3). The generator of the Pix2Pix model uses an altered version of the U-Net architecture, which is efficient and widely adopted for image-to-image translation tasks. This alteration comprises an encoder-decoder framework with skip connections, enabling both down-sampling and up-sampling operations while preserving spatial information.
The generator has two main parts:
1. Encoder: The encoder of the Pix2Pix model is
composed of various blocks where each block
contains a convolutional layer, batch normaliza-
tion and Leaky ReLU activation function. This
arrangement enables the encoder to extract high-
level features from input images effectively as
it reduces their spatial dimensions progressively.
Figure 3: Pix2Pix architecture.
Each layer in this part consists of:
Convolutional Layer: Every block begins with
a convolutional layer that applies a set of learn-
able filters to the input feature. These fil-
ters are designed to capture spatial information
and local patterns present in input images, thus
promoting feature extraction and representation
learning.
Batch Normalization: The activations for each
layer across the mini-batch are stabilized by ap-
plying batch normalization after the convolu-
tional layer.
Leaky ReLU Activation: After batch normalization, a Leaky ReLU activation function introduces non-linearity into the model, which helps it capture complex features. It allows small gradients for negative input values, avoiding the vanishing gradient problem while promoting more stable and effective learning.
2. Decoder: The decoder of the Pix2Pix model is
made up of a series of blocks that mirror the en-
coder, which itself consists of transposed con-
volutional layers along with batch normalization,
dropout (used in the first 3 blocks), and ReLU ac-
tivation. This also includes skip connections be-
tween corresponding encoder and decoder blocks
to allow information to move through the system
more easily while retaining spatial details. Each
decoder layer has:
Transposed Convolutional Layer: Each block
in the decoder begins with a transposed con-
volutional layer, or deconvolution/upsampling
layer. It enlarges input feature maps so that
higher resolution output images can be recon-
structed by the decoder.
Batch Normalization: Activations are stabilized and training is sped up by applying batch normalization after the transposed convolutional layer.
Dropout (Applied to the First 3 Blocks): Overfitting is prevented and generalizability is enhanced by applying dropout to the first three blocks of the decoder.
ReLU Activation: Following batch normalization, a ReLU activation function introduces non-linearity and enriches the feature representation. ReLU passes only positive values to the next layer, promoting sparse activations.
Skip connections between the encoder and decoder blocks (as pictured in Figure 4) allow the model to retain and propagate important spatial information throughout the network, helping it generate output images with fine-grained details during reconstruction. This U-Net-like design ensures that spatial features are captured and preserved effectively at all resolution levels during the image-to-image translation performed by Pix2Pix.
The architecture of the generator model is shown in Figure 4, where the first half acts as an encoder that downsamples the input image while the second half acts as a decoder that upsamples it back again, with skip connections between them.
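The following sketch illustrates a U-Net generator of this kind, in the spirit of the standard TensorFlow Pix2Pix implementation. The per-block filter counts and the final output activation are assumptions; the paper specifies the (256, 256, 4) input and output shapes but not every layer width.

```python
# Sketch of the U-Net generator described above (assumed layer widths).
import tensorflow as tf

def downsample(filters, apply_batchnorm=True):
    # Conv2D + (BatchNorm) + LeakyReLU, halving the spatial resolution.
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2D(filters, 4, strides=2, padding='same', use_bias=False))
    if apply_batchnorm:
        block.add(tf.keras.layers.BatchNormalization())
    block.add(tf.keras.layers.LeakyReLU())
    return block

def upsample(filters, apply_dropout=False):
    # Conv2DTranspose + BatchNorm + (Dropout) + ReLU, doubling the spatial resolution.
    block = tf.keras.Sequential()
    block.add(tf.keras.layers.Conv2DTranspose(filters, 4, strides=2, padding='same', use_bias=False))
    block.add(tf.keras.layers.BatchNormalization())
    if apply_dropout:  # dropout only in the first three decoder blocks
        block.add(tf.keras.layers.Dropout(0.5))
    block.add(tf.keras.layers.ReLU())
    return block

def build_generator():
    inputs = tf.keras.layers.Input(shape=[256, 256, 4])  # TOA patch (4 bands)
    encoders = [downsample(64, apply_batchnorm=False), downsample(128), downsample(256),
                downsample(512), downsample(512), downsample(512), downsample(512), downsample(512)]
    decoders = [upsample(512, apply_dropout=True), upsample(512, apply_dropout=True),
                upsample(512, apply_dropout=True), upsample(512), upsample(256),
                upsample(128), upsample(64)]
    x, skips = inputs, []
    for enc in encoders:                                   # 256 -> 1 spatial resolution
        x = enc(x)
        skips.append(x)
    for dec, skip in zip(decoders, reversed(skips[:-1])):  # mirror the encoder with skip connections
        x = dec(x)
        x = tf.keras.layers.Concatenate()([x, skip])
    outputs = tf.keras.layers.Conv2DTranspose(
        4, 4, strides=2, padding='same',
        activation=None)(x)  # SR patch (4 bands); output activation is not specified in the paper
    return tf.keras.Model(inputs=inputs, outputs=outputs)
```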
3.3.2 Discriminator
In the Pix2Pix conditional generative adversarial net-
work, a discriminator acts as a convolutional Patch-
GAN classifier, which means that it is designed to tell
apart real image patches from fake ones. During ad-
versarial training, it provides the generator with feed-
back about how well it creates natural-looking im-
ages. Every block in the discriminator performs a se-
quence of operations (Figure 5): Convolution, Batch
Normalization, Zero Padding (in some layers), Leaky
ReLU activation. This design allows the discrimina-
tor to extract features from image patches and learn
discriminative representations for classification. Af-
ter the last layer of the discriminator, the output has shape (batch_size, 30, 30, 1).

Figure 4: Generator architecture.

Figure 5: Discriminator architecture.

Each element of this 30 x 30 output classifies a 70 x 70 patch of the input image. Such a patch-wise classification strategy lets the discriminator concentrate on local image details instead of global ones, making it more effective for image-to-image translation
tasks. The discriminator receives two inputs as shown
in Figure 3:
1. Real Pair: The input image (atmospherically incorrected image) and the target image (atmospherically corrected image), which should be classified as real by the discriminator.
2. Fake Pair: The input image (atmospherically incorrected image) and the generated image (produced by the generator), which should be classified as fake by the discriminator.
In order to process these inputs, they are concatenated along the channel axis using tf.concat([inp, tar], axis=-1), creating a composite input that contains the input image together with the corresponding ground truth or generated image.
This joint input allows the discriminator to distinguish between real and fake image pairs, providing adversarial feedback to the generator, which in turn helps it produce more believable outputs. Through this interplay between the generator and the discriminator, the Pix2Pix model learns effective image-to-image translation while ensuring that atmospherically incorrected inputs are transformed into atmospherically corrected ones. The discriminator architecture is presented in Figure 5, where the first two layers represent the input images being concatenated in the second layer (pink layer); the operations performed on the concatenated pair are also depicted in Figure 5.
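A sketch of such a PatchGAN discriminator is shown below, including the channel-wise concatenation of the input and target images and the (batch_size, 30, 30, 1) output described above; the layer widths are assumptions, as they are not reported in the paper.

```python
# Sketch of the PatchGAN discriminator described above. The channel-wise
# concatenation mirrors tf.concat([inp, tar], axis=-1) from the text; the
# filter counts are assumed.
import tensorflow as tf

def build_discriminator():
    inp = tf.keras.layers.Input(shape=[256, 256, 4], name='input_image')   # TOA patch
    tar = tf.keras.layers.Input(shape=[256, 256, 4], name='target_image')  # real or generated SR patch
    x = tf.keras.layers.Concatenate(axis=-1)([inp, tar])                   # (256, 256, 8)

    for filters, use_bn in [(64, False), (128, True), (256, True)]:
        x = tf.keras.layers.Conv2D(filters, 4, strides=2, padding='same', use_bias=False)(x)
        if use_bn:
            x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.LeakyReLU()(x)                                 # 256 -> 128 -> 64 -> 32

    x = tf.keras.layers.ZeroPadding2D()(x)                                 # 34 x 34
    x = tf.keras.layers.Conv2D(512, 4, strides=1, use_bias=False)(x)       # 31 x 31
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU()(x)
    x = tf.keras.layers.ZeroPadding2D()(x)                                 # 33 x 33
    patch_logits = tf.keras.layers.Conv2D(1, 4, strides=1)(x)              # (batch, 30, 30, 1)
    return tf.keras.Model(inputs=[inp, tar], outputs=patch_logits)
```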
3.4 Model Training
The training steps can be divided into several impor-
tant parts which serve to improve the robustness as
well as accuracy of deep learning models.
1. Loss Functions: In Pix2Pix, the generator loss involves multiple components that guide the training process (a short sketch of these losses and a single optimization step is given after this list). These components include:
L1 Loss: This measures the pixel-wise difference between the generated images and the original images and is also called the Mean Absolute Error (MAE) loss:
$\text{L1 Loss} = \frac{1}{N}\sum_{i=1}^{N}\lvert G(x_i) - y_i \rvert$   (1)
Generator GAN Loss: It is derived through the adversarial training process from the discriminator output, which measures how good or bad the generated samples are when compared against real ones:
$L_G = -\mathbb{E}_z\big[\log\big(D(G(z))\big)\big]$   (2)
Generator Total Loss: This is the aggregate of the L1 loss and a weighted generator GAN loss; the goal is to balance image fidelity and image realism during training:
$\text{Gen Total Loss} = \text{L1 Loss} + \lambda \cdot \text{Gen GAN Loss}$   (3)
Discriminator Loss: The discriminator loss function measures the discrepancy between the discriminator's predictions (probability scores) and the ground truth labels (real or fake):
$\text{Discriminator Loss} = -\frac{1}{N}\sum_{i=1}^{N}\big[y_i \log(D(x_i)) + (1 - y_i)\log(1 - D(x_i))\big]$   (4)
2. Optimization Algorithm: The Adam optimizer
was used to train our deep learning model with
a learning rate of 0.0002. It ensures fast con-
vergence and optimization by gradient descent.
Adam’s adaptive learning rate mechanism that ad-
justs based on the first and second moments of
the gradients, provides a balance between con-
vergence speed and stability, crucial for the ad-
versarial nature of GANs. This adaptability helps
in achieving faster convergence and handling the
complex, non-linear transformations required for
atmospheric correction, where pixel-wise accu-
racy is critical. Momentum in Adam further
smooths the optimization path, leading to more
stable and reliable training outcomes. A learn-
ing rate of 0.0002 has been empirically validated
across various implementations of GANs, includ-
ing Pix2Pix, ensuring high quality image gener-
ation. This effectively balances the need for quick convergence with stability, avoiding the pitfalls of divergence seen with higher learning rates and the slow training of lower ones. The
robustness of Adam to noisy gradients, a common
challenge in GAN training, enhances its suitabil-
ity for atmospheric correction, ensuring consis-
tent and accurate correction across diverse atmo-
spheric conditions and geographical regions.
3. Batch Processing and Summary: Mini-batch
stochastic gradient descent was applied through-
out training, using a batch size of 2. This ap-
proach helps save memory space while speeding
up the training process. The model parameters
are updated based on small subsets of the train-
ing data. Furthermore, a summary is created after every 10 epochs that includes model predictions on three randomly selected images from the dataset, using the checkpointed model at that epoch. At these
checkpoints, both the Generator and Discrimina-
tor models were saved to preserve their progress
in training.
4. Epochs and Early Stopping: Training lasted for
100 epochs (equivalent to 80,000 steps) but had an
early stopping criterion based on validation loss
to avoid overfitting network weights. With early
stopping, one can monitor how well the model
performs during training and stop when there is
no improvement in validation performance.
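As referenced in the loss functions above, the sketch below shows how Eqs. (1)-(4) and a single Adam update step (learning rate 0.0002, batch size 2) might be implemented in TensorFlow. The logits-based binary cross-entropy, the optimizer's beta_1 = 0.5 and the weighting LAMBDA = 100 on the L1 term follow common Pix2Pix practice and are assumptions; the paper writes the weight on the GAN term in Eq. (3) and does not report its value.

```python
# Sketch of the Pix2Pix losses and one Adam training step (lr = 0.0002).
# BinaryCrossentropy(from_logits=True), beta_1 = 0.5 and LAMBDA = 100 are
# assumptions taken from common Pix2Pix implementations.
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 100  # assumed relative weight balancing fidelity (L1) and realism (GAN)

def generator_loss(disc_fake_logits, generated, target):
    gan_loss = bce(tf.ones_like(disc_fake_logits), disc_fake_logits)  # cf. Eq. (2)
    l1_loss = tf.reduce_mean(tf.abs(target - generated))              # Eq. (1)
    return gan_loss + LAMBDA * l1_loss                                # cf. Eq. (3), weight placement assumed

def discriminator_loss(disc_real_logits, disc_fake_logits):
    real_loss = bce(tf.ones_like(disc_real_logits), disc_real_logits)   # real pair -> label 1
    fake_loss = bce(tf.zeros_like(disc_fake_logits), disc_fake_logits)  # fake pair -> label 0
    return real_loss + fake_loss                                        # cf. Eq. (4)

gen_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
disc_opt = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

@tf.function
def train_step(generator, discriminator, toa_batch, sr_batch):
    """One adversarial update for a (TOA, SR) mini-batch."""
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_sr = generator(toa_batch, training=True)
        real_logits = discriminator([toa_batch, sr_batch], training=True)
        fake_logits = discriminator([toa_batch, fake_sr], training=True)
        g_loss = generator_loss(fake_logits, fake_sr, sr_batch)
        d_loss = discriminator_loss(real_logits, fake_logits)
    gen_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                                generator.trainable_variables))
    disc_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                                 discriminator.trainable_variables))
    return g_loss, d_loss
```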
3.5 Evaluation Parameters
Evaluating model performance is crucial to assess the
fidelity and accuracy of the generated images com-
pared to ground truth images (atmospherically cor-
rected). Ground truth images provide a reference for
the desired output, allowing us to quantitatively mea-
sure the similarity between the generated/predicted
images and the true images. The evaluation param-
eters used for assessing the performance of Pix2Pix
model on both the training and testing datasets in-
clude:
1. Structural Similarity Index (SSIM): It measures the similarity between two images based on their luminance, contrast and structure, capturing the perceptual difference between the generated and ground truth images.
2. Peak Signal-to-Noise Ratio (PSNR): PSNR mea-
sures the ratio between the maximum possible
power of a signal and the power of corrupting
noise that affects the fidelity of its representation.
It quantifies the quality of the generated images
compared to the ground truth images.
3. Root Mean Squared Error (RMSE): RMSE mea-
sures the average difference between values pre-
dicted by the model and the observed values. It
represents the square root of the average of the squared differences between the predicted and observed values.
4. Mean Absolute Error (MAE): MAE measures the
average absolute difference between the predicted
and observed values. It provides a more intuitive
understanding of the average error magnitude.
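A minimal sketch of how these four metrics can be computed for a generated/ground-truth pair using TensorFlow's built-in SSIM and PSNR utilities is given below; it mirrors the metrics listed above rather than the authors' exact evaluation script.

```python
# Metric sketch for one (256, 256, 4) image pair, assuming reflectance values
# normalized to the [0, 1] range (hence max_val=1.0).
import tensorflow as tf

def evaluate_pair(generated, ground_truth, max_val=1.0):
    """Return SSIM, PSNR (dB), RMSE and MAE for one generated/ground-truth pair."""
    generated = tf.convert_to_tensor(generated, dtype=tf.float32)
    ground_truth = tf.convert_to_tensor(ground_truth, dtype=tf.float32)
    ssim = tf.image.ssim(generated, ground_truth, max_val=max_val)
    psnr = tf.image.psnr(generated, ground_truth, max_val=max_val)
    rmse = tf.sqrt(tf.reduce_mean(tf.square(generated - ground_truth)))
    mae = tf.reduce_mean(tf.abs(generated - ground_truth))
    return float(ssim), float(psnr), float(rmse), float(mae)
```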
4 RESULTS
4.1 Predicting Images on Test Set
The model performed well on the test dataset, producing atmospherically corrected images that were very similar to the corresponding ground truth images. The synthesized images shared features and visual appearance with the ground truth, correctly reproducing texture, color tones and structural components, which shows that the model captures detailed patterns properly. This similarity is shown in Figure 6, where the predicted (generated) images are compared against their respective input images and ground truths.
4.2 Evaluation Metrics
SSIM, PSNR, RMSE and MAE are used for evaluat-
ing quality of generated images compared to ground
truth images in image processing tasks. Range of
SSIM values is from -1 to 1, where 1 indicates per-
fect similarity between the images. Good SSIM val-
ues typically range between 0.9 and 1, indicating high
similarity and good quality. SSIM values below 0.5
are generally considered poor and may indicate sig-
nificant differences between the images. PSNR val-
ues are expressed in decibels (dB). Higher PSNR val-
ues indicate better quality, with values closer to infin-
ity representing perfect reconstruction. PSNR values
above 30 dB are typically considered good for most
applications. PSNR values below 20 dB may indicate
significant distortion and poor quality. If the values
of both RMSE and MAE are decreasing then it can
be considered that difference in pixels of ground truth
and generated images is reduced and both the images
are getting more similar to each other.
Tables 2 and 3 report the minimum, maximum, mean and standard deviation of all evaluation parameters. The values in Table 2 were recorded before the training process, with images generated only by the untrained Pix2Pix generator model (similar to a U-Net), whereas the values in Table 3 were recorded after 100 epochs of training the Pix2Pix model. There is a significant change
in values between these tables. Starting with SSIM,
which assesses the similarity between two images, a
mean SSIM value of -0.0025 with a standard devia-
tion of 0.0605 is shown in Table 2. This indicates poor
matching between the generated and ground truth im-
ages before training. In contrast, a significant im-
provement in SSIM metrics is revealed in Table 3,
with a mean value of 0.961 and a standard deviation of 0.068.

Figure 6: Generated images.

This substantial enhancement demonstrates
the model’s ability to produce images that closely
resemble the ground truth after training. 95% of the SSIM values are greater than 0.85, which indicates that the model efficiently generates images similar to the ground truth. The SSIM values recorded for the test dataset are shown in Figure 7a.
Moving on to PSNR, Table 2 demonstrates a mean
PSNR value of 11.0188 dB with a standard deviation
of 0.6115 dB. These values indicate low image qual-
ity before training, as higher PSNR values correspond
to better image fidelity. However, Table 3 shows a remarkable improvement in PSNR metrics, with a mean value of 42.14 dB and a standard deviation of 5.51 dB. The drastic increase in PSNR reflects a significant enhancement in image quality after training, suggesting that the model produces images with reduced noise and improved fidelity. 97.5% of the PSNR values are greater than 30 dB, which indicates that the model efficiently generates images similar to the ground truth. The PSNR values recorded for the test dataset are shown in Figure 7b.
Next, Table 2 exhibits mean RMSE and MAE
values of 0.281 and 0.219, respectively, with stan-
dard deviations of 0.021 and 0.017. These values
indicate relatively high errors between generated and ground truth images before training.

Table 2: Evaluation metrics before training.
Evaluation Parameters | Maximum value | Minimum value | Mean value | Standard Deviation
SSIM | 0.00202 | -0.0336 | -0.0025 | 0.0605
PSNR | 12.663 | 7.0231 | 11.0188 | 0.6115
RMSE | 0.445 | 0.232 | 0.281 | 0.021
MAE | 0.345 | 0.161 | 0.219 | 0.017

Table 3: Evaluation metrics after training for 100 epochs.
Evaluation Parameters | Maximum value | Minimum value | Mean value | Standard Deviation
SSIM | 0.997 | 0.563 | 0.961 | 0.068
PSNR | 52.93 | 25.11 | 42.14 | 5.51
RMSE | 0.0555 | 0.0022 | 0.0097 | 0.0075
MAE | 0.0394 | 0.0017 | 0.0071 | 0.0051

However, Table
3 demonstrates a noticeable reduction in both metrics,
with mean values of 0.0097 and 0.0071, and stan-
dard deviations of 0.0075 and 0.0051, respectively.
This reduction signifies a significant improvement in the model's ability to minimize errors between the generated and ground truth images after training. Thus, the comparison between Tables 2 and 3 highlights the substantial improvement in the model's performance after 100 epochs of training.
4.3 Interpreting Losses
It takes more finesse to interpret the logs when train-
ing a GAN or a conditional GAN like Pix2Pix, com-
pared to simpler models such as classification or re-
gression models. In our training process, several loss
functions are employed, as discussed in Section 3.4. Throughout the training epochs, it is crucial to monitor these losses to ensure balanced training dynamics between the generator and discriminator networks. One key aspect to consider is the behaviour of the Generator L1 Loss, which consistently decreased from epoch 1 to epoch 100 (as shown in Figure 8c). This indicates that the generator network progressively improved its ability to generate images that closely match the ground truth, reflecting the effectiveness of the training process. However, interpreting GAN losses is more complex. If either the Generator GAN Loss or the Discriminator Loss becomes excessively low, it suggests that one model is overpowering the other. Moreover, the nature of GAN training involves a competitive process between the generator and the discriminator: an improvement in one network's loss often corresponds to an increase in the other network's loss, creating a cycle of adversarial learning. This dynamic equilibrium results in fluctuating loss values for both networks until they converge to stable points. It is observed that both the discriminator and generator losses converge to stable values over time (Figures 8a and 8b). This convergence indicates that the training process has reached a stable equilibrium, where neither network dominates the other. Achieving this balance is crucial for producing high-quality images that faithfully represent the ground truth.
5 CONCLUSIONS
The proposed methodology successfully demonstrated the feasibility of extracting surface reflectance values from top-of-atmosphere reflectance values using a deep learning technique. Through the application
of the Pix2Pix model, accurate atmospheric correc-
tion is achieved, transforming TOA reflectance im-
ages into atmospherically corrected images with re-
markable fidelity. The evaluation metrics (Table 3)
consistently indicate high-quality results, affirming
the effectiveness of the proposed approach. Further-
more, the analysis of loss functions revealed optimal
training dynamics, with the model converging to sta-
ble values. This suggests robust learning and effec-
tive adaptation to the training data. The evaluation re-
sults show that Pix2Pix model has good performance,
with average SSIM, PSNR, RMSE and MAE of 0.96,
42.14, 0.0097 and 0.0071 respectively. Importantly,
the methodology eliminates the need for complex
metadata or parameter calibration typically associated
with traditional atmospheric correction techniques.
By leveraging deep learning, a streamlined and effi-
cient solution for atmospheric correction is provided,
offering potential applications in remote sensing and
environmental monitoring without the burden of ex-
tensive preprocessing.
Figure 7: SSIM and PSNR values between ground truth and predicted images from the test dataset. (a) SSIM values. (b) PSNR values.
Figure 8: Comparison of losses. (a) Generator GAN Loss during training. (b) Discriminator Loss during training. (c) Generator L1 Loss during training.
REFERENCES
Acito, N. and Diani, M. (2018). Atmospheric column water
vapor retrieval from hyperspectral vnir data based on
low-rank subspace projection. IEEE Transactions on
Geoscience and Remote Sensing, 56(7):3924–3940.
Basener, B. and Basener, A. (2023). Gaussian process and
deep learning atmospheric correction. Remote Sens-
ing, 15(3):649.
Bernstein, L. S., Jin, X., Gregor, B., and Adler-Golden,
S. M. (2012). Quick atmospheric correction code: al-
gorithm description and recent upgrades. Optical en-
gineering, 51(11):111719–111719.
Hsu, N., Jeong, M.-J., Bettenhausen, C., Sayer, A., Hansell,
R., Seftor, C., Huang, J., and Tsay, S.-C. (2013). En-
hanced deep blue aerosol retrieval algorithm: The sec-
ond generation. Journal of Geophysical Research: At-
mospheres, 118(16):9296–9315.
Ilori, C. O. and Knudby, A. (2020). An approach
to minimize atmospheric correction error and im-
prove physics-based satellite-derived bathymetry in a
coastal environment. Remote Sensing, 12(17):2752.
Liu, S., Li, H., Jiang, C., and Feng, J. (2024). Spectral–
spatial graph convolutional network with dynamic-
synchronized multiscale features for few-shot hy-
perspectral image classification. Remote Sensing,
16(5):895.
Liu, S., Zhang, Y., Zhao, L., Chen, X., Zhou, R., Zheng, F.,
Li, Z., Li, J., Yang, H., Li, H., et al. (2022). Quan-
titative and automatic atmospheric correction (quaac):
Application and validation. Sensors, 22(9):3280.
Phiri, D., Simwanda, M., Salekin, S., Nyirenda, V. R., Mu-
rayama, Y., and Ranagalage, M. (2020). Sentinel-2
data for land cover/use mapping: A review. Remote
Sensing, 12(14):2291.
Remer, L. A., Levy, R. C., Mattoo, S., Tanré, D., Gupta, P.,
Shi, Y., Sawyer, V., Munchak, L. A., Zhou, Y., Kim,
M., et al. (2020). The dark target algorithm for ob-
serving the global aerosol system: Past, present, and
future. Remote sensing, 12(18):2900.
Schläpfer, D., Borel, C. C., Keller, J., and Itten, K. I.
(1998). Atmospheric precorrected differential absorp-
tion technique to retrieve columnar water vapor. Re-
mote Sensing of Environment, 65(3):353–366.
Wang, Y., Hong, D., Sha, J., Gao, L., Liu, L., Zhang,
Y., and Rong, X. (2022). Spectral–spatial–temporal
transformers for hyperspectral image change detec-
tion. IEEE Transactions on Geoscience and Remote
Sensing, 60:1–14.
Zhang, H., Ma, Y., Zhang, J., Zhao, X., Zhang, X., and
Leng, Z. (2022). Atmospheric correction model for
water–land boundary adjacency effects in landsat-8
multispectral images and its impact on bathymetric re-
mote sensing. Remote Sensing, 14(19):4769.
Zhao, X., Ma, Y., Xiao, Y., Liu, J., Ding, J., Ye, X., and Liu,
R. (2023). Atmospheric correction algorithm based on
deep learning with spatial-spectral feature constraints
for broadband optical satellites: Examples from the
hy-1c coastal zone imager. ISPRS Journal of Pho-
togrammetry and Remote Sensing, 205:147–162.
Zhu, W. and Xia, W. (2023). Effects of atmospheric cor-
rection on remote sensing statistical inference in an
aquatic environment. Remote Sensing, 15(7):1907.