Authors:
Mohamed Ibrahim 1,2; Robert Benavente 2; Daniel Ponsa 2 and Felipe Lumbreras 2
Affiliations:
1 Computer Engineering Department, Arab Academy for Science, Technology and Maritime Transport, Alexandria, Egypt
2 Computer Vision Center & Computer Science Department, Universitat Autònoma de Barcelona, Barcelona, Spain
Keyword(s):
Computer Vision, Super-Resolution, Remote Sensing, Deep Learning.
Abstract:
Remote sensing applications require high-resolution images, yet image quality is affected by acquisition season and sensor variety. Transformer-based models improve satellite image super-resolution but are less effective than convolutional neural networks (CNNs) at extracting the local details that are crucial for image clarity. This paper introduces SWViT-RRDB, a new deep learning model for satellite image super-resolution. By combining transformer, convolution, and attention blocks, SWViT-RRDB overcomes the limitations of existing models and better represents small objects in satellite images. In this model, a pipeline of residual fusion group (RFG) blocks combines multi-headed self-attention (MSA) with the residual-in-residual dense block (RRDB), fusing global and local image information for better super-resolution. Additionally, an overlapping cross-attention block (OCAB) enhances this fusion and allows interaction between neighboring pixels, maintaining long-range pixel dependencies across the image. The SWViT-RRDB model and its larger variants outperform state-of-the-art (SoTA) models on two different satellite datasets in terms of PSNR and SSIM.
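For readers who want a concrete picture of the fusion idea described above, the sketch below shows one way an RFG-style block could combine a multi-headed self-attention branch with an RRDB branch in PyTorch. All class names, layer sizes, and the 1x1-convolution fusion are illustrative assumptions, and plain full-image MSA is used in place of Swin-style windowed attention for brevity; this is a minimal sketch, not the authors' released implementation.

# Hypothetical RFG-style block: a multi-head self-attention (MSA) branch
# fused with a residual-in-residual dense block (RRDB) branch.
# All layer sizes and the fusion strategy are assumptions for illustration.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Five-layer dense convolutional block with residual scaling (ESRGAN-style)."""
    def __init__(self, channels=64, growth=32):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(channels + i * growth, growth if i < 4 else channels, 3, padding=1)
            for i in range(5)
        ])
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        feats = [x]
        for i, conv in enumerate(self.convs):
            out = conv(torch.cat(feats, dim=1))
            if i < 4:
                out = self.act(out)
                feats.append(out)
        return x + 0.2 * out  # residual scaling

class RRDB(nn.Module):
    """Residual-in-residual dense block: three dense blocks inside an outer residual."""
    def __init__(self, channels=64):
        super().__init__()
        self.blocks = nn.Sequential(*[DenseBlock(channels) for _ in range(3)])

    def forward(self, x):
        return x + 0.2 * self.blocks(x)

class RFGBlock(nn.Module):
    """Residual fusion group (sketch): a global MSA branch and a local RRDB branch,
    fused by a 1x1 convolution with an outer residual connection."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.msa = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.rrdb = RRDB(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):                      # x: (B, C, H, W) feature map
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        attn, _ = self.msa(*[self.norm(tokens)] * 3)
        global_feat = attn.transpose(1, 2).reshape(b, c, h, w)  # global context
        local_feat = self.rrdb(x)                               # local detail
        return x + self.fuse(torch.cat([global_feat, local_feat], dim=1))

if __name__ == "__main__":
    block = RFGBlock(channels=64)
    lr_feat = torch.randn(1, 64, 48, 48)
    print(block(lr_feat).shape)  # torch.Size([1, 64, 48, 48])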