Single Hyperspectral Image Super-Resolution Utilizing Implicit Neural Representations

Bohdan Perederei¹ (https://orcid.org/0009-0006-9639-683X) and Faisal Z. Qureshi² (https://orcid.org/0000-0002-8992-3607)

¹Department of Applied Mathematics, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Prospect Beresteiskyi 37, Kyiv, Ukraine
²Faculty of Science, Ontario Tech University, 2000 Simcoe St North, Oshawa, Canada
Keywords:
Hyperspectral Imagery, Super-Resolution, Deep Learning, Implicit Neural Representations, Convolutional
Autoencoder, Super-Resolution Loss Functions.
Abstract:
Hyperspectral image super-resolution is a crucial task in computer vision, aiming to enhance the spatial resolution of hyperspectral data while maintaining spectral fidelity. In this paper, we present the highlights and outcomes of our research, in which we developed, explored, and evaluated techniques based on Implicit Neural Representations (INRs) for single hyperspectral image super-resolution. Despite the potential of INRs, their application to hyperspectral image super-resolution remains underexplored, leaving significant room for further investigation. Our primary goal was to adapt strategies and techniques from models originally developed for multispectral image super-resolution, especially SIREN-based INRs and the Dual Interactive Implicit Neural Network architecture. We also explored feature extraction from hyperspectral images using a convolutional neural network autoencoder, which allowed us to capture spatial-spectral patterns for further enhancement. Furthermore, as part of the research, we validated and compared different functions, such as MSE, RMSE, MAE, PSNR, SAD, SAM, and SSIM, to evaluate their effectiveness as loss functions for training INRs.
1 INTRODUCTION
Single image super-resolution (SISR), a challenging and ill-posed problem, is one of the fundamental computer vision tasks, aimed at generating a high-resolution image from a low-resolution input. The main reason for conducting SISR is to improve image representation for better human and machine interpretation. For multispectral imagery (e.g., RGB), various deep learning techniques can provide high-quality or state-of-the-art results for super-resolution tasks (Wang et al., 2019).
Super-resolution of hyperspectral imagery (HSI) is an especially significant area of focus for researchers: hyperspectral cameras capture a scene in many spectral bands, but owing to limited hardware capabilities and financial resources, they usually have reduced spatial resolution compared to multispectral cameras. As a result, low spatial resolution and high camera prices lead to a scarcity of HSI data, which can significantly
limit some deep learning approaches due to insuffi-
cient training data. Another reason why this topic
is valuable for exploration is that HSI data presents
unique challenges compared to multispectral imagery.
HSI can have from several dozen to several hundred
spectral bands, leading to very high-dimensional data
that results in issues such as high computational load,
greater sensitivity to noise, and the so-called curse of dimensionality, which renders a considerable number of deep learning techniques (Chen et al., 2021a; Nguyen and Beksi, 2023) unusable or in need of significant optimization.
Furthermore, in addition to SISR, there is another widely used approach for enhancing the spatial resolution of HSI: fusion-based hyperspectral
image super-resolution. The core concept of this
method is to improve the quality of upscaled hyper-
spectral imagery by combining it with additional data,
such as RGB or multispectral images. In experi-
mental settings, such methods often deliver better re-
sults than techniques relying solely on single image
super-resolution. However, a common assumption
among these methods is that the low-resolution hyper-
spectral and high-resolution auxiliary images are pre-
cisely aligned. In practical scenarios, capturing a low-
resolution hyperspectral image and a high-resolution
multispectral image often involves different cameras.
This results in minor variations in the imaging conditions, which complicates accurate image registration. Thus, our research primarily concentrated on the single image super-resolution task.
Our primary contribution is the analysis, adapta-
tion, and evaluation of various INR architectures for
hyperspectral SISR, including SIREN, SIREN with
bicubic interpolation, and the Implicit Decoder from
the Dual-Interactive Implicit Neural Network paired
with a custom CNN autoencoder. Additionally, we in-
vestigated loss functions to determine their effective-
ness in training INRs, identifying PSNR as the most
effective for optimal convergence and metrics in our
experiments.
The paper is structured as follows: The Related
Work section reviews hyperspectral SISR methods,
including traditional, deep learning, and INR-based
approaches for multispectral and hyperspectral im-
agery. The Methodology section outlines the ratio-
nale for using INRs and details the metrics, loss func-
tions, and model architectures. The Experiments, Re-
sults, and Discussion section describes the experi-
mental setup, presents the results, and analyzes the
findings.
2 RELATED WORK
This section provides a concise overview of the methods described in the literature. For the reasons mentioned in the Introduction, fusion-based hyperspectral image super-resolution is outside the scope of this work. In addition, it is essential to note that this work prioritizes learning deterministic mappings over stochastic approaches, such as those employed in generative models.
2.1 Conventional Approaches
The first methods that come to mind regarding im-
age upscaling are classic and well-known image in-
terpolation techniques, such as nearest-neighbor, bi-
linear, and bicubic interpolation. They are straight-
forward to implement, provide efficient image up-
scaling, and demonstrate stability in quality across
data dimensionalities. However, machine learning-based super-resolution methods, such as deep learning models based on convolutional neural networks, usually score higher on evaluation metrics, as they can generate much higher-quality results that preserve the details and textures of low-resolution images. Hence, researchers use these interpolation methods as baseline approaches.
As for other conventional SISR techniques, (Wang
et al., 2017) proposed a method for HSI super-
resolution based on nonlocal low-rank tensor ap-
proximation and total variation regularization. This
approach effectively preserves structural details and
reduces noise by leveraging low-rank priors, but
it requires solving computationally expensive opti-
mization problems, making it less efficient for large
datasets. Similarly, (Li et al., 2016) introduced a tech-
nique that combines spectral mixture analysis with
spatial-spectral group sparsity to capture the under-
lying spatial and spectral correlations. While this
method improves accuracy by promoting sparsity,
it also demands significant computational resources and relies on carefully designed, hand-crafted priors, which may not generalize well in real-world scenarios.
2.2 Deep Learning Approaches
In contrast to conventional techniques, deep learning-
based methods, despite being data-demanding, of-
fer a more flexible and scalable approach by auto-
matically learning features, often outperforming tra-
ditional methods in speed and accuracy. The most
widely used model architectures for hyperspectral
SISR and multispectral SISR are mainly based on
CNN layers, and all the papers on this topic demon-
strated that network architecture design is a crucial
factor in image reconstruction quality (Mei et al.,
2017; Arun et al., 2020; Jiang et al., 2020). How-
ever, because hyperspectral data contains hundreds of channels along the spectral dimension, its complex 3D nature makes SISR techniques designed for natural images unsuitable for direct application, requiring researchers to adapt them or create novel methods.
At this point, researchers have introduced a vast
range of model architectures. Three main categories
of models can be generally classified based on their
upsampling techniques and the placement of upsam-
pling layers within the model architecture: back-
end upsampling (after CNN layers), front-end up-
sampling (before CNN layers), and progressive up-
sampling (between CNN layers). Typically, the most
challenging upsampling step is performed utilizing
traditional techniques like bicubic interpolation, with
deep neural networks responsible only for refining
these interpolated images to restore fine details and
achieve higher quality. Furthermore, because conven-
tional interpolation-based upsampling methods can-
not incorporate external prior information and are
unsuitable for use as an upsampling layer in the
back-end upsampling structure, researchers have in-
troduced learning-based upsampling methods, such as
transposed convolution (Dong et al., 2016) and pixel
shuffle (Shi et al., 2016) as alternative techniques.
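As an illustration of learned back-end upsampling, the following PyTorch sketch shows a sub-pixel convolution block in the spirit of (Shi et al., 2016); the channel count and kernel size are illustrative assumptions, not a configuration taken from any cited paper.

```python
import torch
import torch.nn as nn

# Sub-pixel (pixel shuffle) upsampler: a convolution expands the channel
# count by scale**2, and nn.PixelShuffle rearranges those channels into a
# spatially upscaled feature map (Shi et al., 2016).
def pixel_shuffle_upsampler(channels: int, scale: int = 2) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1),
        nn.PixelShuffle(scale),
    )

# Example: upscale a 64-channel feature map from 32x32 to 64x64.
features = torch.randn(1, 64, 32, 32)
upscaled = pixel_shuffle_upsampler(64)(features)  # shape: (1, 64, 64, 64)
```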
As for the structures of networks, the most com-
mon designs usually include recursive learning, resid-
ual learning, multi-path learning, dense connections,
and attention mechanisms. As research into neural networks advances, more and more network architectures are being designed and utilized for the SISR task.
2.3 Implicit Neural Representations for
Multispectral SISR
Implicit neural representation commonly encodes an
object using a multi-layer perceptron (MLP) that as-
sociates spatial coordinates with a signal. INR was
found to be especially useful in 3D object represen-
tations. As a result, through extensive research, tra-
ditional discrete models of 3D object shapes, sur-
faces, and scene structures have been replaced by
continuous functions defined by MLPs. One such
model we heavily researched for the SISR problem
is SIREN (dubbed sinusoidal representation network)
(Sitzmann et al., 2020), which usually showed bet-
ter 2D image and 3D object reconstruction results
than other INR models with different architectures
and activation functions. However, researchers focused on INR exploration mostly applied them to 3D computer vision and object representation (Sitzmann et al., 2020; Mildenhall et al., 2021; Wang et al., 2021), often ignoring 2D imaging, which left this domain underexplored.
Nonetheless, research into INR techniques for 2D
imagery has progressed, leading to investigations on
how INR can be effectively applied to various com-
puter vision tasks, including SISR. Implicit Neural
Representations of 2D images can be directly applied
to SISR because they enable the sampling of pixel val-
ues at any spatial location. For instance, (Chen et al.,
2021b) in their paper proposed the method called Lo-
cal Implicit Image Function (LIIF), which determines
a pixel’s value by referencing the closest latent code,
consisting of a localized collection of adjacent fea-
ture vectors. This pixel-based approach facilitates a
seamless transition across different areas in the recon-
structed image. Drawing inspiration from LIIF, (Tang
et al., 2021) introduced a new INR-driven represen-
tation called the Joint Implicit Image Function (JIIF)
for guided depth super-resolution, which aims to learn
the interpolation weights and their corresponding val-
ues simultaneously.
The LIIF and JIIF papers further suggest that a simple concatenation of spatial encodings and coordinates cannot fully improve the quality of the output images. In a related direction, the Meta-SR paper (Hu et al., 2019) introduced a magnification-arbitrary network that leverages INR techniques to perform super-resolution across a range of scaling factors. Also, unlike the approach in LIIF, (Nguyen and Beksi, 2023) proposed a novel Dual Interactive Implicit Neural Network (DIINN). DIINN consists of a well-known Residual Dense Network encoder (Zhang et al., 2018) and a unique Implicit Decoder that itself includes Modulation and Synthesis networks to enhance the implicit decoding function by separating the content and positional features at the pixel level, as suggested by (Mehta et al., 2021).
2.4 Implicit Neural Representations for
Hyperspectral SISR
The INR techniques and architectures surveyed in Subsection 2.3 are frequently and effectively applied to SISR problems in the multispectral imagery domain. While research on
INR for multispectral SISR is advancing and yielding
promising outcomes, the application of INRs for hy-
perspectral SISR remains underexplored. The reasons for this are the general trends in HSI super-resolution research, which primarily relies on the methods and techniques mentioned in Subsection 2.2, and the additional challenges presented by HSI data, such as the curse of dimensionality and high computational load.
Nonetheless, some progress has already been
made in the research of INR for HSI super-resolution,
and optimistic results have been demonstrated. For
instance, (Zhang et al., 2022) addressed the chal-
lenges of high-dimensional spectral patterns in HSI
super-resolution without relying on auxiliary images,
introducing a novel model that utilizes INR to map
spatial coordinates to their corresponding spectral
values through continuous functions, enhanced by a
hypernetwork for INR parameter prediction. Eval-
uations on multiple datasets demonstrated that this
approach yields competitive reconstruction perfor-
mance, highlighting the model’s capability to recover
high-frequency details effectively.
Furthermore, (Chen et al., 2023) introduced a
novel approach called Spectral-wise Implicit Neu-
ral Representation (SINR) that addressed the limi-
tations of traditional methods in HSI reconstruction,
which often represent the continuity of spectral information poorly. SINR employs a continuous spectral amplification process and incorporates a spectral-wise attention mechanism, treating individual channels as distinct tokens to capture global
spectral dependencies effectively. Even though SINR
primarily targets HSI reconstruction, it can be eas-
ily adapted for SISR tasks. Extensive experiments
demonstrated that this framework outperforms base-
line methods, significantly enhancing flexibility and
performance by accommodating unlimited spectral
bands in the output.
3 METHODOLOGY
This research aims to propose and evaluate novel ap-
proaches based on INRs for HSI super-resolution.
The rationale for employing INR-based methods
aligns closely with their established benefits in 3D ob-
ject reconstruction and 2D multispectral image super-
resolution.
Hyperspectral imagery, like 3D discrete models,
is inherently data-intensive, which highlights the po-
tential of INRs for HSI reconstruction and super-
resolution, as they can effectively manage the high
dimensionality and complexity associated with such
data forms. In addition to this capability, INRs offer
significant flexibility in resolution due to their contin-
uous nature. For instance, when utilizing a SIREN
architecture for image fitting, it becomes straightfor-
ward to upscale an image to any desired resolution. A
high-resolution image can be obtained without requir-
ing extensive re-training simply by creating a larger
pixel grid and inputting it into the trained SIREN
model. Furthermore, when applied to hyperspectral
data, INRs can enhance image representation effi-
ciency, as the trained weights of an INR model are often significantly smaller than the original HSI cube, enabling efficient storage and transmission.
Despite these advantages, INR-based methods remain
relatively underexplored in HSI super-resolution, pre-
senting a valuable opportunity for further research
to fully leverage their capabilities in HSI processing
tasks.
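To make the resolution-flexibility argument concrete, the sketch below shows how any trained coordinate-to-spectrum network can be queried on a denser pixel grid; `model` here stands for such a fitted network and is assumed from context.

```python
import torch

# Build a dense pixel grid in [-1, 1]^2 at the target resolution and query a
# trained coordinate-to-spectrum model on it; no re-training is required.
def upscale_with_inr(model, height, width, bands):
    ys, xs = torch.meshgrid(torch.linspace(-1.0, 1.0, height),
                            torch.linspace(-1.0, 1.0, width),
                            indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    with torch.no_grad():
        spectra = model(coords)                   # (height * width, bands)
    return spectra.reshape(height, width, bands)  # HSI cube (H, W, C)
```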
During our research, we systematically explored
various increasingly complex approaches to achieve
super-resolution metrics that surpass those of inter-
polation methods, which served as our baseline. The
details and outcomes of these experiments will be pre-
sented in Section 4. To prevent redundant efforts,
we concentrated on leveraging techniques previously
demonstrated to be effective for multispectral image
reconstruction and SISR that utilize INRs. Notably,
we employed ideas and techniques from such net-
works as SIREN (Sitzmann et al., 2020), LIIF (Chen et al., 2021b), and DIINN (Nguyen and Beksi, 2023),
all of which have shown promising results with mul-
tispectral data.
3.1 Evaluation Metrics and Loss
Functions
To assess the effectiveness of HSI super-resolution
techniques, we rely on commonly used evaluation metrics such as Mean Square Error (MSE), Root Mean
Square Error (RMSE), Peak Signal-to-Noise Ratio
(PSNR), Sum Of Absolute Differences (SAD), Spec-
tral Angle Mapper (SAM), and Structural Similarity
(SSIM). MSE quantifies the average squared differ-
ences between estimated and actual pixel values:
\[
\mathrm{MSE}(X,\hat{X}) \;=\; \frac{1}{N}\sum_{i=1}^{N}\bigl(X_i - \hat{X}_i\bigr)^2. \tag{1}
\]
As for PSNR, it is a widely used metric for evaluating
image quality. It is determined by the maximum pixel
value (L, equal to 1 in our case) in the image and the
MSE calculated between the original high-resolution
HSI and its reconstructed counterpart:
\[
\mathrm{PSNR}(X,\hat{X}) \;=\; 10\log_{10}\frac{L^2}{\mathrm{MSE}} \tag{2}
\]
\[
\;=\; 10\log_{10}\!\left(\frac{L^2}{\frac{1}{W\times H}\sum_{i=1}^{W\times H}\bigl(X_i-\hat{X}_i\bigr)^2}\right). \tag{3}
\]
The SAM function, introduced by (Kruse et al.,
1993), assesses the spectral similarity between pix-
els in hyperspectral images by calculating the angle
between their spectral vectors, where a smaller angle
indicates a higher likelihood of the pixels belonging
to the same class:
\[
\mathrm{SAM}(X,\hat{X}) \;=\; \arccos\frac{X^{T}\hat{X}}{\lVert X\rVert_{2}\,\lVert\hat{X}\rVert_{2}}. \tag{4}
\]
The SSIM index (Wang et al., 2004) mea-
sures the structural similarity between an original im-
age and a reconstructed image, considering image
degradation as a perceived change in structural infor-
mation:
\[
\mathrm{SSIM}(X,\hat{X}) \;=\; \frac{\bigl(2\mu_{X}\mu_{\hat{X}} + c_{1}\bigr)\bigl(2\sigma_{X\hat{X}} + c_{2}\bigr)}{\bigl(\mu_{X}^{2} + \mu_{\hat{X}}^{2} + c_{1}\bigr)\bigl(\sigma_{X}^{2} + \sigma_{\hat{X}}^{2} + c_{2}\bigr)}. \tag{5}
\]
Here, $\mu$ denotes the mean value, while $\sigma$ denotes the variance or covariance. The constants $c_1$ and $c_2$ are introduced to stabilize the division when the denominator is small.
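For reference, a minimal NumPy sketch of these metrics follows, assuming HSI cubes stored as (H, W, C) arrays normalized to [0, 1]; it mirrors Equations (1)-(4) plus the SAD definition, SSIM is omitted for brevity, and reporting SAM as a per-pixel mean in degrees is our assumption.

```python
import numpy as np

def mse(x, x_hat):
    # Eq. (1): mean squared difference over all pixels and bands.
    return np.mean((x - x_hat) ** 2)

def psnr(x, x_hat, peak=1.0):
    # Eqs. (2)-(3): peak signal-to-noise ratio; peak L = 1 for normalized data.
    return 10.0 * np.log10(peak ** 2 / mse(x, x_hat))

def sad(x, x_hat):
    # Sum of absolute differences over the whole cube.
    return np.sum(np.abs(x - x_hat))

def sam(x, x_hat, eps=1e-8):
    # Eq. (4): spectral angle between per-pixel spectra, averaged over pixels.
    dot = np.sum(x * x_hat, axis=-1)
    norms = np.linalg.norm(x, axis=-1) * np.linalg.norm(x_hat, axis=-1)
    angles = np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0))
    return np.degrees(np.mean(angles))
```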
Moreover, the presented evaluation metrics and various weighted combinations of them were
assessed for their suitability as loss functions in this
study. Mean absolute error (MAE) was also tested as
a potential loss function. The results of these evalua-
tions can be found in Section 4.2. PSNR achieved the
best convergence time and metric values in our tests,
surpassing other functions and becoming the primary
loss function for model training.
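A PSNR-based training objective is simply the negative of Equation (2), which is differentiable wherever the MSE is nonzero; a minimal PyTorch sketch is shown below.

```python
import torch

def psnr_loss(pred, target, peak=1.0, eps=1e-12):
    # Negative PSNR: minimizing this loss maximizes PSNR. The eps term
    # guards against division by zero when the reconstruction is exact.
    mse = torch.mean((pred - target) ** 2)
    return -10.0 * torch.log10(peak ** 2 / (mse + eps))
```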
3.2 Explored Methods and
Architectures
The SIREN architecture (three hidden layers with 256
neurons each) was selected as an initial approach, us-
ing pixel coordinates as inputs and spectral channels
as outputs. The primary objective was to train the net-
work to reconstruct HSI and perform super-resolution
by feeding the model an enlarged pixel grid. How-
ever, the baseline SIREN architecture did not achieve
performance metrics superior to bicubic interpolation.
To address this, we experimented with an alterna-
tive method where the image was first upscaled us-
ing bicubic interpolation, followed by enhancement
through the trained SIREN model. Specifically, we
downscaled the original image, upscaled it back using
bicubic interpolation, and trained the SIREN to min-
imize the loss between the original and the syntheti-
cally downscaled and upscaled image. Once trained,
the model was applied to the bicubically upscaled
original image to enhance its quality further. Despite
these efforts, this technique failed to produce results
that outperformed the standard bicubic interpolation.
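For concreteness, a minimal PyTorch sketch of the baseline coordinate-to-spectrum SIREN is given below; the frequency factor omega_0 = 30 and the initialization bounds follow the original SIREN paper (Sitzmann et al., 2020) and are assumptions about our exact configuration, and the band count defaults to the 188 channels of Cuprite.

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    # Linear layer with sine activation and the SIREN initialization scheme.
    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            bound = (1.0 / in_features if is_first
                     else (6.0 / in_features) ** 0.5 / omega_0)
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

class SIREN(nn.Module):
    # Maps 2D pixel coordinates to a full spectrum (one value per band).
    def __init__(self, in_features=2, hidden=256, hidden_layers=3,
                 out_features=188):
        super().__init__()
        layers = [SineLayer(in_features, hidden, is_first=True)]
        for _ in range(hidden_layers - 1):
            layers.append(SineLayer(hidden, hidden))
        layers.append(nn.Linear(hidden, out_features))
        self.net = nn.Sequential(*layers)

    def forward(self, coords):   # coords: (N, 2) in [-1, 1]
        return self.net(coords)  # spectra: (N, out_features)
```

The bicubic-SIREN variants reuse the same network with in_features = 2 + C, where C is the number of spectral bands appended to each coordinate (see Section 4.3).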
Following this, we shifted our focus to more com-
plex architectures, considering the use of autoencoders to represent HSI in a latent space, which
could then be fed into the network instead of the raw
image. The starting point for exploring more ad-
vanced architectures was DIINN (Nguyen and Beksi,
2023), which consists of a Residual Dense Network
(RDN) encoder (Zhang et al., 2018) and a novel Im-
plicit Decoder composed of modulation and synthesis
networks. Unsurprisingly, DIINN, initially designed
for upscaling RGB images, failed to work without
modifications, primarily due to the limitations of the
RDN encoder. The latent space produced by the RDN
from the DIINN network is smaller than the number
of spectral bands in HSI, and adjusting the RDN to
handle the increased spectral channels proved chal-
lenging due to the curse of dimensionality. This re-
sulted in excessive memory usage and computational
demands, making it challenging to employ the RDN
as an encoder in any network architecture without a
significant redesign.
Due to the necessity of utilizing latent space over
the entire image and the limitations of employing
RDN, we opted for a custom autoencoder that encodes the image into a latent space with an expanded spectral dimension. We chose two CNN layers with a kernel size of 3×3 for the autoencoder architecture. These layers process 4×4 image patches with a three-pixel overlap, expanding their spectral representation to 256 and then to 512 channels. Consequently, each 4×4 image patch is transformed into a 1×1×512 vector
in the latent space. This latent representation is then
fed into the Implicit Decoder, replacing the RDN-
based approach. Unfortunately, this approach also
failed to achieve metrics surpassing those of interpo-
lation methods.
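A sketch of one plausible implementation is given below; we treat the three-pixel overlap of 4×4 patches as a stride-1 sliding window, which is equivalent to applying the convolutions over the full image, and the padding, activation functions, and mirrored decoder are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class PatchAutoencoder(nn.Module):
    # Two 3x3 convolutions expand the spectral dimension to 256 and then 512
    # channels; a mirrored pair maps the latent code back to the original band
    # count so the autoencoder can be trained unsupervised with an MSE loss.
    def __init__(self, bands=188):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(bands, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, bands, kernel_size=3, padding=1),
        )

    def forward(self, x):            # x: (B, bands, H, W)
        z = self.encoder(x)          # latent code: (B, 512, H, W)
        return self.decoder(z), z    # reconstruction and latent code
```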
4 EXPERIMENTS, RESULTS AND
DISCUSSION
4.1 Datasets and Experimental Settings
For training and initial evaluations, we utilized the
Cuprite dataset, consisting of a hyperspectral image
with dimensions of 512×614 pixels and 188 channels
(NASA Jet Propulsion Laboratory, 1997). To further
enhance our evaluations and facilitate comparisons
with other studies, we also incorporated the Chiku-
sei dataset, captured using the Headwall Hyperspec-
VNIR-C imaging sensor over agricultural and ur-
ban regions in Chikusei, Ibaraki, Japan (Yokoya and
Iwasaki, 2016). The HSI in the Chikusei dataset con-
sists of 2517×2335 pixels with 128 bands. For the
Chikusei dataset, we adopted the approach used in
(Jiang et al., 2020) and (Zhang et al., 2023) by extract-
ing non-overlapping patches of 512×512 pixels for
evaluation and benchmarking purposes. It is essen-
tial to highlight that all spectral values in the Chikusei
dataset were normalized to a range between 0 and 1.
For model training and evaluation, the hyperspectral
images serve as ground truth, while the input data is
generated by downscaling these images using bicubic
interpolation at the desired scaling factor.
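This degradation step can be expressed in a few lines of PyTorch; the antialiasing and corner-alignment flags below are assumptions, as the exact settings are not specified here.

```python
import torch.nn.functional as F

def degrade(hr_cube, scale=2):
    # hr_cube: (1, C, H, W) tensor in [0, 1]. Bicubic downscaling produces the
    # low-resolution input; the original cube serves as the ground truth.
    return F.interpolate(hr_cube, scale_factor=1.0 / scale,
                         mode="bicubic", align_corners=False)
```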
As for experimental settings, we designed our
INR-based models by incorporating and adapting ele-
ments from the SIREN and DIINN architectures. The
models were trained from the ground up in PyTorch
on Nvidia V100-SXM2 GPUs. Adam (Kingma, 2014) with a learning rate of 1e-4 was used as the optimizer, PSNR was used as the loss function, and models were trained
for 1000 epochs. The training configurations were
consistent across all datasets and models, with min-
imal specific adjustments made.
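A condensed view of this training configuration, assuming the hypothetical `SIREN` and `psnr_loss` sketches given earlier and a coordinate/target pair prepared from a hyperspectral cube:

```python
import torch

model = SIREN(out_features=188)                  # sketch from Section 3.2
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(1000):                        # 1000 epochs, PSNR loss
    optimizer.zero_grad()
    prediction = model(coords)                   # coords: (N, 2) pixel grid
    loss = psnr_loss(prediction, target)         # target: (N, 188) spectra
    loss.backward()
    optimizer.step()
```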
4.2 Loss Functions Evaluation
Our research analyzed several loss functions (MSE, RMSE, MAE, PSNR, SAD, SAM, and SSIM) to iden-
tify which makes the training process converge more
rapidly and produce superior metrics overall. For
this experiment, we trained the SIREN model for the
HSI reconstruction task using the Cuprite dataset and
four 512×512 patches from the Chikusei dataset. The
results shown in Tables 1 and 2 indicate that using
PSNR as a loss function yields better metric values af-
ter 1000 epochs of training compared to others. Addi-
tionally, we experimented with various weighted sum
combinations of loss functions (as shown in Table
3) to determine whether this approach could improve
convergence and metric values. However, none of the
tested combinations surpassed the performance of us-
ing the PSNR as a loss function alone. Consequently,
we selected PSNR as the loss function to train all sub-
sequent models in this study.
4.3 Models Evaluation
To enable a more comprehensive comparison, we in-
cluded results for nearest-neighbor and bilinear inter-
polation, alongside bicubic interpolation, in each ta-
ble. The corresponding metric values are presented in
Tables 4 and 5.
We initially tested two SIREN architectures: one
with three hidden layers of 256 neurons each and an-
other with three layers of 512 neurons each. While in-
creasing the number of neurons led to some improve-
ment in the metric values, both models performed
worse than bilinear and bicubic interpolation. How-
ever, they did surpass the nearest-neighbor method on
the Cuprite dataset. This gave us the idea of utilizing a latent-space representation of the image, or other additional data, as input to the network.
Thus, we experimented with a bicubic SIREN,
where the input was not just the coordinate grid, but
for each coordinate, we passed channel values ob-
tained from bicubic interpolation. In the first exper-
iment, we downscaled the image we aimed to up-
scale by a factor of 2 and used this downscaled ver-
sion as input data with the original image serving as
the ground truth for training the model. After that,
we fed to the trained model the original image we
wanted to upscale as input. This approach yielded
worse results than the simple SIREN on the Cuprite
dataset but better results on the Chikusei dataset. In
the second experiment, we trained the bicubic SIREN by feeding it the original image without downscaling during training. For super-resolution, we passed the
image that had been upscaled using bicubic interpola-
tion. Overall, the results across datasets are inconsistent, with the bicubic SIREN producing outcomes similar to the standard SIREN up to minor fluctuations in metric values, which can be attributed to differences in the structure of the HSI test data.
Additionally, it is essential to note that this method
can be viewed as bicubic interpolation degraded by
SIREN reconstruction, with SIREN learning primar-
ily to add as little noise as possible rather than en-
hancing the bicubic-interpolated image.
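The bicubic-SIREN input can be assembled as follows, a sketch assuming the coordinate-grid convention used above, with in_features = 2 + C for the network:

```python
import torch
import torch.nn.functional as F

def bicubic_siren_input(lr_cube, out_h, out_w):
    # lr_cube: (1, C, h, w). Returns (out_h * out_w, 2 + C): each row holds a
    # pixel coordinate followed by the bicubically interpolated spectrum there.
    up = F.interpolate(lr_cube, size=(out_h, out_w), mode="bicubic",
                       align_corners=False)
    ys, xs = torch.meshgrid(torch.linspace(-1.0, 1.0, out_h),
                            torch.linspace(-1.0, 1.0, out_w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    spectra = up.squeeze(0).permute(1, 2, 0).reshape(-1, up.shape[1])
    return torch.cat([coords, spectra], dim=-1)
```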
For the model adapted from DIINN, we used a
CNN autoencoder trained for ten epochs with MSE
loss to extract features from the images in an unsu-
pervised manner, which were then passed to the Im-
plicit Decoder as input. Each image was trained with
a separate CNN autoencoder by feeding it 4×4 over-
lapping patches with a stride equal to 3. Despite our
expectations that this architecture would yield strong
results, it failed to outperform interpolation methods
discussed in this study (Tables 4 and 5).
Even though none of the proposed methods man-
aged to outperform interpolation techniques, there is
still significant room for further research. For in-
stance, evaluating a CNN autoencoder with SIREN
or adapting the LIIF model as a decoder could be
promising directions for future studies.
5 CONCLUSIONS
Our research provides a broad overview of the cur-
rent landscape in single hyperspectral image super-
resolution, highlighting advancements in deep learn-
ing and implicit neural representations. We pro-
posed and evaluated several INR-based architectures
adapted from those created for the problem of mul-
tispectral image super-resolution and also evaluated
various loss functions that can be used for INR-based
model training. Although our approaches did not outperform the interpolation methods chosen as a baseline in terms of metric values, the area of uti-
lizing INR for hyperspectral image super-resolution
remains underexplored, requiring further investiga-
tion. Among the loss functions evaluated, PSNR
demonstrated the best results.
We hope that the findings from our research will
provide valuable insights to other researchers, en-
abling them to develop more effective methods. By
sharing the results of our study, we aim to help oth-
ers avoid spending time and resources on approaches
we have already tested and found ineffective, allow-
ing them to focus on developing more effective solu-
tions.
ACKNOWLEDGMENTS
This study was conducted as part of the MITACS
Globalink Research Internship. We gratefully ac-
knowledge Ontario Tech University for providing the
software, computational power, and storage resources
that made this research possible.
Table 1: Loss functions comparison for SIREN fitting (3 hidden layers with 256 neurons each) on the Cuprite dataset (1000
epochs). Rows represent loss functions.
MSE RMSE PSNR SAD SAM SSIM
MSE 0.000673 0.025933 31.722861 1141381.5 3.212296 0.780906
RMSE 0.000306 0.017482 35.148163 774198.38 2.144571 0.905351
MAE 0.000400 0.019991 33.983388 840628.69 2.478677 0.872909
PSNR 0.000201 0.014171 36.971917 630810.25 1.728390 0.942259
SAD 0.000411 0.020261 33.866750 854810.25 2.513883 0.869774
SAM 0.034780 0.186494 14.586708 10511805.0 2.804074 0.692118
SSIM 0.000345 0.018585 34.616675 819630.00 2.304268 0.893533
Table 2: Loss functions comparison for SIREN fitting (3 hidden layers with 256 neurons each) on the Chikusei dataset (1000
epochs). The table presents mean values for four 512×512 patches. Rows represent loss functions.
MSE RMSE PSNR SAD SAM SSIM
MSE 0.000587 0.024223 32.315339 521001.98 52.010291 0.839271
RMSE 0.000589 0.024268 32.299372 520834.09 52.115509 0.837885
MAE 0.000606 0.024608 32.178375 462328.36 52.065796 0.800279
PSNR 0.000475 0.021791 33.234386 446634.08 45.458306 0.974262
SAD 0.000606 0.024620 32.174232 462560.37 52.124053 0.800041
SAM 0.003146 0.056088 25.023480 1570795.0 53.684620 0.462910
SSIM 0.000592 0.024328 32.278115 490803.63 52.715745 0.834866
Table 3: Loss functions weighted sum combinations comparison for SIREN fitting (3 hidden layers with 256 neurons each)
on the Cuprite dataset (1000 epochs).
MSE RMSE PSNR SAD SAM SSIM
0.3 MSE + 0.7 PSNR 0.000216 0.014694 36.657299 651320.38 1.808510 0.935802
0.9 MSE + 0.1 SSIM 0.000416 0.020395 33.809437 900755.13 2.528401 0.869610
0.01 PSNR + 0.05 SSIM 0.000205 0.014330 36.875143 637339.88 1.757805 0.940792
2 MSE + 0.01 SAM 0.000627 0.025048 32.024600 1103658.6 3.117343 0.795719
0.01 PSNR + 0.5 SSIM 0.000222 0.014909 36.530963 665096.44 1.804285 0.936313
Table 4: Quantitative evaluations of different approaches for upscaling the Cuprite dataset from 256×308×188 to
512×614×188.
MSE RMSE PSNR SAD SAM SSIM
Nearest-neighbor 0.000399 0.019964 33.995021 782167.69 2.479466 0.884059
Bilinear 0.000184 0.013563 37.353046 569532.38 1.672223 0.936155
Bicubic 0.000160 0.012643 37.963288 530463.00 1.555574 0.949012
SIREN (3 layers 256 neurons) 0.000299 0.017294 35.242156 754232.00 2.131940 0.910444
SIREN (3 layers 512 neurons) 0.000253 0.015913 35.964863 686883.13 1.948931 0.920803
Bicubic SIREN downscale 0.000310 0.017593 35.093361 753831.63 2.159939 0.913557
Bicubic SIREN no downscale 0.000187 0.013693 37.270051 590392.94 1.664128 0.948168
CNN AE + Implicit Decoder 0.003791 0.061569 24.212803 2822800.5 7.854479 0.588787
Table 5: Quantitative evaluations of different approaches for upscaling the Chikusei dataset from 256×256×128 to
512×512×128. The table presents mean values for four 512×512 patches.
MSE RMSE PSNR SAD SAM SSIM
Nearest-neighbor 0.000387 0.019670 34.123814 414811.76 39.404533 0.924318
Bilinear 0.000398 0.019939 34.005960 412121.91 40.700801 0.919221
Bicubic 0.000420 0.020488 33.770147 434430.89 41.718372 0.917627
SIREN (3 layers 256 neurons) 0.000602 0.024530 32.205966 527469.03 52.713975 0.837608
SIREN (3 layers 512 neurons) 0.000589 0.024273 32.297660 519392.61 52.175372 0.838893
Bicubic SIREN downscale 0.000462 0.021488 33.356040 442064.97 44.539143 0.903490
Bicubic SIREN no downscale 0.000589 0.024277 32.296041 520064.68 52.162835 0.838707
CNN AE + Implicit Decoder 0.000541 0.023254 32.669917 501532.94 49.409021 0.842707
REFERENCES
Arun, P. V., Buddhiraju, K. M., Porwal, A., and Chanussot,
J. (2020). Cnn-based super-resolution of hyperspec-
tral images. IEEE Transactions on Geoscience and
Remote Sensing, 58(9):6106–6121.
Chen, H., Zhao, W., Xu, T., Shi, G., Zhou, S., Liu, P., and
Li, J. (2023). Spectral-wise implicit neural represen-
tation for hyperspectral image reconstruction. IEEE
Transactions on Circuits and Systems for Video Tech-
nology.
Chen, Y., Liu, S., and Wang, X. (2021a). Learning contin-
uous image representation with local implicit image
function. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 8628–8638.
Chen, Y., Liu, S., and Wang, X. (2021b). Learning con-
tinuous image representation with local implicit im-
age function. In Proceedings of the IEEE/CVF con-
ference on computer vision and pattern recognition,
pages 8628–8638.
Dong, C., Loy, C. C., and Tang, X. (2016). Accelerating
the super-resolution convolutional neural network. In
Computer Vision–ECCV 2016: 14th European Con-
ference, Amsterdam, The Netherlands, October 11-
14, 2016, Proceedings, Part II 14, pages 391–407.
Springer.
Hu, X., Mu, H., Zhang, X., Wang, Z., Tan, T., and Sun, J.
(2019). Meta-sr: A magnification-arbitrary network
for super-resolution. In Proceedings of the IEEE/CVF
conference on computer vision and pattern recogni-
tion, pages 1575–1584.
Jiang, J., Sun, H., Liu, X., and Ma, J. (2020). Learn-
ing spatial-spectral prior for super-resolution of hy-
perspectral imagery. IEEE Transactions on Compu-
tational Imaging, 6:1082–1096.
Kingma, D. P. (2014). Adam: A method for stochastic op-
timization. arXiv preprint arXiv:1412.6980.
Kruse, F. A., Lefkoff, A., Boardman, J. W., Heidebrecht, K.,
spectral image processing system (sips)—interactive
visualization and analysis of imaging spectrometer
data. Remote sensing of environment, 44(2-3):145–
163.
Li, J., Yuan, Q., Shen, H., Meng, X., and Zhang, L. (2016).
Hyperspectral image super-resolution by spectral mix-
ture analysis and spatial–spectral group sparsity. IEEE
Geoscience and Remote Sensing Letters, 13(9):1250–
1254.
Mehta, I., Gharbi, M., Barnes, C., Shechtman, E., Ra-
mamoorthi, R., and Chandraker, M. (2021). Mod-
ulated periodic activations for generalizable local
functional representations. In Proceedings of the
IEEE/CVF International Conference on Computer Vi-
sion, pages 14214–14223.
Mei, S., Yuan, X., Ji, J., Zhang, Y., Wan, S., and Du, Q.
(2017). Hyperspectral image spatial super-resolution
via 3d full convolutional neural network. Remote
Sensing, 9(11):1139.
Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T.,
Ramamoorthi, R., and Ng, R. (2021). Nerf: Repre-
senting scenes as neural radiance fields for view syn-
thesis. Communications of the ACM, 65(1):99–106.
NASA Jet Propulsion Laboratory (1997). Cuprite hyperspectral dataset. https://aviris.jpl.nasa.gov/data/free_data.html.
Nguyen, Q. H. and Beksi, W. J. (2023). Single image super-
resolution via a dual interactive implicit neural net-
work. In Proceedings of the IEEE/CVF Winter Con-
ference on Applications of Computer Vision (WACV),
pages 4936–4945.
Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A. P.,
Bishop, R., Rueckert, D., and Wang, Z. (2016). Real-
time single image and video super-resolution using an
efficient sub-pixel convolutional neural network. In
Proceedings of the IEEE conference on computer vi-
sion and pattern recognition, pages 1874–1883.
Sitzmann, V., Martel, J., Bergman, A., Lindell, D., and Wet-
zstein, G. (2020). Implicit neural representations with
periodic activation functions. Advances in neural in-
formation processing systems, 33:7462–7473.
Tang, J., Chen, X., and Zeng, G. (2021). Joint implicit
image function for guided depth super-resolution. In
Proceedings of the 29th acm international conference
on multimedia, pages 4390–4399.
Wang, P., Liu, L., Liu, Y., Theobalt, C., Komura, T., and
Wang, W. (2021). Neus: Learning neural implicit sur-
faces by volume rendering for multi-view reconstruc-
tion. arXiv preprint arXiv:2106.10689.
Wang, Y., Chen, X., Han, Z., and He, S. (2017). Hyperspec-
tral image super-resolution via nonlocal low-rank ten-
sor approximation and total variation regularization.
Remote Sensing, 9(12):1286.
Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P.
(2004). Image quality assessment: from error visi-
bility to structural similarity. IEEE transactions on
image processing, 13(4):600–612.
Wang, Z., Chen, J., and Hoi, S. C. H. (2019). Deep learning
for image super-resolution: A survey. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence,
43:3365–3387.
Yokoya, N. and Iwasaki, A. (2016). Airborne hyperspectral
data over chikusei. Space Appl. Lab., Univ. Tokyo,
Tokyo, Japan, Tech. Rep. SAL-2016-05-27, 5(5):5.
Zhang, K., Zhu, D., Min, X., and Zhai, G. (2022). Implicit
neural representation learning for hyperspectral image
super-resolution. IEEE Transactions on Geoscience
and Remote Sensing, 61:1–12.
Zhang, M., Zhang, C., Zhang, Q., Guo, J., Gao, X., and
Zhang, J. (2023). Essaformer: Efficient transformer
for hyperspectral image super-resolution. In Proceed-
ings of the IEEE/CVF International Conference on
Computer Vision, pages 23073–23084.
Zhang, Y., Tian, Y., Kong, Y., Zhong, B., and Fu, Y. (2018).
Residual dense network for image super-resolution. In
Proceedings of the IEEE conference on computer vi-
sion and pattern recognition, pages 2472–2481.