Single Hyperspectral Image Super-Resolution Utilizing Implicit Neural Representations

Bohdan Perederei¹ (https://orcid.org/0009-0006-9639-683X) and Faisal Z. Qureshi² (https://orcid.org/0000-0002-8992-3607)

¹Department of Applied Mathematics, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Prospect Beresteiskyi 37, Kyiv, Ukraine
²Faculty of Science, Ontario Tech University, 2000 Simcoe St North, Oshawa, Canada
Keywords:
Hyperspectral Imagery, Super-Resolution, Deep Learning, Implicit Neural Representations, Convolutional
Autoencoder, Super-Resolution Loss Functions.
Abstract:
Hyperspectral image super-resolution is a crucial task in computer vision, aiming to enhance the spatial resolution of hyperspectral data while maintaining spectral fidelity. In this paper, we present the highlights and outcomes of our research, in which we developed, explored, and evaluated techniques based on Implicit Neural Representations (INRs) for single hyperspectral image super-resolution. Despite the potential of INRs, their application to hyperspectral image super-resolution remains underexplored, leaving significant room for further investigation. Our primary goal was to adapt strategies and techniques from models originally developed for multispectral image super-resolution, especially SIREN-based INRs and the Dual Interactive Implicit Neural Network architecture. We also explored feature extraction from hyperspectral images using a convolutional neural network autoencoder, which allowed us to capture spatial-spectral patterns for further enhancement. Furthermore, as part of the research, we validated and compared different functions, such as MSE, RMSE, MAE, PSNR, SAD, SAM, and SSIM, to evaluate their effectiveness as loss functions for training INRs.
1 INTRODUCTION
Single image super-resolution (SISR), a challenging and ill-posed problem, is one of the fundamental computer vision tasks, aimed at generating a high-resolution image from a low-resolution input. The main reason for conducting SISR is to improve image representation for better human and machine interpretation. For multispectral imagery (e.g., RGB), various deep learning techniques can provide high-quality or state-of-the-art results for super-resolution tasks (Wang et al., 2019).
Super-resolution of hyperspectral imagery (HSI) is an especially significant area of focus for researchers: hyperspectral cameras capture a scene in many spectral bands, but owing to limited hardware capabilities and financial resources, they usually have reduced spatial resolution compared to multispectral cameras. As a result, low spatial resolution and high camera prices lead to a scarcity of HSI data, which can significantly
limit some deep learning approaches due to insuffi-
cient training data. Another reason why this topic
is valuable for exploration is that HSI data presents
unique challenges compared to multispectral imagery.
HSI can have from several dozen to several hundred
spectral bands, leading to very high-dimensional data
that results in issues such as high computational load,
greater sensitivity to noise, and the so-called curse of dimensionality, which renders a considerable number of deep learning techniques (Chen et al., 2021a; Nguyen and Beksi, 2023) unusable or in need of significant optimization.
Furthermore, in addition to SISR, there is another widely used approach for enhancing the spatial resolution of HSI: fusion-based hyperspectral
image super-resolution. The core concept of this
method is to improve the quality of upscaled hyper-
spectral imagery by combining it with additional data,
such as RGB or multispectral images. In experi-
mental settings, such methods often deliver better re-
sults than techniques relying solely on single image
super-resolution. However, a common assumption
among these methods is that the low-resolution hyper-
spectral and high-resolution auxiliary images are pre-
cisely aligned. In practical scenarios, capturing a low-
resolution hyperspectral image and a high-resolution
multispectral image often involves different cameras.
This results in minor variations in the imaging conditions, which complicates accurate image registration. Thus, our research primarily concentrated on the single image super-resolution task.
Our primary contribution is the analysis, adapta-
tion, and evaluation of various INR architectures for
hyperspectral SISR, including SIREN, SIREN with
bicubic interpolation, and the Implicit Decoder from
the Dual-Interactive Implicit Neural Network paired
with a custom CNN autoencoder. Additionally, we in-
vestigated loss functions to determine their effective-
ness in training INRs, identifying PSNR as the most
effective for optimal convergence and metrics in our
experiments.
The paper is structured as follows: The Related
Work section reviews hyperspectral SISR methods,
including traditional, deep learning, and INR-based
approaches for multispectral and hyperspectral im-
agery. The Methodology section outlines the ratio-
nale for using INRs and details the metrics, loss func-
tions, and model architectures. The Experiments, Re-
sults, and Discussion section describes the experi-
mental setup, presents the results, and analyzes the
findings.
2 RELATED WORK
This section provides a concise overview of the methods described in the literature. For the reasons mentioned in the Introduction, fusion-based hyperspectral image super-resolution is outside the scope of this work. In addition, it is essential to note that this work prioritizes learning deterministic mappings over stochastic approaches, such as those employed in generative models.
2.1 Conventional Approaches
The first methods that come to mind regarding im-
age upscaling are classic and well-known image in-
terpolation techniques, such as nearest-neighbor, bi-
linear, and bicubic interpolation. They are straight-
forward to implement, provide efficient image up-
scaling, and demonstrate stability in quality across
data dimensionalities. However, machine learning-based super-resolution methods, such as deep learning models based on convolutional neural networks, usually score higher on evaluation metrics, as they can generate much higher-quality results that preserve the details and textures of low-resolution images. Hence, researchers use these interpolation methods as baseline approaches.
As for other conventional SISR techniques, (Wang
et al., 2017) proposed a method for HSI super-
resolution based on nonlocal low-rank tensor ap-
proximation and total variation regularization. This
approach effectively preserves structural details and
reduces noise by leveraging low-rank priors, but
it requires solving computationally expensive opti-
mization problems, making it less efficient for large
datasets. Similarly, (Li et al., 2016) introduced a tech-
nique that combines spectral mixture analysis with
spatial-spectral group sparsity to capture the under-
lying spatial and spectral correlations. While this
method improves accuracy by promoting sparsity,
it also demands significant computational resources and relies on carefully designed, hand-crafted priors, which may not generalize well in real-world scenarios.
2.2 Deep Learning Approaches
In contrast to conventional techniques, deep learning-
based methods, despite being data-demanding, of-
fer a more flexible and scalable approach by auto-
matically learning features, often outperforming tra-
ditional methods in speed and accuracy. The most
widely used model architectures for hyperspectral
SISR and multispectral SISR are mainly based on
CNN layers, and all the papers on this topic demon-
strated that network architecture design is a crucial
factor in image reconstruction quality (Mei et al.,
2017; Arun et al., 2020; Jiang et al., 2020). How-
ever, because hyperspectral data contains hundreds of channels along the spectral dimension, its complex 3D nature makes SISR techniques designed for natural images unsuitable for direct application, requiring researchers to adapt them or create novel methods.
At this point, researchers have introduced a vast
range of model architectures. Three main categories
of models can be generally classified based on their
upsampling techniques and the placement of upsam-
pling layers within the model architecture: back-
end upsampling (after CNN layers), front-end up-
sampling (before CNN layers), and progressive up-
sampling (between CNN layers). Typically, the most
challenging upsampling step is performed utilizing
traditional techniques like bicubic interpolation, with
deep neural networks responsible only for refining
these interpolated images to restore fine details and
achieve higher quality. Furthermore, because conven-
tional interpolation-based upsampling methods can-
not incorporate external prior information and are
unsuitable for use as an upsampling layer in the
back-end upsampling structure, researchers have in-
troduced learning-based upsampling methods, such as
transposed convolution (Dong et al., 2016) and pixel
shuffle (Shi et al., 2016) as alternative techniques.
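As an illustration of learned back-end upsampling, the following PyTorch sketch shows a sub-pixel convolution block in the spirit of (Shi et al., 2016); the channel count and kernel size are illustrative assumptions, not a configuration taken from any cited paper.

```python
import torch
import torch.nn as nn

# Sub-pixel (pixel shuffle) upsampler: a convolution expands the channel
# count by scale**2, and nn.PixelShuffle rearranges those channels into a
# spatially upscaled feature map (Shi et al., 2016).
def pixel_shuffle_upsampler(channels: int, scale: int = 2) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1),
        nn.PixelShuffle(scale),
    )

# Example: upscale a 64-channel feature map from 32x32 to 64x64.
features = torch.randn(1, 64, 32, 32)
upscaled = pixel_shuffle_upsampler(64)(features)  # shape: (1, 64, 64, 64)
```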
As for the structures of networks, the most com-
mon designs usually include recursive learning, resid-
ual learning, multi-path learning, dense connections,
and attention mechanisms. As research into neural networks advances, more and more network architectures are being designed and utilized for the SISR task.
2.3 Implicit Neural Representations for
Multispectral SISR
Implicit neural representation commonly encodes an
object using a multi-layer perceptron (MLP) that as-
sociates spatial coordinates with a signal. INR was
found to be especially useful in 3D object represen-
tations. As a result, through extensive research, tra-
ditional discrete models of 3D object shapes, sur-
faces, and scene structures have been replaced by
continuous functions defined by MLPs. One such
model we heavily researched for the SISR problem
is SIREN (dubbed sinusoidal representation network)
(Sitzmann et al., 2020), which usually showed bet-
ter 2D image and 3D object reconstruction results
than other INR models with different architectures
and activation functions. However, researchers focused on INR exploration mostly applied them to 3D computer vision and object representation (Sitzmann et al., 2020; Mildenhall et al., 2021; Wang et al., 2021), often ignoring 2D imaging, which left this domain underexplored.
Nonetheless, research into INR techniques for 2D
imagery has progressed, leading to investigations on
how INR can be effectively applied to various com-
puter vision tasks, including SISR. Implicit Neural
Representations of 2D images can be directly applied
to SISR because they enable the sampling of pixel val-
ues at any spatial location. For instance, (Chen et al.,
2021b) in their paper proposed the method called Lo-
cal Implicit Image Function (LIIF), which determines
a pixel’s value by referencing the closest latent code,
consisting of a localized collection of adjacent fea-
ture vectors. This pixel-based approach facilitates a
seamless transition across different areas in the recon-
structed image. Drawing inspiration from LIIF, (Tang
et al., 2021) introduced a new INR-driven represen-
tation called the Joint Implicit Image Function (JIIF)
for guided depth super-resolution, which aims to learn
the interpolation weights and their corresponding val-
ues simultaneously.
The LIIF and JIIF papers further suggest that a simple concatenation of spatial encodings and coordinates cannot fully improve the quality of the output images. In a related direction, the Meta-SR paper (Hu et al., 2019) introduced a magnification-arbitrary network that leverages INR techniques to perform super-resolution across a range of scaling factors. Also, unlike the approach in LIIF, (Nguyen and Beksi, 2023) proposed a novel Dual Interactive Implicit Neural Network (DIINN). DIINN consists of a well-known Residual Dense Network encoder (Zhang et al., 2018) and a unique Implicit Decoder that itself includes Modulation and Synthesis networks to enhance the implicit decoding function by separating the content and positional features at the pixel level, as suggested by (Mehta et al., 2021).
2.4 Implicit Neural Representations for
Hyperspectral SISR
The INR techniques and architectures surveyed in Subsection 2.3 are frequently and effectively applied to SISR problems in the multispectral imagery domain. While research on
INR for multispectral SISR is advancing and yielding
promising outcomes, the application of INRs for hy-
perspectral SISR remains underexplored. The reasons for this are the general trends in HSI super-resolution research, which primarily relies on the methods and techniques mentioned in Subsection 2.2, and the additional challenges presented by HSI data, such as the curse of dimensionality and high computational load.
Nonetheless, some progress has already been
made in the research of INR for HSI super-resolution,
and optimistic results have been demonstrated. For
instance, (Zhang et al., 2022) addressed the chal-
lenges of high-dimensional spectral patterns in HSI
super-resolution without relying on auxiliary images,
introducing a novel model that utilizes INR to map
spatial coordinates to their corresponding spectral
values through continuous functions, enhanced by a
hypernetwork for INR parameter prediction. Eval-
uations on multiple datasets demonstrated that this
approach yields competitive reconstruction perfor-
mance, highlighting the model’s capability to recover
high-frequency details effectively.
Furthermore, (Chen et al., 2023) introduced a
novel approach called Spectral-wise Implicit Neu-
ral Representation (SINR) that addressed the limi-
tations of traditional methods in HSI reconstruction,
which often represent the continuity of spectral information poorly. SINR employs a continuous spectral amplification process and incorporates a spectral-wise attention mechanism, treating individual channels as distinct tokens to capture global
spectral dependencies effectively. Even though SINR
primarily targets HSI reconstruction, it can be eas-
ily adapted for SISR tasks. Extensive experiments
demonstrated that this framework outperforms base-
line methods, significantly enhancing flexibility and
performance by accommodating unlimited spectral
bands in the output.
3 METHODOLOGY
This research aims to propose and evaluate novel ap-
proaches based on INRs for HSI super-resolution.
The rationale for employing INR-based methods
aligns closely with their established benefits in 3D ob-
ject reconstruction and 2D multispectral image super-
resolution.
Hyperspectral imagery, like 3D discrete models,
is inherently data-intensive, which highlights the po-
tential of INRs for HSI reconstruction and super-
resolution, as they can effectively manage the high
dimensionality and complexity associated with such
data forms. In addition to this capability, INRs offer
significant flexibility in resolution due to their contin-
uous nature. For instance, when utilizing a SIREN
architecture for image fitting, it becomes straightfor-
ward to upscale an image to any desired resolution. A
high-resolution image can be obtained without requir-
ing extensive re-training simply by creating a larger
pixel grid and inputting it into the trained SIREN
model. Furthermore, when applied to hyperspectral
data, INRs can enhance image representation effi-
ciency, as the trained weights of an INR model are often significantly smaller than the original HSI cube, enabling efficient storage and transmission.
Despite these advantages, INR-based methods remain
relatively underexplored in HSI super-resolution, pre-
senting a valuable opportunity for further research
to fully leverage their capabilities in HSI processing
tasks.
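To make the resolution-flexibility argument concrete, the sketch below shows how any trained coordinate-to-spectrum network can be queried on a denser pixel grid; `model` here stands for such a fitted network and is assumed from context.

```python
import torch

# Build a dense pixel grid in [-1, 1]^2 at the target resolution and query a
# trained coordinate-to-spectrum model on it; no re-training is required.
def upscale_with_inr(model, height, width, bands):
    ys, xs = torch.meshgrid(torch.linspace(-1.0, 1.0, height),
                            torch.linspace(-1.0, 1.0, width),
                            indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    with torch.no_grad():
        spectra = model(coords)                   # (height * width, bands)
    return spectra.reshape(height, width, bands)  # HSI cube (H, W, C)
```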
During our research, we systematically explored
various increasingly complex approaches to achieve
super-resolution metrics that surpass those of inter-
polation methods, which served as our baseline. The
details and outcomes of these experiments will be pre-
sented in Section 4. To prevent redundant efforts,
we concentrated on leveraging techniques previously
demonstrated to be effective for multispectral image
reconstruction and SISR that utilize INRs. Notably,
we employed ideas and techniques from such net-
works as SIREN (Sitzmann et al., 2020), LIIF (Chen et al., 2021b), and DIINN (Nguyen and Beksi, 2023),
all of which have shown promising results with mul-
tispectral data.
3.1 Evaluation Metrics and Loss
Functions
To assess the effectiveness of HSI super-resolution
techniques, we rely on commonly used evaluation metrics such as Mean Square Error (MSE), Root Mean
Square Error (RMSE), Peak Signal-to-Noise Ratio
(PSNR), Sum Of Absolute Differences (SAD), Spec-
tral Angle Mapper (SAM), and Structural Similarity
(SSIM). MSE quantifies the average squared differ-
ences between estimated and actual pixel values:
\[
\mathrm{MSE}(X,\hat{X}) \;=\; \frac{1}{N}\sum_{i=1}^{N}\bigl(X_i - \hat{X}_i\bigr)^2. \tag{1}
\]
As for PSNR, it is a widely used metric for evaluating
image quality. It is determined by the maximum pixel
value (L, equal to 1 in our case) in the image and the
MSE calculated between the original high-resolution
HSI and its reconstructed counterpart:
\[
\mathrm{PSNR}(X,\hat{X}) \;=\; 10\log_{10}\frac{L^2}{\mathrm{MSE}} \tag{2}
\]
\[
\;=\; 10\log_{10}\!\left(\frac{L^2}{\frac{1}{W\times H}\sum_{i=1}^{W\times H}\bigl(X_i-\hat{X}_i\bigr)^2}\right). \tag{3}
\]
The SAM function, introduced by (Kruse et al.,
1993), assesses the spectral similarity between pix-
els in hyperspectral images by calculating the angle
between their spectral vectors, where a smaller angle
indicates a higher likelihood of the pixels belonging
to the same class:
\[
\mathrm{SAM}(X,\hat{X}) \;=\; \arccos\frac{X^{T}\hat{X}}{\lVert X\rVert_{2}\,\lVert\hat{X}\rVert_{2}}. \tag{4}
\]
The SSIM index (Wang et al., 2004) mea-
sures the structural similarity between an original im-
age and a reconstructed image, considering image
degradation as a perceived change in structural infor-
mation:
\[
\mathrm{SSIM}(X,\hat{X}) \;=\; \frac{\bigl(2\mu_{X}\mu_{\hat{X}} + c_{1}\bigr)\bigl(2\sigma_{X\hat{X}} + c_{2}\bigr)}{\bigl(\mu_{X}^{2} + \mu_{\hat{X}}^{2} + c_{1}\bigr)\bigl(\sigma_{X}^{2} + \sigma_{\hat{X}}^{2} + c_{2}\bigr)}. \tag{5}
\]
Here, $\mu$ denotes the mean value, while $\sigma$ denotes the variance or covariance. The constants $c_1$ and $c_2$ are introduced to stabilize the division when the denominator is small.
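For reference, a minimal NumPy sketch of these metrics follows, assuming HSI cubes stored as (H, W, C) arrays normalized to [0, 1]; it mirrors Equations (1)-(4) plus the SAD definition, SSIM is omitted for brevity, and reporting SAM as a per-pixel mean in degrees is our assumption.

```python
import numpy as np

def mse(x, x_hat):
    # Eq. (1): mean squared difference over all pixels and bands.
    return np.mean((x - x_hat) ** 2)

def psnr(x, x_hat, peak=1.0):
    # Eqs. (2)-(3): peak signal-to-noise ratio; peak L = 1 for normalized data.
    return 10.0 * np.log10(peak ** 2 / mse(x, x_hat))

def sad(x, x_hat):
    # Sum of absolute differences over the whole cube.
    return np.sum(np.abs(x - x_hat))

def sam(x, x_hat, eps=1e-8):
    # Eq. (4): spectral angle between per-pixel spectra, averaged over pixels.
    dot = np.sum(x * x_hat, axis=-1)
    norms = np.linalg.norm(x, axis=-1) * np.linalg.norm(x_hat, axis=-1)
    angles = np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0))
    return np.degrees(np.mean(angles))
```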
Moreover, the presented evaluation metrics and various weighted combinations of them were
assessed for their suitability as loss functions in this
study. Mean absolute error (MAE) was also tested as
a potential loss function. The results of these evalua-
tions can be found in Section 4.2. PSNR achieved the
best convergence time and metric values in our tests,
surpassing other functions and becoming the primary
loss function for model training.
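A PSNR-based training objective is simply the negative of Equation (2), which is differentiable wherever the MSE is nonzero; a minimal PyTorch sketch is shown below.

```python
import torch

def psnr_loss(pred, target, peak=1.0, eps=1e-12):
    # Negative PSNR: minimizing this loss maximizes PSNR. The eps term
    # guards against division by zero when the reconstruction is exact.
    mse = torch.mean((pred - target) ** 2)
    return -10.0 * torch.log10(peak ** 2 / (mse + eps))
```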
3.2 Explored Methods and
Architectures
The SIREN architecture (three hidden layers with 256
neurons each) was selected as an initial approach, us-
ing pixel coordinates as inputs and spectral channels
as outputs. The primary objective was to train the net-
work to reconstruct HSI and perform super-resolution
by feeding the model an enlarged pixel grid. How-
ever, the baseline SIREN architecture did not achieve
performance metrics superior to bicubic interpolation.
To address this, we experimented with an alterna-
tive method where the image was first upscaled us-
ing bicubic interpolation, followed by enhancement
through the trained SIREN model. Specifically, we
downscaled the original image, upscaled it back using
bicubic interpolation, and trained the SIREN to min-
imize the loss between the original and the syntheti-
cally downscaled and upscaled image. Once trained,
the model was applied to the bicubically upscaled
original image to enhance its quality further. Despite
these efforts, this technique failed to produce results
that outperformed the standard bicubic interpolation.
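For concreteness, a minimal PyTorch sketch of the baseline coordinate-to-spectrum SIREN is given below; the frequency factor omega_0 = 30 and the initialization bounds follow the original SIREN paper (Sitzmann et al., 2020) and are assumptions about our exact configuration, and the band count defaults to the 188 channels of Cuprite.

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    # Linear layer with sine activation and the SIREN initialization scheme.
    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        with torch.no_grad():
            bound = (1.0 / in_features if is_first
                     else (6.0 / in_features) ** 0.5 / omega_0)
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

class SIREN(nn.Module):
    # Maps 2D pixel coordinates to a full spectrum (one value per band).
    def __init__(self, in_features=2, hidden=256, hidden_layers=3,
                 out_features=188):
        super().__init__()
        layers = [SineLayer(in_features, hidden, is_first=True)]
        for _ in range(hidden_layers - 1):
            layers.append(SineLayer(hidden, hidden))
        layers.append(nn.Linear(hidden, out_features))
        self.net = nn.Sequential(*layers)

    def forward(self, coords):   # coords: (N, 2) in [-1, 1]
        return self.net(coords)  # spectra: (N, out_features)
```

The bicubic-SIREN variants reuse the same network with in_features = 2 + C, where C is the number of spectral bands appended to each coordinate (see Section 4.3).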
Following this, we shifted our focus to more com-
plex architectures, considering the use of autoencoders to represent HSI in a latent space, which
could then be fed into the network instead of the raw
image. The starting point for exploring more ad-
vanced architectures was DIINN (Nguyen and Beksi,
2023), which consists of a Residual Dense Network
(RDN) encoder (Zhang et al., 2018) and a novel Im-
plicit Decoder composed of modulation and synthesis
networks. Unsurprisingly, DIINN, initially designed
for upscaling RGB images, failed to work without
modifications, primarily due to the limitations of the
RDN encoder. The latent space produced by the RDN
from the DIINN network is smaller than the number
of spectral bands in HSI, and adjusting the RDN to
handle the increased spectral channels proved chal-
lenging due to the curse of dimensionality. This re-
sulted in excessive memory usage and computational
demands, making it challenging to employ the RDN
as an encoder in any network architecture without a
significant redesign.
Due to the necessity of utilizing latent space over
the entire image and the limitations of employing
RDN, we opted for a custom autoencoder that encodes the image into a latent space with an expanded spectral dimension. We chose two CNN layers with a kernel size of 3×3 for the autoencoder architecture. These layers process 4×4 image patches with a three-pixel overlap, expanding their spectral representation to 256 and then to 512 channels. Consequently, each 4×4 image patch is transformed into a 1×1×512 vector
in the latent space. This latent representation is then
fed into the Implicit Decoder, replacing the RDN-
based approach. Unfortunately, this approach also
failed to achieve metrics surpassing those of interpo-
lation methods.
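A sketch of one plausible implementation is given below; we treat the three-pixel overlap of 4×4 patches as a stride-1 sliding window, which is equivalent to applying the convolutions over the full image, and the padding, activation functions, and mirrored decoder are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

class PatchAutoencoder(nn.Module):
    # Two 3x3 convolutions expand the spectral dimension to 256 and then 512
    # channels; a mirrored pair maps the latent code back to the original band
    # count so the autoencoder can be trained unsupervised with an MSE loss.
    def __init__(self, bands=188):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(bands, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, bands, kernel_size=3, padding=1),
        )

    def forward(self, x):            # x: (B, bands, H, W)
        z = self.encoder(x)          # latent code: (B, 512, H, W)
        return self.decoder(z), z    # reconstruction and latent code
```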
4 EXPERIMENTS, RESULTS AND
DISCUSSION
4.1 Datasets and Experimental Settings
For training and initial evaluations, we utilized the
Cuprite dataset, consisting of a hyperspectral image
with dimensions of 512×614 pixels and 188 channels
(NASA Jet Propulsion Laboratory, 1997). To further
enhance our evaluations and facilitate comparisons
with other studies, we also incorporated the Chiku-
sei dataset, captured using the Headwall Hyperspec-
VNIR-C imaging sensor over agricultural and ur-
ban regions in Chikusei, Ibaraki, Japan (Yokoya and
Iwasaki, 2016). The HSI in the Chikusei dataset con-
sists of 2517×2335 pixels with 128 bands. For the
Chikusei dataset, we adopted the approach used in
(Jiang et al., 2020) and (Zhang et al., 2023) by extract-
ing non-overlapping patches of 512×512 pixels for
evaluation and benchmarking purposes. It is essen-
tial to highlight that all spectral values in the Chikusei
dataset were normalized to a range between 0 and 1.
For model training and evaluation, the hyperspectral
images serve as ground truth, while the input data is
generated by downscaling these images using bicubic
interpolation at the desired scaling factor.
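This degradation step can be expressed in a few lines of PyTorch; the antialiasing and corner-alignment flags below are assumptions, as the exact settings are not specified here.

```python
import torch.nn.functional as F

def degrade(hr_cube, scale=2):
    # hr_cube: (1, C, H, W) tensor in [0, 1]. Bicubic downscaling produces the
    # low-resolution input; the original cube serves as the ground truth.
    return F.interpolate(hr_cube, scale_factor=1.0 / scale,
                         mode="bicubic", align_corners=False)
```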
As for experimental settings, we designed our
INR-based models by incorporating and adapting ele-
ments from the SIREN and DIINN architectures. The
models were trained from the ground up in PyTorch
on Nvidia V100-SXM2 GPUs. Adam (Kingma, 2014) with a learning rate of 1e-4 was used as the optimizer, PSNR was used as the loss function, and models were trained
for 1000 epochs. The training configurations were
consistent across all datasets and models, with min-
imal specific adjustments made.
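A condensed view of this training configuration, assuming the hypothetical `SIREN` and `psnr_loss` sketches given earlier and a coordinate/target pair prepared from a hyperspectral cube:

```python
import torch

model = SIREN(out_features=188)                  # sketch from Section 3.2
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(1000):                        # 1000 epochs, PSNR loss
    optimizer.zero_grad()
    prediction = model(coords)                   # coords: (N, 2) pixel grid
    loss = psnr_loss(prediction, target)         # target: (N, 188) spectra
    loss.backward()
    optimizer.step()
```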
4.2 Loss Functions Evaluation
Our research analyzed several loss functions (MSE, RMSE, MAE, PSNR, SAD, SAM, and SSIM) to iden-
tify which makes the training process converge more
rapidly and produce superior metrics overall. For
this experiment, we trained the SIREN model for the
HSI reconstruction task using the Cuprite dataset and
four 512×512 patches from the Chikusei dataset. The
results shown in Tables 1 and 2 indicate that using
PSNR as a loss function yields better metric values af-
ter 1000 epochs of training compared to others. Addi-
tionally, we experimented with various weighted sum
combinations of loss functions (as shown in Table
3) to determine whether this approach could improve
convergence and metric values. However, none of the
tested combinations surpassed the performance of us-
ing the PSNR as a loss function alone. Consequently,
we selected PSNR as the loss function to train all sub-
sequent models in this study.
4.3 Models Evaluation
To enable a more comprehensive comparison, we in-
cluded results for nearest-neighbor and bilinear inter-
polation, alongside bicubic interpolation, in each ta-
ble. The corresponding metric values are presented in
Tables 4 and 5.
We initially tested two SIREN architectures: one
with three hidden layers of 256 neurons each and an-
other with three layers of 512 neurons each. While in-
creasing the number of neurons led to some improve-
ment in the metric values, both models performed
worse than bilinear and bicubic interpolation. How-
ever, they did surpass the nearest-neighbor method on
the Cuprite dataset. This gave us the idea of utilizing a latent-space representation of the image, or other additional data, as input to the network.
Thus, we experimented with a bicubic SIREN,
where the input was not just the coordinate grid, but
for each coordinate, we passed channel values ob-
tained from bicubic interpolation. In the first exper-
iment, we downscaled the image we aimed to up-
scale by a factor of 2 and used this downscaled ver-
sion as input data with the original image serving as
the ground truth for training the model. After that,
we fed to the trained model the original image we
wanted to upscale as input. This approach yielded
worse results than the simple SIREN on the Cuprite
dataset but better results on the Chikusei dataset. In
the second experiment, we trained the bicubic SIREN by feeding it the original image without downscaling during training. For super-resolution, we passed the
image that had been upscaled using bicubic interpola-
tion. Overall, the results across datasets are inconsistent, with the bicubic SIREN producing outcomes similar to the standard SIREN up to minor fluctuations in metric values, which can be attributed to differences in the structure of the HSI test data.
Additionally, it is essential to note that this method
can be viewed as bicubic interpolation degraded by
SIREN reconstruction, with SIREN learning primar-
ily to add as little noise as possible rather than en-
hancing the bicubic-interpolated image.
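The bicubic-SIREN input can be assembled as follows, a sketch assuming the coordinate-grid convention used above, with in_features = 2 + C for the network:

```python
import torch
import torch.nn.functional as F

def bicubic_siren_input(lr_cube, out_h, out_w):
    # lr_cube: (1, C, h, w). Returns (out_h * out_w, 2 + C): each row holds a
    # pixel coordinate followed by the bicubically interpolated spectrum there.
    up = F.interpolate(lr_cube, size=(out_h, out_w), mode="bicubic",
                       align_corners=False)
    ys, xs = torch.meshgrid(torch.linspace(-1.0, 1.0, out_h),
                            torch.linspace(-1.0, 1.0, out_w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
    spectra = up.squeeze(0).permute(1, 2, 0).reshape(-1, up.shape[1])
    return torch.cat([coords, spectra], dim=-1)
```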
For the model adapted from DIINN, we used a
CNN autoencoder trained for ten epochs with MSE
loss to extract features from the images in an unsu-
pervised manner, which were then passed to the Im-
plicit Decoder as input. Each image was trained with
a separate CNN autoencoder by feeding it 4×4 over-
lapping patches with a stride equal to 3. Despite our
expectations that this architecture would yield strong
results, it failed to outperform interpolation methods
discussed in this study (Tables 4 and 5).
Even though none of the proposed methods man-
aged to outperform interpolation techniques, there is
still significant room for further research. For in-
stance, evaluating a CNN autoencoder with SIREN
or adapting the LIIF model as a decoder could be
promising directions for future studies.
5 CONCLUSIONS
Our research provides a broad overview of the cur-
rent landscape in single hyperspectral image super-
resolution, highlighting advancements in deep learn-
ing and implicit neural representations. We pro-
posed and evaluated several INR-based architectures
adapted from those created for the problem of mul-
tispectral image super-resolution and also evaluated
various loss functions that can be used for INR-based
model training. Although our approaches did not outperform the interpolation methods chosen as a baseline in terms of metric values, the area of uti-
lizing INR for hyperspectral image super-resolution
remains underexplored, requiring further investiga-
tion. Among the loss functions evaluated, PSNR
demonstrated the best results.
We hope that the findings from our research will
provide valuable insights to other researchers, en-
abling them to develop more effective methods. By
sharing the results of our study, we aim to help oth-
ers avoid spending time and resources on approaches
we have already tested and found ineffective, allow-
ing them to focus on developing more effective solu-
tions.
ACKNOWLEDGMENTS
This study was conducted as part of the MITACS
Globalink Research Internship. We gratefully ac-
knowledge Ontario Tech University for providing the
software, computational power, and storage resources
that made this research possible.
Table 1: Loss functions comparison for SIREN fitting (3 hidden layers with 256 neurons each) on the Cuprite dataset (1000
epochs). Rows represent loss functions.
MSE RMSE PSNR SAD SAM SSIM
MSE 0.000673 0.025933 31.722861 1141381.5 3.212296 0.780906
RMSE 0.000306 0.017482 35.148163 774198.38 2.144571 0.905351
MAE 0.000400 0.019991 33.983388 840628.69 2.478677 0.872909
PSNR 0.000201 0.014171 36.971917 630810.25 1.728390 0.942259
SAD 0.000411 0.020261 33.866750 854810.25 2.513883 0.869774
SAM 0.034780 0.186494 14.586708 10511805.0 2.804074 0.692118
SSIM 0.000345 0.018585 34.616675 819630.00 2.304268 0.893533
Table 2: Loss functions comparison for SIREN fitting (3 hidden layers with 256 neurons each) on the Chikusei dataset (1000
epochs). The table presents mean values for four 512×512 patches. Rows represent loss functions.
MSE RMSE PSNR SAD SAM SSIM
MSE 0.000587 0.024223 32.315339 521001.98 52.010291 0.839271
RMSE 0.000589 0.024268 32.299372 520834.09 52.115509 0.837885
MAE 0.000606 0.024608 32.178375 462328.36 52.065796 0.800279
PSNR 0.000475 0.021791 33.234386 446634.08 45.458306 0.974262
SAD 0.000606 0.024620 32.174232 462560.37 52.124053 0.800041
SAM 0.003146 0.056088 25.023480 1570795.0 53.684620 0.462910
SSIM 0.000592 0.024328 32.278115 490803.63 52.715745 0.834866
Table 3: Loss functions weighted sum combinations comparison for SIREN fitting (3 hidden layers with 256 neurons each)
on the Cuprite dataset (1000 epochs).
MSE RMSE PSNR SAD SAM SSIM
0.3 MSE + 0.7 PSNR 0.000216 0.014694 36.657299 651320.38 1.808510 0.935802
0.9 MSE + 0.1 SSIM 0.000416 0.020395 33.809437 900755.13 2.528401 0.869610
0.01 PSNR + 0.05 SSIM 0.000205 0.014330 36.875143 637339.88 1.757805 0.940792
2 MSE + 0.01 SAM 0.000627 0.025048 32.024600 1103658.6 3.117343 0.795719
0.01 PSNR + 0.5 SSIM 0.000222 0.014909 36.530963 665096.44 1.804285 0.936313
Table 4: Quantitative evaluations of different approaches for upscaling the Cuprite dataset from 256×308×188 to
512×614×188.
MSE RMSE PSNR SAD SAM SSIM
Nearest-neighbor 0.000399 0.019964 33.995021 782167.69 2.479466 0.884059
Bilinear 0.000184 0.013563 37.353046 569532.38 1.672223 0.936155
Bicubic 0.000160 0.012643 37.963288 530463.00 1.555574 0.949012
SIREN (3 layers 256 neurons) 0.000299 0.017294 35.242156 754232.00 2.131940 0.910444
SIREN (3 layers 512 neurons) 0.000253 0.015913 35.964863 686883.13 1.948931 0.920803
Bicubic SIREN downscale 0.000310 0.017593 35.093361 753831.63 2.159939 0.913557
Bicubic SIREN no downscale 0.000187 0.013693 37.270051 590392.94 1.664128 0.948168
CNN AE + Implicit Decoder 0.003791 0.061569 24.212803 2822800.5 7.854479 0.588787
Table 5: Quantitative evaluations of different approaches for upscaling the Chikusei dataset from 256×256×128 to
512×512×128. The table presents mean values for four 512×512 patches.
MSE RMSE PSNR SAD SAM SSIM
Nearest-neighbor 0.000387 0.019670 34.123814 414811.76 39.404533 0.924318
Bilinear 0.000398 0.019939 34.005960 412121.91 40.700801 0.919221
Bicubic 0.000420 0.020488 33.770147 434430.89 41.718372 0.917627
SIREN (3 layers 256 neurons) 0.000602 0.024530 32.205966 527469.03 52.713975 0.837608
SIREN (3 layers 512 neurons) 0.000589 0.024273 32.297660 519392.61 52.175372 0.838893
Bicubic SIREN downscale 0.000462 0.021488 33.356040 442064.97 44.539143 0.903490
Bicubic SIREN no downscale 0.000589 0.024277 32.296041 520064.68 52.162835 0.838707
CNN AE + Implicit Decoder 0.000541 0.023254 32.669917 501532.94 49.409021 0.842707
REFERENCES
Arun, P. V., Buddhiraju, K. M., Porwal, A., and Chanussot,
J. (2020). Cnn-based super-resolution of hyperspec-
tral images. IEEE Transactions on Geoscience and
Remote Sensing, 58(9):6106–6121.
Chen, H., Zhao, W., Xu, T., Shi, G., Zhou, S., Liu, P., and
Li, J. (2023). Spectral-wise implicit neural represen-
tation for hyperspectral image reconstruction. IEEE
Transactions on Circuits and Systems for Video Tech-
nology.
Chen, Y., Liu, S., and Wang, X. (2021a). Learning contin-
uous image representation with local implicit image
function. In Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition (CVPR),
pages 8628–8638.
Chen, Y., Liu, S., and Wang, X. (2021b). Learning con-
tinuous image representation with local implicit im-
age function. In Proceedings of the IEEE/CVF con-
ference on computer vision and pattern recognition,
pages 8628–8638.
Dong, C., Loy, C. C., and Tang, X. (2016). Accelerating
the super-resolution convolutional neural network. In
Computer Vision–ECCV 2016: 14th European Con-
ference, Amsterdam, The Netherlands, October 11-
14, 2016, Proceedings, Part II 14, pages 391–407.
Springer.
Hu, X., Mu, H., Zhang, X., Wang, Z., Tan, T., and Sun, J.
(2019). Meta-sr: A magnification-arbitrary network
for super-resolution. In Proceedings of the IEEE/CVF
conference on computer vision and pattern recogni-
tion, pages 1575–1584.
Jiang, J., Sun, H., Liu, X., and Ma, J. (2020). Learn-
ing spatial-spectral prior for super-resolution of hy-
perspectral imagery. IEEE Transactions on Compu-
tational Imaging, 6:1082–1096.
Kingma, D. P. (2014). Adam: A method for stochastic op-
timization. arXiv preprint arXiv:1412.6980.
Kruse, F. A., Lefkoff, A., Boardman, J. W., Heidebrecht, K.,
spectral image processing system (sips)—interactive
visualization and analysis of imaging spectrometer
data. Remote sensing of environment, 44(2-3):145–
163.
Li, J., Yuan, Q., Shen, H., Meng, X., and Zhang, L. (2016).
Hyperspectral image super-resolution by spectral mix-
ture analysis and spatial–spectral group sparsity. IEEE
Geoscience and Remote Sensing Letters, 13(9):1250–
1254.
Mehta, I., Gharbi, M., Barnes, C., Shechtman, E., Ra-
mamoorthi, R., and Chandraker, M. (2021). Mod-
ulated periodic activations for generalizable local
functional representations. In Proceedings of the
IEEE/CVF International Conference on Computer Vi-
sion, pages 14214–14223.
Mei, S., Yuan, X., Ji, J., Zhang, Y., Wan, S., and Du, Q.
(2017). Hyperspectral image spatial super-resolution
via 3d full convolutional neural network. Remote
Sensing, 9(11):1139.
Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T.,
Ramamoorthi, R., and Ng, R. (2021). Nerf: Repre-
senting scenes as neural radiance fields for view syn-
thesis. Communications of the ACM, 65(1):99–106.
NASA Jet Propulsion Laboratory (1997). Cuprite hyperspectral dataset. https://aviris.jpl.nasa.gov/data/free_data.html.
Nguyen, Q. H. and Beksi, W. J. (2023). Single image super-
resolution via a dual interactive implicit neural net-
work. In Proceedings of the IEEE/CVF Winter Con-
ference on Applications of Computer Vision (WACV),
pages 4936–4945.
Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A. P.,
Bishop, R., Rueckert, D., and Wang, Z. (2016). Real-
time single image and video super-resolution using an
efficient sub-pixel convolutional neural network. In
Proceedings of the IEEE conference on computer vi-
sion and pattern recognition, pages 1874–1883.
Sitzmann, V., Martel, J., Bergman, A., Lindell, D., and Wet-
zstein, G. (2020). Implicit neural representations with
periodic activation functions. Advances in neural in-
formation processing systems, 33:7462–7473.
Tang, J., Chen, X., and Zeng, G. (2021). Joint implicit
image function for guided depth super-resolution. In
Proceedings of the 29th acm international conference
on multimedia, pages 4390–4399.
Wang, P., Liu, L., Liu, Y., Theobalt, C., Komura, T., and
Wang, W. (2021). Neus: Learning neural implicit sur-
faces by volume rendering for multi-view reconstruc-
tion. arXiv preprint arXiv:2106.10689.
Wang, Y., Chen, X., Han, Z., and He, S. (2017). Hyperspec-
tral image super-resolution via nonlocal low-rank ten-
sor approximation and total variation regularization.
Remote Sensing, 9(12):1286.
Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P.
(2004). Image quality assessment: from error visi-
bility to structural similarity. IEEE transactions on
image processing, 13(4):600–612.
Wang, Z., Chen, J., and Hoi, S. C. H. (2019). Deep learning
for image super-resolution: A survey. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence,
43:3365–3387.
Yokoya, N. and Iwasaki, A. (2016). Airborne hyperspectral
data over chikusei. Space Appl. Lab., Univ. Tokyo,
Tokyo, Japan, Tech. Rep. SAL-2016-05-27, 5(5):5.
Zhang, K., Zhu, D., Min, X., and Zhai, G. (2022). Implicit
neural representation learning for hyperspectral image
super-resolution. IEEE Transactions on Geoscience
and Remote Sensing, 61:1–12.
Zhang, M., Zhang, C., Zhang, Q., Guo, J., Gao, X., and
Zhang, J. (2023). Essaformer: Efficient transformer
for hyperspectral image super-resolution. In Proceed-
ings of the IEEE/CVF International Conference on
Computer Vision, pages 23073–23084.
Zhang, Y., Tian, Y., Kong, Y., Zhong, B., and Fu, Y. (2018).
Residual dense network for image super-resolution. In
Proceedings of the IEEE conference on computer vi-
sion and pattern recognition, pages 2472–2481.