Benchmarking Neural Rendering Approaches for 3D Reconstruction of Underwater Environments

Salvatore Mario Carota 1,a, Alessandro Privitera 1,b, Daniele Di Mauro 2,c, Antonino Furnari 1,2,d, Giovanni Maria Farinella 1,2,e and Francesco Ragusa 1,2,f

1 Department of Mathematics and Computer Science, University of Catania, Viale Andrea Doria, 6, Catania, Italy
2 Next Vision s.r.l., Viale Andrea Doria, 6, Catania, Italy
{antonino.furnari, giovanni.farinella, francesco.ragusa}@unict.it, {salvatore.carota, prvlsn01s02c351v}@studium.unict.it

a https://orcid.org/0009-0008-6431-9156
b https://orcid.org/0009-0001-3507-8233
c https://orcid.org/0000-0002-4286-2050
d https://orcid.org/0000-0001-6911-0302
e https://orcid.org/0000-0002-6034-0432
f https://orcid.org/0000-0002-6368-1910
These authors share first authorship.

Keywords: Underwater 3D Reconstruction, Neural Rendering, 3D Gaussian Splatting.
Abstract: We tackle the problem of 3D reconstruction of underwater scenarios using neural rendering techniques. We propose a benchmark adopting the SeaThru-NeRF dataset, performing a systematic analysis that compares several established methods based on NeRF and 3D Gaussian Splatting through a series of experiments. The results were evaluated both quantitatively, using various 2D and 3D metrics, and qualitatively, through a user survey assessing the fidelity of the reconstructed images. This provides critical insight into how to select suitable techniques for 3D reconstruction of underwater scenarios. The results indicate that, in the context of this application and among the algorithms tested, NeRF-based methods performed better than 3D Gaussian Splatting-based methods in both mesh generation and novel view synthesis.
1 INTRODUCTION
3D reconstruction is a classic computer vision task that has become ubiquitous across various scientific fields, including archaeological inspections (De Reu et al., 2014), biological studies (Correia and Brito, 2023; Irschick et al., 2022), and architectural projects (Münster et al., 2024; Cui et al., 2024). An area of particular interest, due to its diverse applications ranging from biological assessment to archaeological discovery, is underwater 3D reconstruction, which poses unique challenges due to several critical differences compared to reconstructing non-underwater scenes. Images captured in underwater environments differ significantly because of the presence of water, which alters the behavior of light (Li et al., 2019; Islam et al., 2020; Zhang and Johnson-Roberson, 2023; Hou et al., 2020). These differences include variations in lighting, optical distortions, and limited visibility. Together, these factors
create a complex set of challenges for accurate 3D
reconstruction (Akkaynak and Treibitz, 2019). Over-
coming these challenges is highly beneficial for many
fields. In underwater heritage conservation, 3D re-
construction enables the inspection of artifacts and
structures without risking damage or compromising
their integrity (Memet, 2008; Perez-Alvaro, 2023).
This technology not only aids in protecting cultural
assets but also allows for their presentation to a wider
audience, such as in virtual museums. Additionally,
underwater environmental sciences can benefit from
advancements in 3D reconstruction technologies to
monitor coral reef health by detecting changes over
time. Detailed 3D models enable marine biologists to
study complex habitats, providing deeper insights into
ecological interactions (Zhang et al., 2023; Adam-
czak et al., 2019; Kaandorp, 1993). 3D reconstruc-
tions of underwater environments can also be utilized
in video games, movies, and virtual and augmented
reality applications to enhance user experiences. A
notable example of its use in the cultural field is the "First Life" project (https://www.nhm.ac.uk/discover/news/2015/june/dive-back-in-time-with-david-attenborough-s-first-life.html), which enables the visitor to become a virtual voyager, traveling through submerged landscapes and gaining a new perspective on the history of life on Earth. Recently, several solutions have dealt with the problem of 3D environment reconstruction, but a large gap remains for underwater environments. Photogrammetry (Schönberger and Frahm, 2016), Neural Radiance Fields (NeRF) (Levy et al., 2023), and 3D Gaussian Splatting (3DGS) (Kerbl et al., 2023) techniques have played an important role in enhancing and introducing new methods for the reconstruction of 3D models.

Figure 1: From a set of images acquired in underwater environments, the task is to reconstruct the 3D model of the environment.
In this work, we present a benchmark for 3D reconstruction of underwater environments (see Figure 1) using the state-of-the-art SeaThru-NeRF dataset introduced in (Levy et al., 2023). The benchmark compares models based on NeRF and 3DGS; at the same time, we test how underwater enhancement techniques perform as a preprocessing step in the context of neural rendering. The results were evaluated both quantitatively, using a variety of evaluation metrics at both the rendering and mesh generation levels, and qualitatively, through a user survey. We found that NeRF-based models are slightly better suited for the task than 3DGS-based methods, which nonetheless remain highly promising.
The contributions of this work are: 1) We conducted a systematic analysis of NeRF-based and 3DGS-based methods for underwater environment reconstruction, providing insights into the performance of the tested methods; 2) We analyzed how enhancement techniques can improve the reconstruction of 3D models; 3) We quantitatively evaluated both the novel view synthesis task and 3D mesh reconstruction; 4) We conducted a qualitative study on the accuracy of reconstructed 3D models through questionnaires administered to a total of 40 subjects.
2 RELATED WORK
Our work builds on prior research in underwater
datasets, underwater image enhancement, 3D recon-
struction, and neural rendering, which will be briefly
described in the following sections.
Underwater Datasets. Several datasets have been proposed in the literature (Li et al., 2019; Islam et al., 2020; Zhang and Johnson-Roberson, 2023; Hou et al., 2020; Akkaynak and Treibitz, 2019); some of them are real, i.e., they capture real scenes, while others are synthetic, i.e., images are crafted in some way to solve a particular task. They are used for various purposes, such as enhancement, 3D reconstruction, and robotics. Among them: UIEB (Li et al., 2019) consists of 950 real-world underwater images with different natural and artificial lighting conditions. UFO-120 (Islam et al., 2020) contains 1,500 paired samples split into training and validation sets and 120 paired samples for benchmark evaluation. Each shot is provided with a high-resolution ground truth version, its distorted low-resolution version, and a saliency map mask. BNU (Zhang and Johnson-Roberson, 2023) includes images captured in a 1.3 m-deep tank and in Lake Erie; the JPEG images were post-processed and camera poses were computed using COLMAP. SUID (Hou et al., 2020) is a synthetic dataset produced by applying special effects that simulate underwater conditions to terrestrial images. SeaThru and the follow-up work from the same research group, SeaThru-NeRF (Akkaynak and Treibitz, 2019; Levy et al., 2023), contain underwater scenes captured in three different seas, with a total of 29, 20, and 18 images respectively. We chose this dataset for training the models due to its diverse range of scenarios and number of images.
Underwater Image Enhancement and Restoration. Underwater image enhancement is the task of reducing or removing the effects of water on images recorded underwater. WaterGAN (Li et al., 2017) is a color correction model based on Generative Adversarial Networks (GANs). The generator estimates the attenuation, backscatter, and camera characteristics of underwater images, and the model is trained with both underwater and non-underwater images in order to create synthetic underwater images. The authors of (Cho et al., 2020) use GANs for image correction and enhancement through image-to-image translation. Their model is trained with underwater images in order to capture their textures and details; the losses used are a reconstruction loss, a Laplacian loss, and a perceptual loss. Furthermore, Semi-UIR (Huang et al., 2023) is a semi-supervised underwater image restoration framework based on the mean-teacher model, designed to incorporate unlabeled data into network training. The student model learns from labeled data, while the teacher model guides the training process on unlabeled images by generating reliable "pseudo-labels". Experimental results on both full-reference and no-reference underwater benchmarks show significant improvements in both quantitative and qualitative performance over state-of-the-art methods. We used this method for the enhancement preprocessing step.

Figure 2: Images from the SeaThru dataset adopted for the proposed benchmark. The dataset contains three different underwater scenes: Red Sea (left), Caribbean Sea (center) and Pacific Ocean (right).
Multi-view surface reconstruction is the process of creating a 3D surface from a set of images taken from different angles, exploiting point correspondences between images to estimate the shape of an object. Several works adopted volumetric grid methods for multi-view surface reconstruction (Boent and Pula, 1999; Kutulakos and Seitz, 2000; Laurentini, 1994; Szeliski, 1993; Seitz and Dyer, 1999). Other works focused on point cloud-based techniques (Furukawa and Ponce, 2009; Galliani et al., 2015; Schoenberger et al., 2016; Tola et al., 2012). Among dense reconstruction methods, the Poisson surface reconstruction algorithm (Kazhdan et al., 2006a), along with its screened version (Kazhdan and Hoppe, 2013), has been particularly influential. Deep learning methods for enhancing multi-view surface reconstruction have also recently been explored (Chen et al., 2019; Huang et al., 2018; Yao et al., 2018).
Neural Radiance Fields (NeRF) is a pioneering approach initiated by (Mildenhall et al., 2020). The idea is to use a neural network to implicitly model a scene from a set of images annotated with camera poses. The model thus learns the behavior of light and the geometry of a scene, enabling the generation of novel views. Several variants have been developed to extend the possibilities of NeRF (Barron et al., 2021; Alex et al., 2021; Kai et al., 2020; Sun et al., 2022).
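For reference, NeRF renders a pixel by compositing the network's predicted densities σ_i and colors c_i along a camera ray; in the discrete quadrature used by (Mildenhall et al., 2020) this reads

    \hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i, \qquad T_i = \exp\Big(-\sum_{j=1}^{i-1} \sigma_j \delta_j\Big),

where δ_i is the spacing between adjacent samples along the ray. Training minimizes the difference between \hat{C}(\mathbf{r}) and the observed pixel colors.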
Neural surface reconstruction is a technique that uses neural networks to learn complex surfaces of 3D objects or scenes in a continuous and highly detailed manner. Various methods have been proposed for this task, utilizing volumetric grid-based methods for scene reconstruction (Niemeyer et al., 2020; Oechsle et al., 2021). The authors of (Lior et al., 2020) use Signed Distance Functions (SDFs) to implicitly model surfaces by defining them as the zero level set of the SDF. NeRF-based methods have further been extended to surface reconstruction: works such as (Wang et al., 2021; Lior et al., 2021; Darmon et al., 2022; Fu et al., 2022; Yue et al., 2022) have pushed the frontier of extending the original NeRF framework towards high-fidelity surface modeling. There are also point cloud-based techniques (Fu et al., 2022; Zhang et al., 2022), which achieve good reconstructions from sparser input points.
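Concretely, in the SDF-based formulation a network f_θ maps each 3D point to its signed distance from the surface, and the surface is recovered as the zero level set

    \mathcal{S} = \{\, \mathbf{x} \in \mathbb{R}^3 : f_\theta(\mathbf{x}) = 0 \,\},

which is the set that mesh extraction algorithms such as Marching Cubes (used later in our pipeline) operate on.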
3D Gaussian Splatting (3DGS) was proposed in (Kerbl et al., 2023). The method is a change of perspective compared to NeRF: 3D scenes are represented explicitly. The scene is modelled as a collection of 3D Gaussian functions distributed in space, which are then "splatted" onto 2D in order to match the set of input images, together with the corresponding cameras calibrated by Structure from Motion. The core of the approach is the optimization step, where a dense set of 3D Gaussians accurately representing the scene is created. In addition to positions and covariances, it also optimizes the Spherical Harmonics coefficients representing the color of each Gaussian to correctly capture the view-dependent appearance of the scene. The optimization of these parameters is interleaved with steps that adaptively control the density of the Gaussians to better represent the scene. The optimization takes full advantage of standard GPU-accelerated frameworks and adds custom CUDA kernels, following recent best practices (Alex et al., 2021; Sun et al., 2022). The projection method implements a tile-based rasterizer for Gaussian splats inspired by recent software rasterization approaches (Lassner and Zollhofer, 2021). The rasterization pipeline is fully differentiable and, given the projection to 2D, can rasterize anisotropic splats similarly to previous 2D splatting methods (Kopanas et al., 2021).
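For reference, each Gaussian is parameterized by a mean μ and covariance Σ, and (following Kerbl et al., 2023) its footprint in image space is obtained by projecting the covariance with the viewing transformation W and the Jacobian J of the affine approximation of the projective transformation:

    G(\mathbf{x}) = e^{-\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})}, \qquad \Sigma' = J\, W\, \Sigma\, W^{\top} J^{\top}.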
As with NeRF, various methods improve over the initial formulation, such as Splatfacto-W (Xu et al., 2024), which improves results in the presence of unconstrained photo collections. Its key contributions include latent appearance modeling, efficient transient object handling, and precise background modeling.
Underwater 3D Reconstruction. Two major families of techniques are used for underwater 3D reconstruction: image-based (Levy et al., 2023; Weidner et al., 2017; Jordt et al., 2016) and laser-scanner-based (Bartolini et al., 2005). Image-based methods are highly cost-effective, while laser-scanner-based methods require expensive equipment and typically take a long time to acquire data. In particular, SeaThru-NeRF (Levy et al., 2023) is a NeRF-based method specifically designed for underwater scenes, with the unique capability of separately modelling the solid objects present in the scene and the water medium.
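The image formation model underlying SeaThru (following Akkaynak and Treibitz, 2019) writes the observed intensity in each color channel c as an attenuated direct signal plus backscatter,

    I_c = J_c \, e^{-\beta_c^{D} z} + B_c^{\infty} \left(1 - e^{-\beta_c^{B} z}\right),

where J_c is the unattenuated scene radiance, z the range, β_c^D and β_c^B the attenuation and backscatter coefficients, and B_c^∞ the veiling light; SeaThru-NeRF folds medium terms of this kind into the volume rendering of the radiance field.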
3 BENCHMARK
Dataset. We perform our benchmark using the
dataset presented in (Levy et al., 2023). It con-
tains underwater scenes captured in three different
seas (see Figure 2): the Red Sea (Eilat, Israel),
the Caribbean Sea (Curaçao), and the Pacific Ocean
(Panama), with a total of 29, 20, and 18 images re-
spectively. The images were acquired as RAW im-
ages using a Nikon D850 SLR camera in a Nauticam
underwater housing with a dome port to avoid refrac-
tions. The images were resized to an average size of
900 × 1400 and white-balanced with a 0.5% clipping
per channel to remove extremely noisy pixels. Fi-
nally, COLMAP (Schönberger and Frahm, 2016) was used to extract the camera poses.
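The exact white-balancing procedure is not specified beyond the 0.5% per-channel clipping; the following is a minimal sketch of one plausible reading (the function name and the rescaling to [0, 1] are our assumptions):

    import numpy as np

    def clip_white_balance(img: np.ndarray, clip: float = 0.005) -> np.ndarray:
        # Clip the darkest and brightest 0.5% of pixels in each channel,
        # then stretch the remaining range of that channel to [0, 1].
        out = np.empty(img.shape, dtype=np.float32)
        for c in range(img.shape[2]):
            lo, hi = np.percentile(img[..., c], [100 * clip, 100 * (1 - clip)])
            out[..., c] = (np.clip(img[..., c], lo, hi) - lo) / max(hi - lo, 1e-8)
        return out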
Task. We consider the problem of generating a 3D model from a set of underwater images in which the same scene is captured from different viewpoints.
Models. We trained the following models: Neuralangelo (Li et al., 2023), SeaThru-NeRF (Levy et al., 2023), Splatfacto (Kerbl et al., 2023), and Splatfacto-W (Xu et al., 2024). We also performed a COLMAP dense reconstruction, which serves as our baseline. We transformed the outputs of each model into 3D meshes using different methods. For Neuralangelo, we used the Marching Cubes algorithm (Lorensen and Cline, 1987). SeaThru-NeRF, on the other hand, was processed using Poisson Surface Reconstruction (Kazhdan et al., 2006b). For Splatfacto and Splatfacto-W, we employed the TSDF (Truncated Signed Distance Function) fusion implemented in dn-splatter (Turkulainen et al., 2024). Lastly, the COLMAP dense reconstruction was also processed using Poisson Surface Reconstruction (Kazhdan et al., 2006b). We also evaluated the effect of applying image enhancement before training the neural rendering models (see Figure 3). The enhanced dataset was created by applying the Semi-UIR algorithm, trained on the SUID dataset, to the images of SeaThru-NeRF. Results that use enhanced images are indicated with +enh.

Figure 3: Original image sample (left) vs. its enhanced version (right).
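To illustrate the Poisson meshing step, here is a minimal sketch using Open3D, which implements Poisson Surface Reconstruction; the library choice and the parameters (e.g., depth=9) are ours, not necessarily those used to produce the reported meshes:

    import open3d as o3d

    def poisson_mesh_from_points(ply_path: str, depth: int = 9):
        # Load a point cloud exported from the reconstruction pipeline.
        pcd = o3d.io.read_point_cloud(ply_path)
        # Poisson reconstruction requires consistently oriented normals.
        pcd.estimate_normals()
        pcd.orient_normals_consistent_tangent_plane(k=30)
        # Fit the implicit indicator function and extract a watertight mesh
        # (Kazhdan et al., 2006); a larger depth yields a finer octree.
        mesh, _densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
            pcd, depth=depth)
        return mesh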
We finally imported these meshes into Blender (https://www.blender.org/) to generate renders (see Figure 4) using camera paths exported directly from Nerfstudio (https://docs.nerf.studio/).
Quantitative Evaluation. To evaluate the quality of the rendered images, we adopted various metrics: MUSIQ (Ke et al., 2021) scores the perceived quality of an image in close agreement with human judgments, while UCIQE (Yang and Sowmya, 2015) and UIQM (Panetta et al., 2016) are specifically designed for quality assessment of underwater images. We also evaluated the mesh generation quality using the Hausdorff distance (Cignoni et al., 1998); in particular, we used the version implemented in MeshLab (https://www.meshlab.net/). This allowed us to make a quantitative comparison of the meshes produced by the various algorithms against a dense reconstruction from COLMAP. We computed the metric bidirectionally, obtaining a symmetric version by taking the maximum of the two directed distances. Differently from the other metrics, which evaluate 2D results, this distance evaluates the accuracy of the reconstruction of the 3D geometry of the environment.
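As a minimal sketch of this symmetric computation (two point sets sampled from the meshes stand in for MeshLab's denser surface sampling; the helper function is ours):

    import numpy as np
    from scipy.spatial import cKDTree

    def symmetric_hausdorff(A: np.ndarray, B: np.ndarray) -> float:
        # A and B are (N, 3) and (M, 3) arrays of points sampled from two meshes.
        d_ab = cKDTree(B).query(A)[0].max()  # directed distance A -> B
        d_ba = cKDTree(A).query(B)[0].max()  # directed distance B -> A
        # Symmetric version: take the maximum of the two directed distances.
        return float(max(d_ab, d_ba))

Normalization (e.g., with respect to the bounding box diagonal, as MeshLab reports) would be applied on top of this value.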
Figure 4: 3D model renders (Curaçao scene) created by four different algorithms and rendered in Blender (the green border indicates the ground truth).

Qualitative Evaluation. To evaluate the quality of the renderings obtained by the different reconstruction models, we designed a survey that was administered to 40 people. Participants were asked to rate each 3D reconstruction on a scale from 1 to 5, where a score of 1 indicates the lowest quality and 5 represents the best quality.
4 RESULTS

Table 1 shows the average results of the different 2D metrics across all images rendered from a set of sampled viewpoints along a camera path, evaluated for each scene and model. For MUSIQ, NeRF-based methods lead the scoreboard with two best results (Neuralangelo on Panama, 61.513; SeaThru-NeRF on Red Sea, 65.612) and one second best. For UCIQE, NeRF-based methods lead with three best results (Neuralangelo on Panama, 3.461; Neuralangelo on Red Sea, 5.769; Neuralangelo+enh. on Curaçao, 4.62) and three second best. Finally, for UIQM, the Gaussian Splatting method Splatfacto-W+enh. is the best performer. Overall, two out of three metrics show an advantage for NeRF-based methods.

Table 2 compares the rendered outputs to a reference image by computing the PSNR between them. The results show that SeaThru-NeRF and COLMAP are the leading models in terms of rendering accuracy, and Splatfacto-W also performs well (see the Mean column: 15.8329 vs. 15.8203 vs. 15.4112). This is further evidence in favor of NeRF-based methods over Gaussian Splatting ones.

Finally, 3D mesh distances computed with the Hausdorff distance are reported in Table 3. SeaThru-NeRF achieves excellent performance because it can produce a good reconstruction thanks to its ability to distinguish between the medium and the objects. SeaThru-NeRF is closely followed by Splatfacto-W+enh. (0.0482 vs. 0.0715); here, we can observe how enhancement plays a crucial role. All the quantitative measures show a clear advantage of NeRF over Gaussian Splatting, with SeaThru-NeRF leading the scoreboard for 3D mesh generation and PSNR.

These tables collectively provide insight into the relative strengths of each model. In our view, SeaThru-NeRF and Splatfacto-W achieve superior results primarily due to their specialized features: SeaThru-NeRF is specifically designed for underwater environments, making it particularly effective at handling underwater visual challenges, while Splatfacto-W benefits from advanced background modeling and effective handling of transient objects, both of which enhance its rendering accuracy and adaptability.

Figure 5: Qualitative evaluation results.

The qualitative results in Figure 5 are coherent with the quantitative ones: SeaThru-NeRF consistently outperformed the other models, emerging as the best reconstruction model in two out of the three scenes. Splatfacto-W+enh. was identified as the second-best model overall, also achieving high ratings across most scenes. This ranking highlights the relative strengths of these two models in producing high-quality renderings for diverse scenes. Figure 4 shows some examples of the reconstructions provided to users for evaluation. One caveat: the survey was not randomized, so some bias could be present.
Table 1: MUSIQ (Ke et al., 2021), UCIQE (Yang and Sowmya, 2015) and UIQM (Panetta et al., 2016) evaluation of novel view synthesis; best results in bold, second best underlined. Each value is the average over multiple rendered images.

                    |        Curaçao        |        Panama         |        Red Sea
Model               | MUSIQ   UCIQE   UIQM  | MUSIQ   UCIQE   UIQM  | MUSIQ   UCIQE   UIQM
Neuralangelo        | 46.336  2.411   2.138 | 61.513  3.461   2.475 | 49.186  5.769   1.351
Neuralangelo+enh.   | 50.824  4.62    1.95  | 58.524  2.232   2.868 | 56.885  4.367   1.911
SeaThru-NeRF        | 57.412  4.427   2.145 | 55.432  0.852   1.927 | 65.612  0.654   2.021
Splatfacto          | 59.274  1.896   1.703 | 49.674  1.391   1.617 | 61.045  1.38    1.560
Splatfacto-W        | 61.470  2.836   2.478 | 55.694  1.123   2.073 | 64.632  1.069   1.983
Splatfacto+enh.     | 59.742  1.733   1.843 | 56.055  1.443   1.623 | 59.860  1.404   1.567
Splatfacto-W+enh.   | 64.042  1.940   2.564 | 58.066  1.185   2.260 | 63.105  1.099   2.052
Colmap-Poisson      | 45.350  1.132   1.712 | 51.647  0.983   1.838 | 65.289  0.802   1.410
Table 2: Average PSNR; the rendered image is compared to the original image, or to the enhanced original when enhancement is in place. Best results in bold, second best underlined.

Method              | Curaçao  | Panama   | Red Sea  | Mean
Neuralangelo        | 10.6242  | 11.3654  |  9.4266  | 10.4721
Neuralangelo+enh.   | 10.0941  | 10.1605  |  9.7282  |  9.9943
SeaThru-NeRF        | 16.1436  | 16.7314  | 14.6238  | 15.8329
Splatfacto          | 17.3554  | 15.1125  | 10.7050  | 14.3910
Splatfacto-W        | 16.4640  | 16.1713  | 13.5984  | 15.4112
Splatfacto+enh.     | 10.5768  | 10.2170  |  9.8972  | 10.2303
Splatfacto-W+enh.   | 11.1514  | 11.3191  | 11.6711  | 11.3805
Colmap-Poisson      | 18.4726  | 16.4224  | 12.566   | 15.8203
Table 3: Mean normalized Hausdorff distance (Cignoni et al., 1998) between the COLMAP reconstruction and the NeRF/3DGS-based reconstructions; best results in bold, second best underlined.

Method              | Curaçao  | Panama   | Red Sea  | Mean
Neuralangelo        | 0.1538   | 0.0969   | 0.1387   | 0.1298
Neuralangelo+enh.   | 0.1706   | 0.0819   | 0.1379   | 0.1301
SeaThru-NeRF        | 0.0287   | 0.0652   | 0.0508   | 0.0482
Splatfacto          | 0.0906   | 0.0466   | 0.1394   | 0.0922
Splatfacto-W        | 0.0753   | 0.0539   | 0.0919   | 0.0737
Splatfacto+enh.     | 0.1347   | 0.0818   | 0.1203   | 0.1123
Splatfacto-W+enh.   | 0.0809   | 0.0517   | 0.0818   | 0.0715
Colmap-Poisson      | 0        | 0        | 0        | 0
5 CONCLUSION

In this work, we presented a benchmark for 3D reconstruction of underwater scenes using neural rendering techniques. The quantitative analysis shows promising results: neural rendering models are on par with SfM only when they take care of modeling the medium (SeaThru-NeRF) or of modeling different camera settings (Splatfacto-W). In future work, we will focus on extending 3DGS with medium modelling similar to SeaThru-NeRF.
ACKNOWLEDGEMENTS

This research has been supported by Next Vision s.r.l. (https://www.nextvisionlab.it/) and by the project Neural Rendering & Edge AI Platform for 4D synthetic Twins generation during Underwater Navigation & Exploration (NEPTUNE), PNRR MUR Project CUP J53D23020140005 COR 18115262 - Spoke 3 Robotics and AI for Socio-economic Empowerment (RAISE).
REFERENCES
Adamczak, S. K., Pabst, A., McLellan, W. A., and Thorne,
L. H. (2019). Using 3d models to improve estimates of
marine mammal size and external morphology. Fron-
tiers in Marine Science, 6.
Akkaynak, D. and Treibitz, T. (2019). Sea-thru: A method
for removing water from underwater images. In Pro-
ceedings of the IEEE/CVF conference on computer vi-
sion and pattern recognition, pages 1682–1691.
Alex, Y., Fridovich-Keil, S., Matthew, T., Qinhong, C., Benjamin, R., and Angjoo, K. (2021). Plenoxels: Radiance fields without neural networks. arXiv preprint arXiv:2112.05131.
Barron, J. T., Mildenhall, B., Tancik, M., Hedman, P.,
Martin-Brualla, R., and Srinivasan, P. P. (2021). Mip-
nerf: A multiscale representation for anti-aliasing
neural radiance fields. In Proceedings of the
IEEE/CVF International Conference on Computer Vi-
sion, pages 5855–5864.
Bartolini, L., De Dominicis, L., Ferri de Collibus, M., For-
netti, G., Guarneri, M., Paglia, E., Poggi, C., and
Ricci, R. (2005). Underwater three-dimensional imag-
ing with an amplitude-modulated laser radar at a 405
nm wavelength. Applied optics, 44(33):7130–7135.
Boent, J. S. and Pula, P. (1999). Probabilistic voxelized vol-
ume reconstruction. In Proceedings of International
Conference on Computer Vision (ICCV), volume 2.
Chen, R., Han, S., Xu, J., and Su, H. (2019). Point-based
multi-view stereo network. In Proceedings of the
IEEE/CVF International Conference on Computer Vi-
sion, pages 1538–1547.
Cho, Y., Jang, H., Malav, R., Pandey, G., and Kim, A.
(2020). Underwater image dehazing via unpaired
image-to-image translation. International Journal of
Control, Automation and Systems, 18:605–614.
Cignoni, P., Rocchini, C., and Scopigno, R. (1998). Metro:
measuring error on simplified surfaces. In Computer
Graphics Forum, volume 17, pages 167–174. Black-
well Publishers.
Correia, H. A. and Brito, J. H. (2023). 3d reconstruction
of human bodies from single-view and multi-view im-
ages: A systematic review. Computer Methods and
Programs in Biomedicine, 239:107620.
Cui, D., Wang, W., Hu, W., Peng, J., Zhao, Y., Zhang,
Y., and Wang, J. (2024). 3d reconstruction of build-
ing structures incorporating neural radiation fields and
geometric constraints. Automation in Construction,
165:105517.
Darmon, F., Bascle, B., Devaux, J.-C., Souhila, P., and Aubry, M. (2022). Improving neural implicit surfaces geometry with patch warping. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6260–6269.
De Reu, J., De Smedt, P., Herremans, D., Van Meirvenne,
M., Laloo, P., and De Clercq, W. (2014). On introduc-
ing an image-based 3d reconstruction method in ar-
chaeological excavation practice. Journal of Archae-
ological Science, 41:251–262.
Fu, Q., Sun, Q., Yew, T.-W., and Tiao, W. (2022). Geo-
neus: Geometry-consistent neural implicit surfaces
learning for multi-view reconstruction. arXiv preprint
arXiv:2205.15848.
Furukawa, Y. and Ponce, J. (2009). Accurate, dense, and ro-
bust multiview stereopsis. IEEE Transactions on Pat-
tern Analysis and Machine Intelligence, 32(8):1362–
1376.
Galliani, S., Lasinger, K., and Schindler, K. (2015). Mas-
sively parallel multiview stereopsis by surface normal
diffusion. In Proceedings of the IEEE International
Conference on Computer Vision, pages 873–881.
Hou, G., Zhao, X., Pan, Z., Yang, H., Tan, L., and Li, J.
(2020). Benchmarking underwater image enhance-
ment and restoration, and beyond. IEEE Access,
8:122078–122091.
Huang, P.-H., Kopf, J., Ahuja, N., Bleyer, M., Lenz, J.,
and Xu, J.-B. (2018). Deepmvs: Learning multi-view
stereopsis. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pages
2821–2830.
Huang, S., Wang, K., Liu, H., Chen, J., and Li, Y. (2023).
Contrastive semi-supervised learning for underwater
image restoration via reliable bank. In Proceedings
of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pages 18145–18155.
Irschick, D. J., Christiansen, F., Hammerschlag, N., Martin,
J., Madsen, P. T., Wyneken, J., Brooks, A., Gleiss, A.,
Fossette, S., Siler, C., Gamble, T., Fish, F., Siebert,
U., Patel, J., Xu, Z., Kalogerakis, E., Medina, J.,
Mukherji, A., Mandica, M., Zotos, S., Detwiler, J.,
Perot, B., and Lauder, G. (2022). 3d visualization pro-
cesses for recreating and studying organismal form.
iScience, 25(9):104867.
Islam, M. J., Luo, P., and Sattar, J. (2020). Simulta-
neous Enhancement and Super-Resolution of Under-
water Imagery for Improved Visual Perception. In
Robotics: Science and Systems (RSS), Corvalis, Ore-
gon, USA.
Jordt, A., Köser, K., and Koch, R. (2016). Refractive 3d reconstruction on underwater images. Methods in Oceanography, 15-16:90–113. Computer Vision in Oceanography.
Kaandorp, J. A. (1993). 2d and 3d modelling of ma-
rine sessile organisms. In Crilly, A. J., Earnshaw,
R. A., and Jones, H., editors, Applications of Fractals
and Chaos, pages 41–61, Berlin, Heidelberg. Springer
Berlin Heidelberg.
Kai, Z., Gernot, R., Noah, S., and Vladlen, K. (2020).
Nerf++: Analyzing and improving neural radiance
fields. arXiv preprint arXiv:2010.07492.
Kazhdan, M., Bolitho, M., and Hoppe, H. (2006a). Poisson
surface reconstruction. In Proceedings of the Fourth
Eurographics Symposium on Geometry Processing,
pages 61–70.
Kazhdan, M., Bolitho, M., and Hoppe, H. (2006b). Poisson
surface reconstruction. In Proceedings of the Fourth
Eurographics Symposium on Geometry Processing,
SGP ’06, page 61–70, Goslar, DEU. Eurographics As-
sociation.
Kazhdan, M. and Hoppe, H. (2013). Screened poisson sur-
face reconstruction. ACM Transactions on Graphics
(ToG), 32(3):1–13.
Ke, J., Wang, Q., Wang, Y., Milanfar, P., and Yang, F.
(2021). Musiq: Multi-scale image quality transformer.
In Proceedings of the IEEE/CVF International Con-
ference on Computer Vision, pages 5148–5157.
Kerbl, B., Kopanas, G., Leimkühler, T., and Drettakis, G. (2023). 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139:1–139:14.
Kopanas, G., Philip, J., Leimkühler, T., and Drettakis, G. (2021). Point-based neural rendering with per-view optimization. In Computer Graphics Forum, volume 40, pages 29–43. Wiley Online Library.
Kutulakos, K. N. and Seitz, S. M. (2000). A theory of shape
by space carving. International Journal of Computer
Vision, 38(3):199–218.
Lassner, C. and Zollhofer, M. (2021). Pulsar: Efficient
sphere-based neural rendering. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pat-
tern Recognition, pages 1440–1449.
Laurentini, A. (1994). The visual hull concept for
silhouette-based image understanding. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence,
16(2):150–162.
Levy, D., Peleg, A., Pearl, N., Rosenbaum, D., Akkaynak,
D., Korman, S., and Treibitz, T. (2023). Seathru-nerf:
Neural radiance fields in scattering media. In Proceed-
ings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition, pages 56–65.
Li, C., Guo, C., Ren, W., Cong, R., Hou, J., Kwong, S., and
Tao, D. (2019). An underwater image enhancement
benchmark dataset and beyond. IEEE Transactions
on Image Processing, 29:4376–4389.
Li, J., Skinner, K. A., Eustice, R. M., and Johnson-
Roberson, M. (2017). Watergan: Unsupervised gener-
ative network to enable real-time color correction of
monocular underwater images. IEEE Robotics and
Automation letters, 3(1):387–394.
Li, Z., Müller, T., Evans, A., Taylor, R. H., Unberath, M., Liu, M.-Y., and Lin, C.-H. (2023). Neuralangelo: High-fidelity neural surface reconstruction. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Lior, Y., Yoni, K., Dror, M., Matan, A., Meirav, R., and
Yaron, L. (2020). Multiview neural surface recon-
struction by disentangling geometry and appearance.
Advances in Neural Information Processing Systems,
33:2492–2503.
Lior, Y., Yoni, K., Dror, M., Matan, A., Meirav, R., and
Yaron, L. (2021). Volume rendering of neural implicit
surfaces. Advances in Neural Information Processing
Systems, 34:4805–4815.
Lorensen, W. E. and Cline, H. E. (1987). Marching cubes:
A high resolution 3d surface construction algorithm.
SIGGRAPH Comput. Graph., 21(4):163–169.
Memet, J.-B. (2008). Conservation of underwater cultural
heritage: characteristics and new technologies. Mu-
seum International, 60(4):42–49.
Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T.,
Ramamoorthi, R., and Ng, R. (2020). Nerf: Repre-
senting scenes as neural radiance fields for view syn-
thesis. In ECCV.
Münster, S., Apollonio, F. I., Blümel, I., Fallavollita, F., Foschi, R., Grellert, M., Ioannides, M., Jahn, P. H., Kurdiovsky, R., Kuroczyński, P., Lutteroth, J.-E., Messemer, H., and Schelbert, G. (2024). Handbook of digital 3d reconstruction of historical architecture. page 204.
Niemeyer, M., Mescheder, L., Oechsle, M., and Geiger, A.
(2020). Differentiable volumetric rendering: Learning
implicit 3d representations without 3d supervision. In
Proceedings of the IEEE/CVF Conference on Com-
puter Vision and Pattern Recognition (CVPR), pages
3504–3515.
Oechsle, M., Peng, S., and Geiger, A. (2021). Unisurf:
Unifying neural implicit surfaces and radiance fields
for multi-view reconstruction. In Proceedings of the
IEEE/CVF International Conference on Computer Vi-
sion, pages 5589–5598.
Panetta, K., Gao, C., and Agaian, S. (2016). Human-visual-
system-inspired underwater image quality measures.
IEEE Journal of Oceanic Engineering, 41(3):541–
551.
Perez-Alvaro, E. (2023). Underwater cultural heritage and
the sustainable development goals. Blue Papers, 2(2).
Schoenberger, J. L., Zheng, E., Frahm, J.-M., and Pollefeys,
M. (2016). Pixelwise view selection for unstructured
multi-view stereo. In Proceedings of the European
Conference on Computer Vision, pages 501–518.
Schönberger, J. L. and Frahm, J.-M. (2016). Structure-from-motion revisited. In Conference on Computer Vision and Pattern Recognition (CVPR).
Seitz, S. M. and Dyer, C. R. (1999). Photorealistic scene re-
construction by voxel coloring. International Journal
of Computer Vision, 35(2):151–173.
Sun, C., Sun, M., and Chen, H.-T. (2022). Direct voxel
grid optimization: Super-fast convergence for radi-
ance fields reconstruction. In Proceedings of the
IEEE/CVF conference on computer vision and pattern
recognition, pages 5459–5469.
Szeliski, R. (1993). Rapid octree construction from image
sequences. CVGIP: Image Understanding, 58(1):23–
32.
Tola, E., Strecha, C., and Fua, P. (2012). Efficient large-
scale multi-view stereo for ultra high-resolution image
sets. Machine Vision and Applications, 23(5):903–
920.
Turkulainen, M., Ren, X., Melekhov, I., Seiskari, O., Rahtu,
E., and Kannala, J. (2024). Dn-splatter: Depth and
normal priors for gaussian splatting and meshing.
Wang, Y., Skorokhodov, I., Theobalt, P., and Wonka, P.
(2021). Hf-neus: Improved surface reconstruction us-
ing high-frequency details. Advances in Neural Infor-
mation Processing Systems, 34:19220–19230.
Weidner, N., Rahman, S., Li, A. Q., and Rekleitis, I. (2017).
Underwater cave mapping using stereo vision. In 2017
IEEE International Conference on Robotics and Au-
tomation (ICRA), pages 5709–5715. IEEE.
Xu, C., Kerr, J., and Kanazawa, A. (2024). Splatfacto-
w: A nerfstudio implementation of gaussian splatting
for unconstrained photo collections. arXiv preprint
arXiv:2407.12306.
Yang, M. and Sowmya, A. (2015). An underwater color
image quality evaluation metric. IEEE Transactions
on Image Processing, 24(12):6062–6071.
Yao, Y., Luo, Z., Li, S., Fang, T., and Quan, L. (2018).
Mvsnet: Depth inference for unstructured multi-view
stereo. In Proceedings of the European Conference on
Computer Vision (ECCV), pages 767–783.
Yue, Z., Peng, S., Niemeyer, M., Sattler, T., and Geiger,
A. (2022). Exploring monocular geometric cues for
neural implicit surface reconstruction. arXiv preprint
arXiv:2206.00665.
Zhang, C., Zhou, H., Christiansen, F., Hao, Y., Wang, K.,
Kou, Z., Chen, R., Min, J., Davis, R., and Wang, D.
(2023). Marine mammal morphometrics: 3d model-
ing and estimation validation. Frontiers in Marine Sci-
ence, 10.
Zhang, J., Shao, Y., Li, T., Fang, D. M., Tsian, Y., and
Quan, L. (2022). Critical regularizations for neural
surface reconstruction in the wild. In Proceedings of
the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pages 6270–6279.
Zhang, T. and Johnson-Roberson, M. (2023). Beyond nerf
underwater: Learning neural reflectance fields for true
color correction of marine imagery. IEEE Robotics
and Automation Letters, 8(10):6467–6474.