
rendering of complex lighting are of primary impor-
tance, even at the potential cost of some initial geo-
metric inaccuracies that Stage 1 can often rectify.
The optimal choice between the standard and re-
versed pipelines depends on the specific scene prop-
erties and rendering priorities, as demonstrated in our
experiments.
4 EXPERIMENTS
4.1 Implementation Details
Experimental Configurations. To ensure meaning-
ful comparisons, the configurations for each experi-
ment were kept consistent. All experiments were per-
formed on a system equipped with an NVIDIA RTX
2080 Super GPU with 8 GB of VRAM. An overview
of the most important configurations can be found in
Table 1.
Evaluation Metrics. We evaluate the quality of
novel view renderings using three established metrics
commonly employed in the NeRF literature: PSNR,
SSIM, and LPIPS. In addition to these individual metrics,
we also use a composite score, defined in Equation (7),
with values for $w_{\mathrm{PSNR}}$, $w_{\mathrm{SSIM}}$, $w_{\mathrm{LPIPS}}$, and
$\mathrm{PSNR}_{\max}$ set to 0.20, 0.35, 0.45, and 35 dB, respectively.
These weights prioritize perceptually aligned
metrics (LPIPS and SSIM) over purely pixel-based
comparisons (PSNR), reflecting human visual percep-
tion of scene similarity. A threshold on this compos-
ite score is used to determine if an image from Stage
1 needs to be refined in Stage 2. This threshold was
experimentally set to 0.7, based on qualitative assess-
ment of rendered image quality: images with com-
posite scores below 0.7 exhibited noticeable artifacts
and were deemed unsatisfactory.
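The weighting and thresholding described above can be sketched as follows. This is only an illustration: Equation (7) is not reproduced in this section, so the functional form used here (PSNR normalized by its maximum and clipped to [0, 1], LPIPS inverted so that higher is better) is an assumption, and the names `composite_score` and `needs_stage2` are hypothetical. Only the weights, the 35 dB maximum, and the 0.7 threshold are taken from the text.

```python
# Values reported in the text.
W_PSNR, W_SSIM, W_LPIPS = 0.20, 0.35, 0.45
PSNR_MAX_DB = 35.0
REFINE_THRESHOLD = 0.7


def _clip01(x):
    """Clamp a value to the interval [0, 1]."""
    return min(max(x, 0.0), 1.0)


def composite_score(psnr_db, ssim, lpips):
    """Weighted quality score in [0, 1]; higher is better.

    ASSUMED form (may differ from the paper's Equation (7)):
    normalized PSNR, SSIM, and (1 - LPIPS) combined with the reported weights.
    """
    psnr_term = _clip01(psnr_db / PSNR_MAX_DB)
    ssim_term = _clip01(ssim)
    lpips_term = 1.0 - _clip01(lpips)  # LPIPS is a distance: lower is better
    return W_PSNR * psnr_term + W_SSIM * ssim_term + W_LPIPS * lpips_term


def needs_stage2(score, threshold=REFINE_THRESHOLD):
    """Flag a Stage 1 rendering for Stage 2 refinement if it scores below the threshold."""
    return score < threshold


if __name__ == "__main__":
    # Hypothetical per-image metrics, for illustration only.
    score = composite_score(psnr_db=28.0, ssim=0.85, lpips=0.20)
    print(f"composite = {score:.4f}, refine = {needs_stage2(score)}")
```

Because the three weights sum to 1, a perfect rendering under this assumed form scores 1.0, which keeps the 0.7 threshold easy to interpret as a fraction of the best attainable score.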
4.2 Results and Analysis
We evaluate MuSt-NeRF on increasingly complex
scenes. Preliminary experiments validate the ef-
fectiveness of our Stage 2 photometric refinement,
while also highlighting the need for a multi-stage
approach. Subsequent experiments on challenging
ScanNet scenes then demonstrate the performance of
the full MuSt-NeRF pipeline, comparing the standard
and reversed configurations.

Table 1: Overview of the fundamental configurations used in MuSt-NeRF experiments.

Parameter               Stage 1    Stage 2
Number of Epochs        200k       400k
Batch size              1024       512
MLP layers              8          Prop, NeRF: 4, 8
Neurons per layer       256        Prop, NeRF: 256, 512
Image resolution (px)   624x468    624x468
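For readers who want to reproduce the setup, the settings in Table 1 could be held in a small configuration object, as in the minimal sketch below. The class and field names are hypothetical and do not come from the MuSt-NeRF code base. Reading the Stage 2 entries "Prop, NeRF: 4, 8" and "Prop, NeRF: 256, 512" as (proposal MLP, NeRF MLP) pairs is our interpretation of the table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StageConfig:
    """Hypothetical container for the per-stage settings listed in Table 1."""
    num_epochs: int                     # "Number of Epochs" as labeled in Table 1
    batch_size: int
    mlp_layers: Tuple[int, ...]         # one entry per MLP
    neurons_per_layer: Tuple[int, ...]  # one entry per MLP, same order
    image_resolution: Tuple[int, int]   # (width, height) in pixels


# Stage 1: a single 8-layer MLP with 256 neurons per layer.
STAGE1 = StageConfig(200_000, 1024, (8,), (256,), (624, 468))

# Stage 2: assumed order (proposal MLP, NeRF MLP).
STAGE2 = StageConfig(400_000, 512, (4, 8), (256, 512), (624, 468))

if __name__ == "__main__":
    for name, cfg in (("Stage 1", STAGE1), ("Stage 2", STAGE2)):
        print(name, cfg)
```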
4.2.1 Preliminary Experiments
These experiments isolate and evaluate the Stage 2
architecture, demonstrating its ability to handle both
unbounded scenes and specular reflections, while also
motivating the need for a multi-stage approach. We
use the following datasets:
Mip-NeRF 360 (Materials, Vasedeck): The Ma-
terials scene features round balls of diverse material
properties under controlled lighting, enabling assess-
ment of specular reflection capture. The Vasedeck
scene is a real-world capture of flowers, primarily
exhibiting diffuse reflections, allowing us to evaluate
performance on real-world data with simpler lighting.
Custom Dataset (Plant on Table, Room): The
Plant on Table scene combines diffuse and specu-
lar reflections with unbounded elements (background
visible through glass). The Room scene (real-world,
smartphone, inside-out) provides a more challenging
test with complex geometry and lighting.
Table 2 presents the quantitative results of these
experiments, comparing MuSt-NeRF Stage 2 with
Mip-NeRF 360.
The preliminary experiments evaluate MuSt-
NeRF Stage 2 on increasingly complex outside-in
scenes. Beginning with the synthetic Materials scene,
we observe that MuSt-NeRF Stage 2 accurately renders
specular highlights, achieving an average composite
score of over 0.9 (Table 2, Figure 3). On the Vasedeck
scene, MuSt-NeRF Stage 2 handles real-world data and
performs comparably to the Mip-NeRF 360 implementation.
The subsequent experiment on the Plant on Table scene
further confirms MuSt-NeRF Stage 2's ability to handle
unbounded elements as well as specular and diffuse
reflections. Notably, every test image in these three
experiments scored above the composite-score threshold
of 0.7.
Based on these results, we evaluate Stage 2 on
the Room scene, which is an inside-out scenario. We
observe that here too, MuSt-NeRF Stage 2 performs
better than Mip-NeRF 360 on average (Table 2), and
is able to capture reflections effectively (Figure 4).
However, we also observe some geometric inaccura-
cies, especially in regions with high depth variation
and with limited overlapping viewpoints between im-
ages. It is in these regions that the composite score
of MuSt-NeRF is lower than the threshold (Table 3).
These limitations, arising from Stage 2’s purely pho-
tometric nature, emphasize the need for a geometri-
VISAPP 2025 - 20th International Conference on Computer Vision Theory and Applications