
Table 2: Quantitative comparison of our method with baseline models across multiple metrics. ↑ indicates that higher values are better; ↓ indicates that lower values are better.

Method          PSNR ↑   SSIM ↑   LPIPS ↓   CD ↓    FS ↑    FID ↓    CLIP-Similarity ↑
TripoSR         23.681   0.872    0.204     0.246   0.879   25.459   0.812
DreamGaussian   19.204   0.789    0.277     0.382   0.635   57.257   0.815
CRM             22.790   0.891    0.137     0.201   0.802   23.846   0.880
One-2-3-45      18.558   0.726    0.296     0.421   0.633   98.261   0.720
Ours            23.297   0.901    0.124     0.179   0.892   22.547   0.930
Table 3: Comparison of our model, with only the feature aggregator trained, against OpenLRM using CD and LPIPS metrics.

Method                                        CD ↓    LPIPS ↓
OpenLRM                                       0.271   0.209
Our Model (Only Feature Aggregator Trained)   0.194   0.153
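For context, CD in Tables 2 and 3 denotes the Chamfer Distance between point sets sampled from the reconstructed and ground-truth surfaces. A minimal formulation, assuming the standard bidirectional squared-distance variant (the exact sampling density and normalization used for these results are not restated here), is:

\[
\mathrm{CD}(S_1, S_2) = \frac{1}{|S_1|} \sum_{x \in S_1} \min_{y \in S_2} \lVert x - y \rVert_2^2 \;+\; \frac{1}{|S_2|} \sum_{y \in S_2} \min_{x \in S_1} \lVert x - y \rVert_2^2
\]

Lower values indicate that each sampled point on one surface lies close to some point on the other, i.e., better geometric agreement.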
Improving distinctness in occluded regions and reducing the dependency on the quality of the multi-view images generated by Zero123++ are crucial next steps. Enhancing the initial stages of multi-view image generation, or developing alternative strategies altogether, could further improve the performance and reliability of the 3D reconstruction process. These advances would pave the way for more robust and precise 3D reconstruction in future work.
REFERENCES
Adobe (2023). What is 3d modelling & what is it used for?
Chan, E. R. et al. (2022). Efficient triplane-nerf for 3d object reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Cutting Edge R (2023). 10 exciting applications of 3d modeling in various industries.
Deitke, M., Schwenk, D., Salvador, J., Weihs, L., Michel, O., VanderBilt, E., Schmidt, L., Ehsani, K., Kembhavi, A., and Farhadi, A. (2023). Objaverse: A universe of annotated 3d objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 13142–13153.
Hong, Y., Zhang, K., Gu, J., Bi, S., Zhou, Y., Liu, D., Liu, F., Sunkavalli, K., Bui, T., and Tan, H. (2023). Lrm: Large reconstruction model for single image to 3d. arXiv preprint arXiv:2311.04400.
Liu, J., Zhang, Z., Wang, X., Li, S., Zhang, Z. Y., Yang, M.-Y., Kautz, J., Hilliges, O., and Tulsiani, S. (2023a). Neuralangelo: High-fidelity neural surface reconstruction. arXiv preprint arXiv:2306.03092.
Liu, M., Xu, C., Jin, H., Chen, L., Varma, M. T., Xu, Z., and Su, H. (2023b). One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization. arXiv preprint arXiv:2306.16928.
Liu, Y., Lin, C., Zeng, Z., Long, X., Liu, L., Komura, T., and Wang, W. (2023c). Syncdreamer: Generating multiview-consistent images from a single-view image. arXiv preprint arXiv:2309.03453.
Long, X., Liu, Y., Lin, C., Zeng, Z., Liu, L., Komura, T., and Wang, W. (2023). Wonder3d: Single image to 3d using cross-domain diffusion. arXiv preprint arXiv:2310.15008.
Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., and Ng, R. (2020). Nerf: Representing scenes as neural radiance fields for view synthesis. In Proceedings of the European Conference on Computer Vision (ECCV), pages 405–421.
Poole, B., Jain, A., Barron, J. T., and Mildenhall, B. (2022). Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988.
Shi, R., Chen, H., Zhang, Z., Liu, M., Xu, C., Wei, X., Chen, L., Zeng, C., and Su, H. (2023). Zero123++: A single image to consistent multi-view diffusion base model. arXiv preprint arXiv:2310.15110.
Tochilkin, A. et al. (2023). Triposr: High-efficiency 3d reconstruction from minimal data inputs. arXiv preprint arXiv:2308.12045.
Touvron, H., Bojanowski, P., Caron, M., Misra, I., Mairal, J., and Joulin, A. (2023). Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193.
Wang, Z., Lu, C., Wang, Y., Bao, F., Li, C., Su, H., and Zhu, J. (2023). Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. In Advances in Neural Information Processing Systems (NeurIPS).
Wang, Z., Wang, Y., Chen, Y., Xiang, C., Chen, S., Yu, D., Li, C., Su, H., and Zhu, J. (2024). Crm: Single image to 3d textured mesh with convolutional reconstruction model. arXiv preprint arXiv:2403.05034.
Whizzy Studios (2023). Applications of 3d modeling.