designed comprises five stages: the Data
Refinement Stage (DRS), the Geometric Matching Stage
(GMS), the Depth Estimation and Refinement Stage
(DERS), the Try-On Fusion Stage (TFS), and the 3D Point
Cloud Modelling Stage (3D-PCMS), each playing a
distinct yet significant role. In the DRS, the network
first takes the 2D RGB garment image and the
single-person image as inputs and refines them into
several intermediate representations. The GMS performs
affine and thin-plate spline (TPS) transformations to
align the garment with the body and transfer its
geometric characteristics. The DERS estimates and
refines the human body depth map, after which the TFS
synthesises the 2D warped-garment human body image.
Lastly, the 3D-PCMS models and computes the 3D point
cloud of the warped-garment human body to produce
the final result. To assess the proposed network,
SSIM and FID were computed on several test sets,
and the tabulated results show satisfactory
performance.
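As a point of reference for the SSIM figures reported above, a minimal global (single-window) SSIM can be sketched as follows. This is an illustrative simplification, not the evaluation code used in the paper: the function name and constants are assumptions, and practical evaluations normally use a sliding-window SSIM such as the one in scikit-image.

```python
import numpy as np

def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Global (single-window) SSIM between two equal-shape images.

    Uses the standard SSIM formula with stabilising constants
    C1 = (k1 * L)^2 and C2 = (k2 * L)^2, where L is the dynamic range.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()  # cross-covariance
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

Identical images score 1.0, and the score decreases as luminance, contrast, or structure diverge, which is what makes SSIM a common perceptual-quality proxy for try-on synthesis results.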