
well as a realistic shadow, indicating that the method
was able to handle relatively detailed objects. Fig-
ure 5 shows a comparison between our method and
ARShadowGAN. The figure indicates that our method
is capable of generating shadows on non-planar back-
grounds. However, several points remain to be im-
proved. In the output images of our method, we ob-
served failures to generate shadows for the convex
parts of objects, excessive noise when projecting
shadows onto walls, and shadows overlapping on
concave parts of objects, which tended to cause noise.
In particular, when shadows were projected onto
walls, noise was more prevalent in areas with lower
brightness. This is likely because the color of the wall
and the projected shadow were similar, leading to
poor learning through the perceptual loss. Addition-
ally, the noise in shadows within the object region is
thought to be caused by the absence of the virtual ob-
ject in the depth map. Since the mask image of the
virtual object was provided separately, the masked re-
gion of the depth map carried no useful information,
possibly leading to poor learning in those regions.
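For context, a perceptual loss compares feature activations of a pretrained network rather than raw pixel values, which is why regions where the wall and the shadow look alike provide only a weak training signal. The sketch below is a minimal, hypothetical PyTorch formulation of a VGG16 feature-matching loss; the layer depth, the L1 criterion, and the assumption of ImageNet-normalized inputs are illustrative choices, not the exact loss defined in this work.

```python
import torch.nn as nn
import torchvision.models as models

class VGGPerceptualLoss(nn.Module):
    """Feature-matching loss on frozen VGG16 activations (hypothetical sketch)."""

    def __init__(self, layer_idx=16):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        # Keep convolutional layers up to the chosen depth and freeze them.
        self.features = vgg.features[:layer_idx].eval()
        for p in self.features.parameters():
            p.requires_grad = False
        self.criterion = nn.L1Loss()

    def forward(self, generated, target):
        # Inputs are assumed to be ImageNet-normalized RGB tensors.
        # Where shadow and wall pixels look similar, the feature difference
        # (and hence the gradient) stays small in those regions.
        return self.criterion(self.features(generated), self.features(target))
```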
For the qualitative assessment of the ablation
study, comparing the images in Figure 7 confirmed
that, without the perceptual loss, the shadows around
the object's outline were not generated, resulting in
unnatural images compared to the results of our
method. Without the GAN loss, the shadows pro-
jected onto the wall were shallower in angle and
smaller in size. For the quantitative assessment of the
results in Table 2 and Figure 8, a Shapiro-Wilk test
confirmed that the data did not follow a normal dis-
tribution. A Mann-Whitney U test between Ours and
the -per loss condition found a significant difference
in both PSNR and SSIM at the 0.05 significance level.
This suggests that the perceptual loss term contributes
to noise reduction and structural similarity in the im-
ages. Furthermore, a significant difference between
Ours and the -gan loss condition was found only in
PSNR at the 0.05 significance level, indicating that the
discriminator loss term likely contributes to noise re-
duction.
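As a reference for the evaluation procedure, the sketch below shows one way to compute per-image PSNR/SSIM and run the normality and rank-based tests with scikit-image and scipy. The array names, the layer at which normality is checked, and the two-sided alternative are illustrative assumptions rather than the exact scripts behind Table 2.

```python
import numpy as np
from scipy import stats
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def compare_conditions(scores_ours, scores_ablation, alpha=0.05):
    """Non-parametric comparison of per-image scores for two conditions.

    `scores_ours` and `scores_ablation` are 1-D arrays of per-image PSNR
    (or SSIM) values; placeholder names for illustration only.
    """
    # Shapiro-Wilk: a small p-value indicates departure from normality,
    # which motivates the rank-based Mann-Whitney U test below.
    _, p_normal = stats.shapiro(np.concatenate([scores_ours, scores_ablation]))

    # Two-sided Mann-Whitney U test between the two conditions.
    _, p_value = stats.mannwhitneyu(scores_ours, scores_ablation,
                                    alternative="two-sided")
    return p_normal, p_value, p_value < alpha

# Per-image metrics, assuming gt and pred are HxWx3 uint8 arrays
# (channel_axis requires scikit-image >= 0.19):
# psnr = peak_signal_noise_ratio(gt, pred)
# ssim = structural_similarity(gt, pred, channel_axis=-1)
```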
8 LIMITATIONS
One limitation of this study is that the dataset in-
cludes only vertical walls, so accuracy may decrease
depending on the complexity of the background. Ad-
ditionally, since shape information beyond the out-
lines of the virtual objects is not included, generating
shadows for complex virtual objects remains difficult.
To address this challenge, it will be necessary to in-
corporate shape information of virtual objects in the
learning process.
Moreover, the acquisition of depth images from
real-world environments is still imprecise, which
means that the model used in this study may not
achieve sufficient accuracy when applied in real-
world scenarios. As a potential solution for applying
this model in the real world, segmentation and label-
ing of the ground and walls could be used as an alter-
native input in place of depth images.
9 CONCLUSION
In this study, we constructed an MR dataset that in-
cludes depth images and generated shadows for vir-
tual objects on non-planar background geometries.
For the dataset construction, a new method for in-
corporating depth images as input was established.
By utilizing the depth images, this study demon-
strated that it is possible to cast shadows of virtual
objects onto surfaces beyond a flat background. The
results of this study suggest that generating shadows
in consideration of depth information can be applied
to complex background geometries.
REFERENCES
Chang, A. X., Funkhouser, T., Guibas, L., Hanrahan, P.,
Huang, Q., Li, Z., Savarese, S., Savva, M., Song,
S., Su, H., Xiao, J., Yi, L., and Yu, F. (2015).
Shapenet: An information-rich 3d model repository.
arXiv preprint arXiv:1512.03012.
Chrysanthakopoulou, A. and Moustakas, K. (2024). Real-
time shader-based shadow and occlusion rendering in
ar. In 2024 IEEE Conference on Virtual Reality and
3D User Interfaces Abstracts and Workshops (VRW),
pages 969–970.
Couturier, A. (2023). Scenecity: 3d city generator addon
for blender. https://www.cgchan.com/store/scenecity.
Accessed: 5.5.2024.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). Imagenet: A large-scale hierarchical
image database. In 2009 IEEE Conference on Com-
puter Vision and Pattern Recognition, pages 248–255.
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep resid-
ual learning for image recognition. In 2016 IEEE Con-
ference on Computer Vision and Pattern Recognition
(CVPR), pages 770–778.
Hoffman, H., Hollander, A., Schroder, K., et al. (1998).
Physically touching and tasting virtual objects en-
hances the realism of virtual experiences. Virtual Re-
ality, 3:226–234.
Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2017).
Image-to-image translation with conditional adversar-
ial networks. In 2017 IEEE Conference on Computer
Vision and Pattern Recognition (CVPR), pages 5967–
5976.