smooth surface but still allows sharp edges, so it can
generate a 3D model for objects of different shapes.
Kholgade, Simon, Efros, and Sheikh (2014) pro-
posed a method to manipulate the 3D object in a sin-
gle image. This method supports full 3D operations,
including translation, rotation, and scaling, but it
assumes that the 3D model and the texture of the object
in the image are known. This assumption limits the
application of their method to objects whose 3D model
is available. With our method, we aim to create a 3D
model with constraints provided by the user and do
not require the original 3D model of the object.
3 ALGORITHM
The appearance of the scene in an image I is determined
by its geometry X, texture T, and light source L:

I = v(L, X) · lam(L, X) · T,                (1)

where v(·) denotes the visibility of light source L at
scene X and lam(·) is the Lambertian cosine function.
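As an illustration, Equation 1 can be evaluated per pixel. The sketch below assumes precomputed unit normals for X, a binary mask for the visibility term v(·), and a directional light; all names and values are illustrative, not part of the method:

```python
import numpy as np

def render(visibility, normals, light_dir, texture):
    """Per-pixel sketch of Equation 1: I = v(L, X) * lam(L, X) * T.

    visibility : (H, W) binary mask, 1 where the light source is visible
    normals    : (H, W, 3) unit surface normals of the scene geometry X
    light_dir  : (3,) unit vector pointing toward the light source L
    texture    : (H, W, 3) scene texture T
    """
    # Lambertian cosine term: clamp the normal-light dot product at zero.
    lam = np.clip(normals @ light_dir, 0.0, None)          # (H, W)
    return visibility[..., None] * lam[..., None] * texture

# Toy scene: flat ground lit from directly above, one shadowed pixel.
H, W = 2, 2
normals = np.zeros((H, W, 3)); normals[..., 2] = 1.0       # all normals = +z
light = np.array([0.0, 0.0, 1.0])
tex = np.full((H, W, 3), 0.5)
vis = np.array([[1.0, 0.0], [1.0, 1.0]])                   # pixel (0,1) shadowed
img = render(vis, normals, light, tex)
```

With the light along the normals, lit pixels keep their full texture value and the shadowed pixel goes to zero.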
Since I is the only known variable in Equation 1, and
because estimating X, T, and L from I is highly ill-
posed, we ask the user to specify the light source L
and to label the background, the object, and its cast-shadow
pixels, as shown in Figure 1(b): pixels labeled as the object
are cyan, shadow-labeled pixels are blue, and the background
is white. Each object region must correspond to exactly one
shadow region. This user input provides powerful cues for
solving X and T. The texture
of the scene, T, is obtained by inpainting the user-
labeled object regions and shadow regions with the
PatchMatch algorithm proposed by Barnes, Shecht-
man, Finkelstein, and Goldman (2009). Figure 1(c)
shows the scene texture T, in which the objects and
their cast-shadows are removed. Typical shadow re-
moval methods, e.g., Wu et al. (2007), can be applied
to generate T, too, but these may require a greater
number of user annotations.
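The masking step can be sketched as follows. The label codes and the mean-color fill are illustrative stand-ins only; the paper fills the masked regions with PatchMatch (Barnes et al., 2009), which copies coherent patches from the background instead of averaging:

```python
import numpy as np

# Hypothetical label codes matching Figure 1(b): object pixels cyan,
# shadow pixels blue, background white.
BACKGROUND, OBJECT, SHADOW = 0, 1, 2

def inpaint_mask(labels):
    """Pixels to fill when recovering the scene texture T:
    every pixel labeled as an object or as its cast shadow."""
    return (labels == OBJECT) | (labels == SHADOW)

def naive_fill(image, mask):
    """Placeholder for PatchMatch: replace each masked pixel with the
    mean color of the unmasked (background) pixels."""
    out = image.astype(float).copy()
    out[mask] = image[~mask].mean(axis=0)
    return out
```

A real implementation would substitute a patch-based synthesis for `naive_fill`, so the filled regions inherit the background's texture rather than a flat average.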
We separate the 3D geometry of the scene into the ground
and walls, denoted as X_S, and several 3D models for the
objects, denoted as X_i. The 3D model X_i corresponds to
the i-th object region, O_i, labeled by the user. Different
approaches are used to create X_S and X_i.
Figure 2: Bumpy surfaces derived using superpixels (left)
and a bilateral filter (right).

For X_S, we assume that the walls are perpendicular to the
horizontal ground and that the camera is at the origin with
no yaw or roll rotation. We first generate a rough 3D
geometry of the ground and walls, X_S^0, using the approach
proposed by Iizuka, Kanamori, Mitani, and Fukui (2011).
This method automatically generates the 3D geometry of the
ground and walls, as well as the camera parameters, from the
edges between the ground and walls marked by the user, as
illustrated in Figure 1(a). The rough 3D geometry X_S^0 is
refined to X_S with self-shadows using the method presented
in Section 3.1.
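Under these assumptions (camera at the origin with pitch only, walls perpendicular to a horizontal ground), back-projecting a ground pixel reduces to a ray-plane intersection. A minimal sketch with hypothetical intrinsics (f, cx, cy) and camera height, not values from the paper:

```python
import numpy as np

def ground_point(u, v, f, cx, cy, pitch, cam_height):
    """Back-project pixel (u, v) onto the ground plane y = -cam_height,
    assuming a pinhole camera at the origin with pitch only (no yaw or
    roll); the y axis points up and +z is the viewing direction."""
    # Viewing ray in camera coordinates; image v grows downward,
    # so the world y component is negated.
    d = np.array([(u - cx) / f, -(v - cy) / f, 1.0])
    # Undo the camera pitch (rotation about the x axis).
    c, s = np.cos(pitch), np.sin(pitch)
    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, c,  -s ],
                  [0.0, s,   c ]])
    d = R @ d
    # Intersect the ray t * d with the plane y = -cam_height.
    t = -cam_height / d[1]
    return t * d
```

For example, with zero pitch a pixel below the principal point maps to a point on the ground in front of the camera; pixels approaching the horizon send t toward infinity, which is why the user-marked ground/wall edges are needed to bound the ground region.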
For X_i, our method processes one object at a time. We
first generate a rough 3D model for the current object and
then refine it using the relationships between the light
source, the object, and its cast shadow, as described in
Section 3.2.

After deriving X_S and all X_i, the final 3D geometry X is
constructed by inserting all X_i into X_S. The user may
then modify X and L by moving the foreground objects and
the light source, and the resulting image is synthesized by
Equation 1. In our implementation, the shadow mapping
algorithm proposed by Williams (1978) is used to generate
the visibility term v(·) in Equation 1.
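The visibility term v(·) via shadow mapping can be sketched as follows; `light_view` (a callable mapping a 3D point to its shadow-map texel and depth from the light) and the bias value are hypothetical stand-ins for the light's projection, not the paper's implementation:

```python
import numpy as np

def visibility(points, light_view, shadow_map, bias=1e-3):
    """Shadow-map visibility test in the spirit of Williams (1978):
    a point is lit iff its depth from the light does not exceed the
    depth stored in the shadow map at its projected texel."""
    vis = np.empty(len(points))
    for i, p in enumerate(points):
        u, v, depth = light_view(p)
        # The small bias avoids self-shadowing ("shadow acne").
        vis[i] = 1.0 if depth <= shadow_map[v, u] + bias else 0.0
    return vis
```

The shadow map itself is the first pass: the scene depth rendered from the light's point of view, so moving the light source only requires re-rendering that depth map before re-synthesizing the image.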
3.1 Bumpy Surface Recovery
In general, self-shadows appear in the concave regions of a
bumpy surface, while convex regions are usually brighter
than the rest of the surface. Khan, Reinhard, Fleming, and
Bülthoff (2006) proposed a method to derive bump heights
for a surface from its intensity. In that method, a
bilateral filter is applied to obtain the albedo component,
and the intensity of the albedo component is then directly
used as the bump height of the surface. However, if a
surface has multiple colors, this approach tends to
separate the surface into multiple pieces at different
altitudes according to color (see Figure 14).
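One way to avoid this color-dependent splitting is to normalize intensity within regions of uniform albedo. A minimal sketch, assuming each superpixel shares a single albedo (taken here as its mean intensity); this is illustrative only and omits the shadow-based refinement of Section 3.1:

```python
import numpy as np

def bump_heights(intensity, superpixels):
    """Relative bump heights per pixel: within each superpixel, divide
    the pixel intensity by the superpixel's mean intensity, so only
    shading variation (not albedo) contributes to the height."""
    heights = np.zeros_like(intensity, dtype=float)
    for label in np.unique(superpixels):
        mask = superpixels == label
        albedo = intensity[mask].mean()       # shared albedo per superpixel
        heights[mask] = intensity[mask] / albedo
    return heights
```

A uniformly dark region then yields a flat height of 1 rather than a dip, since it is normalized by its own superpixel mean instead of by the global intensity.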
We overcame this problem by considering local texture,
obtained via superpixels. A superpixel is a set of adjacent
pixels with similar color; we therefore assume that all
pixels in a superpixel share the same albedo, so that
intensity variation within a superpixel is caused mostly by
lighting. We chose not to use a bilateral filter to recover
the albedo because it can slightly blur the edges between
different colors and produce an uneven surface like the one
in Figure 2 (right). The advantage of superpixels is that
they do not cross edges between different textures,